The following is a transcript of the presentation delivered at RoboBusiness 2019.
All right, guess we'll go ahead and get started. Got a nice big crowd over here, really excited about that. So today we're gonna be talking to you about the automation of data collection for business analytics, specifically, in the robotics context. And just to give you a little bit of information about myself, my name is Abraham Dauhajre. I am the Head of Solutions at Formant and I've been working at the intersection of hardware and software for a very long time. And I've had a lifelong passion for robotics. That's me as a kid at a robotics competition, maybe 20 years ago. So, sort of in my blood.
Root cause analysis
So we're gonna tell you a root-causing story. We can't really share our customer data with you, but we've done something very special here: we've built a simulation of a warehouse and we're gonna tell you a story of how we see our customers and how they can see the right way to actually root-cause a problem, solve it, deploy it and verify it. So we have this fake company called Bots and Boxes where we sell and ship fuses and sprockets, two components absolutely essential for robotics. And every warehouse has three AGVs, very classic type AGV, you see a bunch of them here at the conference. A couple of robot arms and a couple of human operators that have introduced some level of entropy into the system.
Now, we have taken this fairly seriously. We have directly instrumented the physics and the game engine to provide us some real instrumentation and some real data that we can then take, analyze and derive results from. A couple of examples of that are modeling things like battery drain based on AGV load: measuring the weight of the pallet that's on the AGV and how that directly impacts the actual battery. Things that make this feel like a real-world application. And we've gone ahead and deployed this to 10 global warehouse sites. Now, we're running this in the cloud, and my CEO's over there and he's probably angry with me because we're now running this on 10 EC2 instances that cost quite a bit of money per hour to run. But we do this so that we can actually take camera images from all of the game simulations and real data, and actually be able to show you something that does make sense. We've also done some work around building out analytics dashboards and KPIs, things that we think in the warehouse industry would be very important for someone to track.
So here's our story. We're gonna start with an automated alert. It's going to inform a KPI we wanna take a look at. And then when we're looking at the KPI we're actually gonna time travel back to the point in time where that alert was generated and try to investigate the problem.
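As a rough illustration of the load-dependent battery model mentioned above, here is a minimal sketch. The linear form and both coefficients are assumptions made up for this example, not the model used in the actual simulation.

```python
from dataclasses import dataclass


@dataclass
class Agv:
    battery_pct: float = 100.0   # state of charge, 0 to 100
    payload_kg: float = 0.0      # weight of the pallet currently carried


# Hypothetical coefficients, chosen only for illustration.
BASE_DRAIN_PCT_PER_S = 0.010        # drain while driving empty
LOAD_DRAIN_PCT_PER_KG_S = 0.00005   # extra drain per kg of payload


def drain_rate(payload_kg: float) -> float:
    """Battery drain in percent per second for a given payload."""
    return BASE_DRAIN_PCT_PER_S + LOAD_DRAIN_PCT_PER_KG_S * payload_kg


def step(agv: Agv, dt_s: float) -> None:
    """Advance the model by dt_s seconds while the AGV is driving."""
    drained = drain_rate(agv.payload_kg) * dt_s
    agv.battery_pct = max(0.0, agv.battery_pct - drained)


empty = Agv()
loaded = Agv(payload_kg=200.0)
for _ in range(60):              # one minute in 1-second ticks
    step(empty, 1.0)
    step(loaded, 1.0)

print(f"empty: {empty.battery_pct:.1f}%  loaded: {loaded.battery_pct:.1f}%")
```

With a model like this, a heavier pallet shows up directly as a faster drop in state of charge, which is the kind of signal that later feeds a KPI such as trips to the charger.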
Interventions, integrations, resolution
After we have investigated this problem we're gonna take whatever action we need to take. Maybe it's an integration change, a software update or whatever it may be, deploy that and verify that that's actually fixed the problem. So in our warehouse scenario one of our most important things is the time it takes for an AGV to take an order from inventory to shipping. We define that as order duration. In our warehouse, in our simulation, we know that any order that takes any longer than 60 seconds, one minute, to deliver is a problem and it's something we should be looking at. So you know we're big users of Slack at Formant and one of the things that we've done is we've hooked up Slack to our game simulation and this alert came in, "order duration exceeded limit." We can see from the tag set that it's from Sacramento. It's running version 1.0.0, which is sort of interesting, and it was an order for an 18T sprocket and a 120 amp fuse. Very common order that we see a lot here at Bots and Boxes. Now, okay, I get this in the morning maybe. I open up my analytics dashboard and I'm looking at this and I say, "Well, hold on a second. What's going on here?" These two things are sort of interesting. Why in Sacramento, which is that big purple line at the top, why is it taking so many trips to the charger? And I can see here in my order duration anomalies KPI that I'm hitting a number above 60 quite often. And one interesting thing here too is that one of them actually went down over the last sort of day and a half. So that one's actually fine, but Sacramento has sort of seen an increase in duration anomalies. So at this point in time we're gonna say, okay, let's take a look at this and jump in our time machine. We're gonna click on that data point and go back in time to when that alert was generated. And when we get here we'll start seeing some interesting things.
You know some of the things that we see are the AGV velocity over here, it's running at about 6/10 of a meter per second, that's not right. I think we're supposed to be running closer to 2.4. We can also see our safety limit is at one, which is also incorrect. So we start seeing there's something not quite right here. Maybe we need to talk to our operations team to figure out what's going on. We can clearly see from the video that the guy, he's just absolutely crawling through the warehouse. That's a huge problem. So we talk to our ops team. They get on the application configuration. They say, "Oh yeah, we forgot to update the configuration." We'll go ahead and do that. We'll reset the AGV safety limit and we'll bump the nav version from 1.0.0 to 1.0.1.
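The order-duration alert described above boils down to a threshold check that carries the tag set along with it. The sketch below is one way to express that, assuming a hypothetical `OrderEvent` type and tag names; it only builds the message text and does not call Slack or any Formant API.

```python
from dataclasses import dataclass, field
from typing import Optional

ORDER_DURATION_LIMIT_S = 60.0  # the 60-second limit stated in the talk


@dataclass
class OrderEvent:
    """Hypothetical record of one completed order."""
    duration_s: float
    tags: dict = field(default_factory=dict)


def check_order(event: OrderEvent) -> Optional[str]:
    """Return an alert message if the order took too long, else None."""
    if event.duration_s <= ORDER_DURATION_LIMIT_S:
        return None
    tag_text = ", ".join(f"{k}={v}" for k, v in sorted(event.tags.items()))
    return (
        f"order duration exceeded limit: {event.duration_s:.0f}s > "
        f"{ORDER_DURATION_LIMIT_S:.0f}s [{tag_text}]"
    )


slow = OrderEvent(95.0, {"site": "Sacramento", "version": "1.0.0"})
fast = OrderEvent(42.0, {"site": "Sacramento", "version": "1.0.0"})
print(check_order(slow))
print(check_order(fast))
```

Keeping the tags on the alert is what makes it possible to jump from the message straight to the right site and software version.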
Command line interface, pushing updates
We make that change and we deploy that to the fleet. So we use our command line tooling, update it, watch it. You know, watch the update in progress, which is what the right side of the screen is. Watching which version of the configuration is applied and which one is desired. Making sure that it actually updates, that it actually takes and that it looks like what we expect. And then, you know, we can go back and we can look at this in live mode and then see that, hey, we can actually see here that our AGV goes from a limit of one to a limit of three.
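Watching "applied" against "desired" during a rollout can be summarized with a small helper like the one below. This is only a sketch of the idea: the robot names and version labels are hypothetical, and it is not the output or interface of the command line tool shown in the talk.

```python
def rollout_status(desired: str, applied_by_robot: dict) -> dict:
    """Split the fleet into robots on the desired version and those still pending."""
    updated = sorted(r for r, v in applied_by_robot.items() if v == desired)
    pending = sorted(r for r, v in applied_by_robot.items() if v != desired)
    return {
        "desired": desired,
        "updated": updated,
        "pending": pending,
        "complete": not pending,
    }


# Hypothetical fleet snapshot taken partway through a rollout.
fleet = {"agv-1": "v2", "agv-2": "v1", "agv-3": "v2"}
status = rollout_status("v2", fleet)
print(status)

# Later snapshot, once the last robot has applied the change.
fleet["agv-2"] = "v2"
print(rollout_status("v2", fleet)["complete"])
```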
Speed to insight
Its actual velocity, which is this blue line over here, looks correct. And we can really tell from the video that in real time the AGV's parameters were changed, and it actually did not require any AGV restart, it didn't require shutting down the warehouse and it didn't require anything crazy, it just starts going. So that's our root-causing story. That's something that we see quite often, actually, when we talk to a lot of companies: they end up spending a lot of time trying to figure out what the problem is instead of actually fixing the problem. And it's something that we think can be made very simple with a cloud-native approach to solving these problems.
Introduction to Formant
So, a quick note about Formant. The sales guys would be very angry if I didn't talk to you about the company. We're trying to empower the next generation of automation. We take data from robots, PLCs, cameras, any kind of input source, and make it available to developers, executives, operators, you know. Reading data from the physical world, writing data to the physical world, making that very accessible to everyone. And we kinda think about this in three different ways. First is observability. Tackling the unknown unknowns. Now, what's the difference between a known known and an unknown unknown? Well, a known known is something like I've got an if check in my code, I catch an exception, I can generate an alert from that. Now it's a lot more difficult when your robot turns into a corner and you have no idea why that happened.
Context is key
You need the context of all of the data around it to understand the problem and derive insights from it. Second is the operations aspect of your robotics team. You know when you start out, you may have one or two robots. One engineer to a robot. But as you start scaling your operation you really wanna flip that ratio. You want one operator per 10 robots.
One operator per 100 robots
One operator per 100 robots. Doing things like interventions, teleoperation. Not needing SSH root access to your robot just to unstick it from a corner, which is something that we see quite often. And then lastly analytics. Something that's very interesting that we're seeing with a lot of our customers is that they don't tell their customers they're a robotics company. They're a construction company or they're a logistics company. You know, they're a dog training company. Robotics is just the tool that they use to solve that problem. Now one thing that we know from my experience in SaaS and other types of industries is that you wanna be data driven. The customers want to see ROI with real data. They don't wanna see intuition or things like that. They want real data-driven analytics that just happen to come from a robot. So let's dive into observability a little bit more here. Really important with observability is the ability to go directly from your sensor to visualization and observing. Now a lot of cloud-native server logging tools, they're gonna deal with text and metric type data, but that's not really robot-shaped data. Robot-shaped data is images, it's point clouds, it's indoor localization, it's GPS.
Time machine for robot data
It's these types of data that are in some cases very heavy but also require a specific type of visualization to understand. It's also about having a time machine. You know, being able to hop in and go back to when that alert happened and understanding the context of that situation. I just showed you the warehouse example and it's very easy for us to jump from an alert to the point in time that it happened and understand the context of the whole warehouse in that situation. Leveraging an aspect of dimensionality. One of the things that we believe in is sort of having this unstructured tag set that you define on your robotic application and then being able to drill down into very specific tags. In this example we're drilling down into a location in South Florida, which is actually where my mom's house is. And also into the Dominican Republic, which is where my family's from. But then filtering down to that very specific data is something that's super important and needs to be easy. And you know, not all input sources are robots. So I'm gonna go with a live demo here so hopefully this works. One thing that we see a lot is your robots have cameras, they have tools, they have utilities, but you may have an operator that's like, "Hey, I don't know what's going on with the robot. Something's broken here. What's going on?" Hopefully this works. You wanna be able to record in real time kinda what's happening, maybe add some annotations to it, and have input sources that are not just your robot. Moving on to sort of the operational section. We think about this in a couple of different ways. The first one is doing kind of an intervention-style request. This is an example of a computer vision workflow where maybe your robot is taking pictures of pizzas and you wanna count the number of pepperonis on that pizza.
So you're gonna select pepperoni, you're gonna put a circle around it or a box, maybe you add some labels and then off it goes. Now a really important part of this is that it's a human in the loop with your robot. It's not, you know, it's not your robot talking to a back end server that then takes 25 minutes to get to some other system. It's really synchronous. In some cases more synchronous than you want. But it's a human in the loop with your robot, robot talking to a human.
Speak robot is our sort of marketing lingo. Being able to configure and deploy your entire fleet. We've taken a very cloud-native approach to this. We treat your robot configuration as infrastructure as code, which is a very important concept in applied technology. And being able to keep your configuration in software version control and similar systems is really key to understanding what state your robot fleet is in. And at the same time this is not something that talks directly to your robot.
There are cases where that is important, but we're maintaining the state in a cloud backend that your robot can then ask for when it comes back online. One of the things that we all know is that network reliability is super difficult in a robotics context. And you know we've taken this sort of into account in the way that we think about a lot of these things.
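The pattern described here, where desired state lives in a cloud backend and the robot asks for it when it comes back online, can be sketched as follows. `ConfigStore` and `RobotAgent` are hypothetical stand-ins written for this example, with an in-memory dictionary in place of a real backend; they are not Formant's API.

```python
from typing import Optional


class ConfigStore:
    """In-memory stand-in for a cloud backend holding desired config per robot."""

    def __init__(self) -> None:
        self._desired: dict = {}

    def set_desired(self, robot_id: str, version: str, params: dict) -> None:
        self._desired[robot_id] = {"version": version, "params": dict(params)}

    def get_desired(self, robot_id: str) -> Optional[dict]:
        return self._desired.get(robot_id)


class RobotAgent:
    """Robot-side agent that pulls its desired config whenever it reconnects."""

    def __init__(self, robot_id: str, store: ConfigStore) -> None:
        self.robot_id = robot_id
        self.store = store
        self.applied_version: Optional[str] = None
        self.params: dict = {}

    def on_reconnect(self) -> bool:
        """Apply the desired config if it differs; return True if anything changed."""
        desired = self.store.get_desired(self.robot_id)
        if desired is None or desired["version"] == self.applied_version:
            return False
        self.params = dict(desired["params"])
        self.applied_version = desired["version"]
        return True


store = ConfigStore()
agent = RobotAgent("agv-1", store)

# The operator changes the desired state while the robot is offline.
store.set_desired("agv-1", "v2", {"safety_limit": 3})

print(agent.on_reconnect(), agent.params)  # robot comes back and catches up
print(agent.on_reconnect())                # nothing new the second time
```

Because the backend never has to reach the robot directly, a dropped connection only delays the update rather than losing it.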
Being able to beam in and teleop your robot. I don't know how many people here have had that sort of drunk robot experience where the robot is driving in a zig-zag line down the hallway but if your robot gets stuck in a corner, or gets stuck behind a person, or someone knocks it over... Well, hopefully no one knocks it over. But you occasionally need to actually beam in and take direct control of the robot, fix it and go. Now, a lot of the tooling that we see is really sort of engineer-specific tooling.
It's not operator-specific tooling. A lot of the sort of workflows we see are: I've got to SSH into the robot, I'm gonna open up the ROS tool and maybe send the cmd_vel to the robot directly. Now that doesn't really work for an operator and it doesn't work if you have a call center, you know, something like that. And lastly analytics. Remembering again, we want to empower our customers. We want to talk about being data driven. And being a data-driven company that maybe happens to use robotics to solve its computing problems. So going directly from the robot sensor to a business insight automatically, not building tools that take this data, parse it to that data and go to that data. We're going directly from robot sensor to business insight. And all of these actual KPIs are derived directly from that Bots and Boxes warehouse simulation I was showing you. So we actually have some really interesting ones here and some anecdotes around this, but when I first built it I didn't actually know what I was going to show as the root-causing journey. I kinda let the game run for a couple of days and I started to notice things, especially around order duration, so I sorta zeroed in on that and modeled it around that. But I wouldn't have been able to do that without something like this. And the last piece is trying to be flexible around different tools that you do integrate.
Moving robot data
So we think of ourselves as sort of a data pipe in a lot of cases. We're really good at getting data from your robot to the cloud. So whether you're a ROS application, whether you have log files on your filesystem or you just wanna use our API to generate dynamic telemetry streams. All of that is part of our architecture and part of our platform. We also know that a lot of people wanna use the right tool for the right job. We use, for example, Slack and PagerDuty for our own backend monitoring and alerting and we know that people would want to use those as well. Another really interesting one that we've got now that we're really excited about is actually an S3 export utility, so going from, for example, a ROS message to a data type, a JSON data type that is parsable by BI tools.
That's one that we're testing with a couple of customers. So why do we build this? Why do we think this is important? Observability is really tough. Building this platform is not the easiest thing. We typically see that a company will start, they'll hire a full stack engineer, have to start picking a public cloud, choose AWS, GCP, Azure. You start looking at the products that they offer. RoboMaker's great. It does a lot of really cool things. But you still wanna start pushing logs and metrics to the cloud, maybe you need to build a GUI to understand what's going on. You start now needing to expose data to different stakeholders. Maybe need to hire a second cloud engineer to support that. Start having scalability issues. Security starts becoming a concern. You may have to do security audits. You might need to deploy where a VPN is not allowed. So it's a really complex problem. We can see companies devote 20-30% of their engineering budget to this particular problem.