Challenges to connectivity
Much has been written in recent years on strategies for observing and analyzing servers in the cloud data center. So much so that an entire discipline, DevOps, has refined this topic into an art that keeps businesses running optimally. Robot fleets similarly rely on historical data and metrics to improve their day-to-day operations. The importance of these principles may not be apparent in the early development phases of a robot, but as a company scales to sending many robots out into the field, they become essential to tracking progress toward goals, identifying mistakes, and recognizing patterns.
Beyond the storage and processing of this data, roboticists face additional challenges that server infrastructure engineers do not. Robots operate in environments fraught with both physical and financial barriers to the transfer of data. Some of these environmental challenges include:
- Structures or equipment that weaken signal or introduce noise
- Locations where data is only accessible via expensive mobile data services
- Networks that are not accessible due to security restrictions
- Limited robot hardware that can't power always-on communication
As a robot business, you must consider these challenges and develop a strategy to overcome them in order to optimize your service. While each use case is unique, certain considerations are common throughout the field. For example:
- What information do I need to see all the time?
- What information do I need to see on a case-by-case basis?
- What information is important for me to be aware of immediately if something goes wrong?
- How can I financially optimize my data ingestion?
The benefits of on-demand ingestion
In this article, I want to introduce a technique that addresses these issues: on-demand ingestion. On-demand ingestion is the practice of storing robot telemetry on the device itself, so that it can be accessed when needed or kept until extraction. Doing this allows us to move beyond the paradigm in which a robot is simply an "always-on" source of telemetry streams that can only be either "on" or "off".
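As a rough illustration of what "storing telemetry on the device itself" could look like, here is a minimal sketch that uses SQLite from the Python standard library as the local buffer. The class name `TelemetryStore`, the table layout, and the method names are all assumptions made for this example, not part of any particular robot platform.

```python
import sqlite3


class TelemetryStore:
    """Hypothetical on-device telemetry buffer backed by SQLite."""

    def __init__(self, path=":memory:"):
        # On a real robot, path would point at local disk; ":memory:" keeps the demo self-contained.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS telemetry ("
            "ts REAL NOT NULL, stream TEXT NOT NULL, payload BLOB NOT NULL)"
        )
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS idx_stream_ts ON telemetry (stream, ts)"
        )

    def record(self, ts, stream, payload):
        """Write one sample locally; nothing leaves the robot here."""
        with self.db:
            self.db.execute(
                "INSERT INTO telemetry (ts, stream, payload) VALUES (?, ?, ?)",
                (ts, stream, payload),
            )

    def read(self, stream, start, end):
        """Return the samples of one stream inside [start, end], oldest first."""
        return self.db.execute(
            "SELECT ts, payload FROM telemetry "
            "WHERE stream = ? AND ts BETWEEN ? AND ? ORDER BY ts",
            (stream, start, end),
        ).fetchall()

    def drop_before(self, cutoff):
        """Free space once older data has been extracted or has aged out."""
        with self.db:
            cur = self.db.execute("DELETE FROM telemetry WHERE ts < ?", (cutoff,))
        return cur.rowcount
```

The point of the sketch is the separation of concerns: `record` runs continuously at whatever rate the hardware allows, while `read` is only called when someone actually asks for a time range.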
The primary advantage of on-demand ingestion is that it allows robot businesses to reduce the costs of mobile data transfer, cloud storage, and processing. When it comes down to it, the cost of additional on-device hard drive space is minuscule compared to the cost of cellular data transfer. By storing one's observational data on-device and accessing it selectively — rather than passively transferring a constant stream of data to the cloud — businesses can often reduce their cellular data expenditures by upwards of 80%.
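To make the trade-off concrete, the arithmetic can be sketched as follows. Every number here is a made-up input chosen for illustration, not a measurement: a full-resolution stream of 2,000 bytes per second, a throttled heartbeat at a tenth of that, and twenty ten-minute incident pulls per month. Your own ratio will depend entirely on your telemetry rates and how often you pull data.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # a 30-day month


def monthly_bytes_always_on(full_rate_bytes_per_s):
    """Bytes sent per month when the full-resolution stream is always uploaded."""
    return full_rate_bytes_per_s * SECONDS_PER_MONTH


def monthly_bytes_on_demand(heartbeat_bytes_per_s, incidents, window_s,
                            full_rate_bytes_per_s):
    """Bytes sent per month with a throttled heartbeat plus selective pulls."""
    heartbeat = heartbeat_bytes_per_s * SECONDS_PER_MONTH
    pulls = incidents * window_s * full_rate_bytes_per_s
    return heartbeat + pulls


def savings_fraction(always_on_bytes, on_demand_bytes):
    """Fraction of cellular traffic avoided by going on-demand."""
    return 1 - on_demand_bytes / always_on_bytes


# Hypothetical inputs, for illustration only.
always_on = monthly_bytes_always_on(2_000)
on_demand = monthly_bytes_on_demand(200, incidents=20, window_s=600,
                                    full_rate_bytes_per_s=2_000)
print(savings_fraction(always_on, on_demand))  # about 0.895 with these made-up inputs
```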
Ultimately, not all telemetry generated by a robot needs to be sent immediately. Robot companies often have existing support systems with a list of incidents that need to be investigated. We want operators to be able to start from their knowledge base and determine whether the cost of data acquisition is worth the effort. If extraction is worth it, then it should be as easy as possible to gather this data whenever it is available. A typical workflow might look something like:
1. A ticket comes in from a field agent regarding an anomalous event (e.g., faster battery drain, high battery temperature, a sensor malfunction, or unexpected state changes) with a robot on-site.
2. We look at the incident and determine what kinds of information would prove useful in diagnosing the problem at that moment in time.
3. With our knowledge of what data types are relevant, we can query the robot for the presence of that data at or around the time of the incident, identifying the size and scope of data necessary to resolve the issue.
4. We send a request to the robot to deliver this data at the next opportune time, when a suitable connection is available.
5. We are notified when this data arrives and can complete our investigation.
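The steps above can be sketched in code. This is a simplified, in-memory model under stated assumptions: `RobotAgent`, `RetrievalRequest`, `describe`, and `on_link_available` are hypothetical names invented for the example, and a real system would replace the `deliver` callback with an actual upload and notification path.

```python
from dataclasses import dataclass


@dataclass
class RetrievalRequest:
    """Hypothetical request tied to a support ticket (steps 1 and 2)."""
    ticket_id: str
    stream: str
    start: float
    end: float


class RobotAgent:
    """Hypothetical robot-side agent holding telemetry as {stream: [(ts, payload_bytes)]}."""

    def __init__(self, samples):
        self.samples = samples
        self.pending = []

    def _select(self, stream, start, end):
        return [(ts, p) for ts, p in self.samples.get(stream, []) if start <= ts <= end]

    def describe(self, stream, start, end):
        """Step 3: report how many samples and bytes exist, without sending them."""
        rows = self._select(stream, start, end)
        return len(rows), sum(len(p) for _, p in rows)

    def request(self, req):
        """Step 4: queue the request until a suitable link is available."""
        self.pending.append(req)

    def on_link_available(self, deliver):
        """Steps 4 and 5: flush queued requests; deliver() stands in for upload + notify."""
        while self.pending:
            req = self.pending.pop(0)
            deliver(req, self._select(req.stream, req.start, req.end))
```

A short usage example: an operator checks the size of the data first, decides it is worth the transfer, and is notified once the robot comes back online.

```python
agent = RobotAgent({"battery_temp": [(float(t), b"x" * 8) for t in range(100)]})
count, size = agent.describe("battery_temp", 40.0, 49.0)  # 10 samples, 80 bytes
agent.request(RetrievalRequest("TICKET-1", "battery_temp", 40.0, 49.0))

delivered = []
agent.on_link_available(lambda req, rows: delivered.append((req.ticket_id, rows)))
```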
Notice how we've turned continuous historical data transfer into a reactive operation that could be automated to allow a single person to investigate many issues with minimal data cost. Rather than constantly querying, transferring, and storing telemetry data of unknown worth or necessity, we can now selectively peek into the system on an "as-needed" basis.
On-demand ingestion also means that triggering and alerting mechanisms can be moved from the cloud to the machine itself. Having these mechanisms run on the robot, rather than in the cloud, opens up a number of possibilities. Normally, data sent to the cloud is throttled to save on the cost of data processing. A robot that stores its data locally can keep telemetry at the maximum resolution the hardware supports. This allows on-device observation and alerting on high-density telemetry that would be cost-prohibitive at the cloud level.
Similarly, consider the creation of data science models that identify when something is wrong with your robot. The accuracy of your model should not be limited by an arbitrary ingestion rate for live data. With high-resolution data stored on the robot, you could selectively extract high-density data in the time ranges around the moment of error. In this way, one enjoys "the best of both worlds," with throttled live data in the cloud and high-density data on the device.
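A minimal sketch of this idea, assuming a simple threshold rule: the robot inspects every sample at full resolution, but only forwards every Nth sample upstream. The class `OnDeviceMonitor` and its parameters are hypothetical, and a production alert would more likely use rate-of-change or statistical rules than a fixed threshold.

```python
class OnDeviceMonitor:
    """Hypothetical monitor: alert on every sample, uplink only a throttled subset."""

    def __init__(self, threshold, uplink_every):
        self.threshold = threshold        # alert when a value exceeds this
        self.uplink_every = uplink_every  # forward 1 of every N samples to the cloud
        self.seen = 0
        self.uplink = []                  # what the cloud would receive
        self.alerts = []                  # what was caught locally

    def ingest(self, ts, value):
        # The alert check runs on every sample, at full resolution.
        if value > self.threshold:
            self.alerts.append((ts, value))
        # The uplink sees only a decimated view of the same stream.
        if self.seen % self.uplink_every == 0:
            self.uplink.append((ts, value))
        self.seen += 1


monitor = OnDeviceMonitor(threshold=80.0, uplink_every=10)
for t in range(100):
    # A single brief spike at t=37 that falls between throttled samples.
    monitor.ingest(t, 95.0 if t == 37 else 40.0)
```

In this example the throttled uplink never contains the spike, yet the on-device check still catches it, which is exactly the case that cloud-side alerting on throttled data would miss.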
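One way to picture the combination of throttled and high-density data is the sketch below. It assumes samples are `(timestamp, value)` pairs sorted by timestamp, with timestamps in integer milliseconds. The function names `window_around` and `merge_series` are made up for this example.

```python
from bisect import bisect_left, bisect_right


def window_around(samples, error_ts, before, after):
    """Slice the on-device series to [error_ts - before, error_ts + after]."""
    timestamps = [ts for ts, _ in samples]
    lo = bisect_left(timestamps, error_ts - before)
    hi = bisect_right(timestamps, error_ts + after)
    return samples[lo:hi]


def merge_series(throttled, dense):
    """Combine the cloud's throttled view with a dense window, de-duplicated by timestamp."""
    merged = dict(throttled)
    merged.update(dict(dense))  # the dense sample wins where both have the same timestamp
    return sorted(merged.items())


# Hypothetical data: 100 Hz on the robot, 1 Hz in the cloud, error at t = 5000 ms.
on_device = [(t, float(t)) for t in range(0, 10_000, 10)]
in_cloud = on_device[::100]
dense = window_around(on_device, error_ts=5_000, before=200, after=100)
training_view = merge_series(in_cloud, dense)
```

Only the 31 samples around the error are pulled from the robot in this example, rather than all 1,000 samples stored on it.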
Robotics' persistent data problem
On-demand ingestion is a compelling solution to a challenge that will likely persist for quite some time. The wireless data market shows few signs of competitive pricing or of technological improvements capable of keeping up with ever-increasing usage. The majority of mobile networks are still owned by a select few companies that charge high fees, with the costs of taxes, installation, and maintenance baked in. Ultimately, these services run on radio spectrum with limited capacity and are struggling to keep up with the latest consumer innovations. Until a major economic shift occurs, managing a robotic fleet will involve trade-offs between the financial cost of bandwidth, the scheduling of ingestion, and the volume and variety of data types one monitors continuously.
A data ingestion strategy is essential for any business that wants to take a robot from a well-tested device on the lab's Wi-Fi network into the complexity of a customer's environment, where data availability can be a wildcard. Flipping a switch to tell your robot to save data on-device is just the beginning. Fascinating opportunities open up when we begin asking how to release that data intelligently to get the most out of our operations. As a data engineer, I'm excited about how integrations with third-party services and statistical models can help businesses automate the discovery of the unknowns hidden within machines.