Deep Learning Gives Sight to a Virtual Technician
“Gareth! Coffee! Now!”
On a cold mid-winter morning in 2030, I’m snarling at Gareth as the robot scans my espresso-maker. His chest screen displays details about the internal adjustments he’s about to make, adjustments that his display told me to make weeks ago.
“You realize this is going to delay me by at least five minutes,” I complain.
I get no reply, of course. I programmed Gareth’s voice response feature to activate itself one hour after I wake up.
“Slowpoke robots,” I grumble as I return to the kitchen for my slightly tardy, yet perfect, espresso macchiato.
A sound outside causes Gareth’s head to swivel and look out the window. His display reads: YOUR AUTODRIVE CAB HAS ARRIVED.
“Tell it to wait,” I say with irritation, “I’ll be five minutes late.”
Wow, 2030 Me, a delay of a whopping five minutes. Think about how long it would take for you to fix your 2017 espresso maker, then get back to me with your complaints.
The smart home revolution is creating an interesting paradox: we’re using more intelligent devices and services to simplify our lives, yet their installation, activation, and operation are getting more complicated. As a result, our increasingly sophisticated smart home networks are creating billions of new requests for technical support.
In the near future, we may not need to make these requests – with increasing intelligence, our home devices and domestic robots will most likely have the ability to automatically detect, see, and fix problems on their own.
These robotic AI-powered virtual technicians will help us manage, use, and service our devices, saving us precious hours, or even days, spent waiting for call center agents or technician visits to resolve technical issues. The medical field is already using robots to analyze blood tests and X-rays, so it’s safe to say that we’ll inevitably see simpler domestic bots that can, say, diagnose a broken washing machine or an interrupted Internet connection.
The key to this progress is sight: in the same way that a human technician’s eyesight is the main sense he uses for diagnosis, the main technology driving a virtual technician’s effectiveness is computer vision.
Deep Learning & Computer Vision
Computer vision involves processing and analyzing digital images and videos to automatically understand their meaning and context. It covers a wide spectrum of capabilities, including object detection, facial recognition, motion detection, image restoration, and content synthesis. A wide variety of products and applications already use these technologies, such as self-driving cars, camera systems, and search engines.
Over the past few years, deep learning has driven significant improvements in computer vision accuracy and performance. Deep learning, one of the most advanced forms of AI, enables machines to learn independently from massive data sets. Unlike classic methods, in which a human expert needs to define features (rules and attributes), deep learning can learn features directly from data, either with guidance in the form of labeled examples (supervised learning) or with no guidance at all (unsupervised learning). In some fields, deep learning achieves far better results than classic machine learning methods.
Let’s take a look at how we can use some of these technologies to build our virtual technicians of the future:
Object recognition enables finding and recognizing objects within images or videos. It includes several tasks: classifying whether the image contains a specific object, localizing the object in the picture, distinguishing the object from other objects, and identifying parts within the object.
Since 2015, deep-learning-based object recognition has achieved remarkable results, with the error rate on benchmark image classification tasks dropping below 5% (roughly the human level). This means that on some tasks, today’s machines can recognize objects even better than human beings can!
This incredible accuracy makes object recognition a core technology for the future virtual technician. To diagnose an Internet connectivity problem, for example, a virtual technician needs to identify the router (and recognize the specific model), its parts (such as indication LEDs and back panel), and its cables; it also needs to localize all these objects to understand their context in order to diagnose the problem.
Image 1: Identifying Modem and Printer with Object Recognition
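As a rough illustration of the localization step, the sketch below assumes a detector has already returned labeled bounding boxes with confidence scores. The `Detection` type, the labels, the coordinates, and the 0.5 confidence threshold are all made up for the example; a real system would get them from a trained model. The code only shows how box positions can tell us which detected parts belong to the router.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Detection:
    label: str
    box: Box
    score: float  # detector confidence in [0, 1]


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def parts_of(device: Detection, detections: List[Detection],
             min_score: float = 0.5) -> List[str]:
    """List labels of confident detections located inside the device's box."""
    return [d.label for d in detections
            if d is not device
            and d.score >= min_score
            and contains(device.box, d.box)]


# Hypothetical detector output for one photo of a desk.
router = Detection("router", (100, 100, 400, 300), 0.97)
detections = [
    router,
    Detection("power LED", (120, 120, 140, 140), 0.91),
    Detection("internet LED", (160, 120, 180, 140), 0.88),
    Detection("printer", (500, 80, 800, 350), 0.95),
    Detection("ethernet port", (350, 250, 390, 290), 0.32),  # below threshold
]

router_parts = parts_of(router, detections)
print(router_parts)
```

The printer is ignored because its box lies outside the router's, and the low-confidence port is dropped by the threshold, which leaves the two LEDs as parts of the router.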
Image to text:
Using ‘deep semantic alignments’ for textual descriptions allows the network to describe what it sees in an image in a simple sentence. A machine develops this capability by recognizing objects and their locations within an image, converting this information into text, and creating a meaningful contextual sentence to describe the image. Using this technology, customers will be able to upload images of their equipment, after which a virtual technician will automatically describe in a sentence what it sees, for instance: “A DLINK 5323 modem with a red light and ADSL cable disconnected”. Such a description may help a human agent in a contact center diagnose the problem more quickly; alternatively, Natural Language Processing (NLP) can use it to automatically retrieve textual explanations of how to solve the problem.
Image 2: Textual description for a TV screen issue
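To make the final "convert to a sentence" step concrete, here is a deliberately simplified sketch. Real image-captioning networks generate sentences with a learned language model; the code below swaps that for a fixed template, and the `describe` function and its inputs are hypothetical. It assumes an earlier recognition stage has already produced a device name and a list of findings.

```python
from typing import List


def describe(device: str, findings: List[str]) -> str:
    """Build a one-sentence description from a device name and findings.

    This is a fixed template, not a learned captioning model.
    """
    if not findings:
        return f"A {device} with no visible issues"
    if len(findings) == 1:
        details = findings[0]
    else:
        # Join all but the last finding with commas, then add "and <last>".
        details = ", ".join(findings[:-1]) + " and " + findings[-1]
    return f"A {device} with {details}"


# Assumed output of an earlier recognition stage for one uploaded photo.
sentence = describe("DLINK 5323 modem",
                    ["a red light", "ADSL cable disconnected"])
print(sentence)
```

Even a template like this shows why the structured output of object recognition matters: the sentence is only as good as the device name and findings passed into it.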
Visual similarity enables finding images that resemble the image provided to the machine. This capability is essential because it is sometimes hard to explain an image in words, and the easiest path to a solution starts with finding similar images. Visual search engines and websites such as Pinterest use visual similarity to provide their users with similar images (with related objects, colors, or patterns). Visual similarity will allow virtual technicians to take images of a technical issue and search for similar issues within massive visual data sets of captured technical cases.
Image 3: Finding similar issues with a specific router
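One common way to implement this kind of search is to compare feature vectors that a network has extracted from each image. The sketch below assumes those vectors already exist; the four-dimensional vectors, the case names, and the `most_similar` helper are invented for illustration (real feature vectors typically have hundreds or thousands of dimensions). It ranks past cases by cosine similarity to the query image.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 cases: Dict[str, Sequence[float]],
                 top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (case_id, similarity) pairs, most similar first."""
    scored = [(case_id, cosine_similarity(query, vec))
              for case_id, vec in cases.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors for previously captured technical cases.
past_cases = {
    "router-red-led": [0.9, 0.1, 0.0, 0.2],
    "router-cable-unplugged": [0.7, 0.6, 0.1, 0.1],
    "printer-paper-jam": [0.0, 0.1, 0.9, 0.4],
}
query_vector = [0.8, 0.2, 0.0, 0.2]  # features of the customer's new photo

matches = most_similar(query_vector, past_cases)
for case_id, similarity in matches:
    print(f"{case_id}: {similarity:.3f}")
```

A linear scan like this is fine for a handful of cases; a data set of millions of images would usually call for an approximate nearest-neighbor index instead.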
Motion detection allows tracking moving objects in videos in real time. It is a fundamental capability in autonomous cars, where it supports pedestrian detection, and in the security domain, where it identifies people moving in front of security cameras.
Motion detection is a key component of the future virtual technician, enabling it to provide instructions and feedback to customers in real time. For example, to instruct a customer on how to install a printer, a virtual technician needs to identify the customer’s hand movements and provide real-time instructions accordingly, e.g. “Now put the paper in the tray” or “Now hold the network cable. Not that one, the other one”.
Image 4: Motion Detection in troubleshooting process
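The simplest form of motion detection is frame differencing: compare two consecutive frames and mark the pixels that changed. This is a classic baseline rather than a deep learning method, and following a customer's hands as described above would need considerably more than this. The sketch below uses tiny made-up grayscale frames and an arbitrary threshold of 30 to show the basic idea.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale pixels (0-255), indexed as frame[row][col]


def moving_region(prev: Frame, curr: Frame,
                  threshold: int = 30) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (row_min, col_min, row_max, col_max) of changed pixels.

    Returns None when no pixel changed by more than `threshold`.
    """
    changed = [(r, c)
               for r, (prev_row, curr_row) in enumerate(zip(prev, curr))
               for c, (p, q) in enumerate(zip(prev_row, curr_row))
               if abs(p - q) > threshold]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows), max(cols))


# Two made-up 4x5 frames: something bright appears in the lower-right corner.
frame_a = [[10] * 5 for _ in range(4)]
frame_b = [row[:] for row in frame_a]
frame_b[2][3] = 200
frame_b[3][3] = 200
frame_b[3][4] = 180

region = moving_region(frame_a, frame_b)
print(region)
```

The threshold keeps small brightness fluctuations, such as sensor noise, from being reported as motion.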
Finally, one of the most common tasks in computer vision: recognizing faces within images. Deep learning has significantly improved machines’ facial recognition capability, especially under challenging lighting conditions, angles, and backgrounds.
With facial recognition, a virtual technician will be able to perform the critical task of recognizing customers and greeting them accordingly: “Hello Mrs. Brown, I see that you’re having trouble with your washing machine”. Companies can use facial recognition to offer biometric warranties, ensuring that customers get service for their devices without forcing them to save receipts and warranty documents.
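Recognizing a specific customer is often framed as comparing a face "embedding" (a vector produced by a network) against embeddings of enrolled customers, and accepting the closest one only if it is close enough. The sketch below assumes the embeddings are already computed; the three-dimensional vectors, the names, and the 0.6 distance cutoff are illustrative values, not taken from any real system.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(face: Sequence[float],
             known: Dict[str, Sequence[float]],
             max_distance: float = 0.6) -> Optional[str]:
    """Return the closest enrolled customer, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, embedding in known.items():
        dist = euclidean(face, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical embeddings of customers who have enrolled.
enrolled = {
    "Mrs. Brown": [0.1, 0.8, 0.3],
    "Mr. Green": [0.9, 0.2, 0.5],
}

known_face = identify([0.15, 0.75, 0.35], enrolled)
stranger = identify([0.5, 0.5, 1.0], enrolled)
print(known_face, stranger)
```

Returning `None` for an unfamiliar face matters for the warranty scenario: a stranger should not be greeted, or served, as Mrs. Brown just because she is the nearest match.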
The progress of computer vision with deep learning will foster the creation of ‘artificial eyes’ for the virtual technician of the future, and help us, the consumers, manage the growing number of smart devices that we’re setting up in our smart homes.
In our next article in this series, we will detail how we can actually bring the virtual technician of the future into the present with object recognition.