The Bots Have Eyes: Why the Evolution of Visual Chatbots is Exciting
The world may be enamored with bots at the moment, but they’ve actually been around for quite some time. The first bots were used in the finance industry more than a decade ago, automatically buying and selling equities based on key market indicators. It was a novel concept at the time, but the technology is now ubiquitous in the industry, with the financial robo-advice market projected to grow to $7 trillion by 2025, according to CNBC.
Today’s bots are far more capable than their ancestors. Conversational AI agents, known as chatbots, automate and scale one-on-one conversations, with use cases that extend well beyond the finance industry into sales, marketing and customer support.
And they continue to evolve. Just a few years ago, the notion that a bot could answer a text message or suggest a product for purchase was revolutionary. Today it is commonplace, and chatbots are a near-standard Help feature on websites and other online platforms.
The next evolutionary stage in bot technology should have entrepreneurs salivating.
What if chatbots had eyes?
Businesses using chatbot technology today have likely adopted it for two main reasons: to enhance the customer experience and to save money. Juniper Research projects that bots will cut business costs by as much as $8 billion by 2022, so the technology can clearly make a significant impact for SMBs and enterprises alike.
Yet bots still pose many problems for entrepreneurs, especially when it comes to customer experience. Chatbots sometimes fail to deliver experiences that are as seamless, efficient and pleasant as hoped. Often the reason is simple: chatbots cannot see.
When a customer interacts with a chatbot, the success of the conversation depends heavily on the customer’s ability to accurately describe, and type, the issue at hand. The chatbot’s ability to interpret the customer’s phrasing, nuances and complex reality is limited as well, which in turn limits its ability to help solve the problem. Its responses are further constrained to a fixed pool of words and texts.
According to a PointSource survey, 59% of customers say bots aren’t getting the job done. One likely reason is that customers are more than text: they are emotional, visual creatures who communicate with body language and subtle cues, and who rely on sight to make sense of the world around them. That helps explain the surge in visual search engines, video tutorials and visual customer assistance.
For business owners, the difference between visually walking a customer through resolution steps and typing out descriptions of mechanical actions is immense: visual engagement reduces frustration and empowers the customer rather than escalating dissatisfaction.
Early-Stage Visual Bots Have Arrived
Computer vision AI is already used in a wide range of applications: cameras use it to detect faces and smiles, self-driving cars use it to read traffic signs and avoid pedestrians, and factory robots use it to monitor the production line for problems. In customer engagement, it will let a visual chatbot, acting as a virtual assistant, see the customer’s problem. The implications for business owners are immense.
The e-commerce industry, and the fashion industry in particular, has been among the early adopters of visual chatbots. Levi’s AI-powered virtual stylist and Amazon’s Echo Look can advise shoppers on the products or styles best suited to them. If brands can use computer vision to “see” and understand their customers on an individual level, they can take personalized sales, marketing and service much further. These are exciting developments, but many more use cases along the customer journey remain untapped.
The Path to Evolution
For visual chatbots to reach mass adoption, vendors and enterprises will need to adopt the core technologies behind them: computer vision AI and augmented reality (AR). This evolution will unfold in several phases:
Phase 1: Text to image
At this early stage, the chatbot receives text input from the customer, interprets it and retrieves a relevant visual from a knowledge base or search engine. The request can be specific, such as “please show me the room in the hotel I’ve reserved,” or general, such as “how do I program my coffee machine?”
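To make the text-to-image step concrete, here is a minimal sketch in Python. It assumes a tiny, hypothetical knowledge base in which each visual asset is tagged with keywords, and it scores each asset by how many words it shares with the customer’s request. Keyword overlap is a deliberately simple stand-in for the language understanding a production bot would use; the asset file names and `retrieve_visual` function are illustrative, not any vendor’s API.

```python
import re
from typing import Optional

# Hypothetical knowledge base: each visual asset is tagged with descriptive keywords.
KNOWLEDGE_BASE = [
    {"asset": "hotel_room_1204.jpg", "keywords": {"hotel", "room", "reserved", "booking"}},
    {"asset": "coffee_machine_programming.mp4", "keywords": {"coffee", "machine", "program", "timer"}},
    {"asset": "router_setup_diagram.png", "keywords": {"router", "cable", "install", "wifi"}},
]


def tokenize(text: str) -> set:
    """Lowercase the request and split it into alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_visual(request: str) -> Optional[str]:
    """Return the asset sharing the most keywords with the request, or None if none match."""
    words = tokenize(request)
    best_asset, best_score = None, 0
    for entry in KNOWLEDGE_BASE:
        score = len(words & entry["keywords"])
        if score > best_score:
            best_asset, best_score = entry["asset"], score
    return best_asset


print(retrieve_visual("How do I program my coffee machine?"))  # coffee_machine_programming.mp4
```

Returning `None` when nothing matches gives the bot a clean signal to fall back to a clarifying question or a human agent instead of showing an irrelevant image.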
Phase 2: Image to image/text
In this more advanced phase, the bot applies computer vision AI to process the image it receives and replies with words or visuals. For example, a museum-goer snaps a photo of an item of interest, and the museum’s chatbot recognizes the item and shares details about the artist and the item’s background.
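One common way to implement the recognition step is nearest-neighbor matching: a vision model turns each photo into an embedding vector, and the bot compares the visitor’s photo against stored embeddings of known items. The sketch below assumes those embeddings already exist (the model that produces them is not shown), and the catalog entries, vectors and 0.9 threshold are all hypothetical values chosen for illustration.

```python
import math
from typing import List, Optional

# Hypothetical catalog: item ID -> embedding (from an assumed vision model) and details.
CATALOG = {
    "landscape_painting_17": {"embedding": [0.9, 0.1, 0.2], "details": "Oil landscape, gallery 5"},
    "bronze_sculpture_04": {"embedding": [0.1, 0.8, 0.5], "details": "Bronze figure, gallery 3"},
    "ceramic_vase_22": {"embedding": [0.2, 0.3, 0.9], "details": "Glazed vase, gallery 1"},
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(photo_embedding: List[float], threshold: float = 0.9) -> Optional[str]:
    """Return the closest catalog item, or None if no item is similar enough."""
    best_id, best_sim = None, -1.0
    for item_id, record in CATALOG.items():
        sim = cosine_similarity(photo_embedding, record["embedding"])
        if sim > best_sim:
            best_id, best_sim = item_id, sim
    return best_id if best_sim >= threshold else None


def reply(photo_embedding: List[float]) -> str:
    """Turn a recognition result into a chatbot reply."""
    item_id = recognize(photo_embedding)
    if item_id is None:
        return "Sorry, I couldn't recognize that item. Could you try another photo?"
    return f"This looks like {item_id}: {CATALOG[item_id]['details']}"


print(reply([0.85, 0.15, 0.25]))
```

The similarity threshold is the key design choice: set too low, the bot confidently describes the wrong item; set too high, it asks for new photos too often.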
Phase 3: Image to smart image
At this stage, the bot applies computer vision both to process the input and to produce the reply. For example, a customer contacts their insurance company following a car accident. The bot asks them to upload images of the vehicle, identifies the damaged areas, assesses the extent of the damage and estimates the potential cost of repairs, information that speeds up the claim cycle and saves the business money. Insurtech companies such as CCC have focused on developing these capabilities, and ‘virtual adjuster’ bots are maturing as a result.
Phase 4: Interactive visual conversation
In the most advanced stage, the chatbot can switch to real-time video, letting the customer show the issue and receive interactive AR guidance. Such a bot can guide customers through complicated tasks, giving feedback and correcting them as they go. For example, when a customer unboxes a new router, a ‘virtual technician’ recognizes the cables and ports and uses AR to guide the customer through installation.
Next Step: Teaching the Visual Bots
Advanced visual bots harness deep learning to recognize and analyze images with high accuracy. Deep learning requires a massive data set to train the model effectively. For a visual bot to correctly identify vehicle damage, as in the insurance example above, it must be trained on tens of thousands of images of each damage type.
To identify a coffee machine’s specific model, the visual bot must be trained on a huge number of images of each model, in various lighting conditions, angles and positions. Building such data sets is extremely time-consuming and labor-intensive, and currently out of reach for many enterprises and vendors. It will be costly. But it will be done, and as with all technology, the price will drop over time until it is affordable for all sorts of businesses.
A Smart Investment for Business Owners
Chatbots are quickly becoming an integral part of the user experience, and as long as humans are involved, the future of chatbots is visual. The transformation to visual bots will be an evolutionary process in which bots gradually move from text-based understanding to image processing, and eventually to full visual interaction.
For entrepreneurs and business owners, the potential upside is far-reaching: improved customer experience, loyalty and retention, lower costs, and more revenue generated through personalized sales and service.