Can Conversational AI Shed its Dunce Cap?

Conversational AI is everywhere. (E.g. Alexa, Siri, Google Assistant, as well as thousands of lesser known chatbots). However, as discussed in the last post, these systems currently function well only with simple tasks, because the bots cannot understand normal everyday conversation the way a human assistant does. As a result, we’ve largely given up using them for complex tasks.

Current chatbots are not very smart

Current chatbots are not very smart

As we saw at this year’s Google I/O conference, Google demonstrated that we are on the cusp of change and by using several new techniques, AI is now capable of understanding complex conversations and responding in natural and intelligent ways.

The most popular technique for helping machines understand conversational speech is by using word embeddings, which attempt to encapsulate the “meaning” of a word in a vector after reading massive amounts of text and analyzing how each word appears in various contexts across a dataset. The idea is that words with similar meaning will have similar vectors. Word2Vec & GloVe are currently the most popular word embedding algorithms. But as Sebastian Ruder, Research Scientist at AYLIEN, notes “learning word vectors is like {an image recognition system that} only learning edges.” It works for certain situations where the problem is simple or straightforward, but not when things are complex.

Let’s look at several new techniques that attempt to move beyond Word2Vec’s shallow approach and embed text with meaning in a richer way.

1) Narrow the Scope and Train Intensively

While this technique works, it is more of an “idiot savant” approach, where the bot will be able to converse across a narrow domain quite well but will be really dumb about everything else. This is okay in some situations, but when using this approach, it is especially important that the users know the chatbot is a computer, so that when the bot says something silly the user knows why.

This was a core technique used by Google in its I/O conference demo, when Google Assistant booked an appointment at a hair salon, and then made a restaurant reservation. As Google explained, the training was intensive and narrow in scope. But what would have happened if the human had decided to make small talk and asked, “How about them Red Sox?” Google noted that Google Assistant was not ready to “carry out general conversations,” so the response would probably have been hilarious or embarrassing.

2) Next Generation Word Embeddings

A paradigm shift is occurring within word embeddings by new techniques such as ELMo, ULMFiT, and the OpenAI transformer. As per Sebastian Ruder, if learning word vectors {e.g. Word2vec} is like only learning edges, these approaches are like learning the full hierarchy of features, from edges to shapes to high-level semantic concepts.” In essence these new techniques have a much richer semantic representation of words/sentences and thus enable the bot to understand words in a deeper way.

Possibly even more exciting is the idea that with these newer systems we may be able to build transferable pre-trained universal word/sentence embeddings that we can use with virtually any bot and achieve excellent comprehension and results, which sounds a lot like human intelligence!

3) Use an Ontology and Sentiment detection to Label the Text for Meaning

While word embeddings are one way to embed a text dataset with meaning, data labeling is the tried and true method used in other AI domains such as image recognition. (See our white paper on “Data Labeling Full-Text Datasets for AI Predictive Lift” for more comprehensive treatment of this topic.) The problem with data labeling is that in the past this has been done by humans and thus is very expensive. But automated data labeling for text is now a possibility, using an Entity Ontology and Sentiment Detection.

An entity ontology is like a dictionary and a thesaurus; its job is to define the meaning of words by: a) encoding commonalities between concepts in a specific domain (e.g. both “yellow fever” and “malaria” are “diseases spread by mosquitoes”), and b) encoding how words relate to concepts, when they vary depending upon the context (e.g. that Mercury is sometimes a “metal,” sometimes a “planet” and a sometimes a “Greek god”). Entity ontologies can be created and used to label a dataset with meaning at great cost using humans. But now these tasks can be fully automated. High quality ontologies can be generated using NLP and AI techniques. These ontologies can be further edited by domain experts (“human-in-the-loop”) and then used to label datasets in bulk or in real time (e.g. streaming).

Understanding text also requires a nuanced and micro understanding of sentiment. Document or even sentence level sentiment is essentially useless for AI. For example, “My neighbor’s garden is awesome, the vegetables are really fresh, but they also attract deer, which is how I got Lyme disease.” The bot needs to see the first part of the sentence as positive (e.g. the fresh vegetables produced by my neighbor’s garden are excellent), and the second part of the sentence as negative (e.g. getting Lyme disease because of my neighbor’s garden is awful), rather than as neutral (half good plus half bad).

Labeling datasets with ontologies and sentiment often result in a better chatbot than by using word embedding alone, as the ontology and sentiment detection capture additional meaning allowing the bot to achieve a more human-like understanding of the text.

Losing the Dunce Cap in 2019?

While I cannot be sure these newer techniques will make bots super-smart next month or next year, we do know that they are making conversational AI systems smarter all the time. If you are using these new techniques, we’d love to hear about how it’s working. Or if you want help moving your bot to the head of the class – give us a call.