Can AI-based spoken dialog systems complement chatbots and better understand the meaning of queries?
A text generated from just a few keywords is impressive the first time you see it. In fact, though, it is relatively easy to create an article about a topic that already exists in text form on the Internet, says Jörg Rebell of Spitch, a developer of speech and text dialog systems. “However, it is much more difficult to understand what a person is saying and, above all, what they mean by it.”
Impressive texts, but partly wrong
Jörg Rebell explains the difference with an example: “ChatGPT delivers a pretty good text in response to the request ‘Write an article about Marbella from a tourist’s point of view’, which is amazing because nobody was used to anything like that before. But if you think about it more closely, it quickly becomes clear how simple the task basically is. The result is little more than an abbreviated Wikipedia article. When it comes to the question ‘Which recent discoveries of the James Webb Space Telescope can I tell my nine-year-old child?’, the Google AI Bard already falls short: The result reads plausibly, but is in part simply wrong – although this query is also comparatively simple.
“Things only get difficult when, for example, someone calls an insurance company and says that their cat has jumped onto the neighbor’s sofa and scratched it. The caller doesn’t want to talk about cats, sofas or the neighbor, but wants to report a claim that they think their liability insurance should pay. Spitch’s speech dialog system understands this. The difference is obvious: With ChatGPT and Bard, all the relevant terms, such as ‘Marbella’, ‘tourist’, ‘James Webb Space Telescope’ and ‘nine-year-old child’, are stated in the request. With Spitch, by contrast, none of the words the caller uses conveys what they mean; the system has to work that out even though the caller never says it. From these examples, it’s clear why the artificial intelligence used by Spitch must be much more advanced than ChatGPT or Bard.”
Speech dialog system complements text-based dialog system
For this reason, Jörg Rebell sees voice dialog systems as a complement to ChatGPT and Bard. Spitch can understand many, though not all, calls. A speech dialog system faces the challenge of recognizing the caller’s intention – the so-called “intent” – that is, of deducing from “cat,” “sofa” and “neighbor” that the call is about a liability insurance claim.
Some speech dialog systems achieve an intent recognition rate of over 85 percent. To do so, however, they must be trained for the specific company before being deployed there. This includes capturing the technical vocabulary commonly used in the industry and company in question. In addition, thousands of calls have to be analyzed to filter out what callers typically want and what words or phrases they use to express it.
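The basic idea of learning intents from example calls can be illustrated with a deliberately small sketch. It is not Spitch's method: it assumes a simple bag-of-words Naive Bayes classifier, and the example utterances and the intent names (`liability_claim`, `change_address`, `cancel_contract`) are invented for illustration. A production system would be trained on thousands of real, transcribed calls rather than a handful of sentences.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (transcribed utterance, intent label).
EXAMPLES = [
    ("my dog bit the neighbor's fence and broke it", "liability_claim"),
    ("my cat scratched a friend's couch", "liability_claim"),
    ("my child damaged the neighbor's car", "liability_claim"),
    ("I want to change my address", "change_address"),
    ("I moved and need to update my address", "change_address"),
    ("I want to cancel my contract", "cancel_contract"),
    ("please cancel my policy", "cancel_contract"),
]


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return [w for w in words if w]


def train(examples):
    """Count how often each word occurs per intent, and how often each intent occurs."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for utterance, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(tokenize(utterance))
    return word_counts, intent_counts


def predict(model, utterance):
    """Return the most probable intent using Naive Bayes with add-one smoothing."""
    word_counts, intent_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(intent_counts.values())
    best_intent, best_score = None, float("-inf")
    for intent, n in intent_counts.items():
        score = math.log(n / total)  # prior: how common this intent is
        denom = sum(word_counts[intent].values()) + len(vocab)
        for word in tokenize(utterance):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[intent][word] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(EXAMPLES)
print(predict(model, "My cat jumped onto my neighbor's sofa and scratched it"))
```

With this toy data the caller's sentence is assigned to `liability_claim`, although the words "liability" and "claim" never occur in it. The classifier only picks up on word patterns it has seen before, which is why coverage of the company's own vocabulary and typical caller phrasing matters so much.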
Jörg Rebell: “The decisive factor for speech dialog systems is the first step, which is to understand the caller and what he or she is interested in. This is a unique selling point compared to all generative AI systems. In this context, we speak of conversational AI.”