In this episode, we check in with CEO & Co-founder of Cognigy Philipp Heltewig to learn how to build an end-to-end enterprise conversational system. Delving into natural language understanding, conversational channels and backend integrations, Phil lays out what’s needed to make conversational automation work.
Phil, welcome to the show. We gave a little intro in the beginning, but I'm sure our audience would love to hear from you directly. So can you just tell us briefly who you are and what you do?
Yeah, sure. Thanks for having me on the show. My name is Phil. I'm one of the two co-founders of Cognigy.ai, which is a conversational AI platform, and we're here today to talk a little bit more about conversational automation and what we're doing.
I'm thrilled to have you. Actually, first, are you starting to see any kind of different ways of discussing this among your customers, in terms of what automation in general or conversational automation means for some of these businesses these days, because of the pandemic situation we are in?
Very much so, actually. Before the crisis there were already a lot of enterprises looking at conversational automation, automation in general, I guess, but there were a lot of innovation projects trying out things: what does it even mean, artificial intelligence, how can we apply it, and so on and so forth. And with the crisis coming about, I think the level of urgency has massively increased, right? What we're seeing now is really large enterprises around the world from pretty much any industry, you name it, transportation, financial industry, insurance, government, are now taking a serious look at these kinds of automation technologies, because there are a number of pressures now hitting them.
Firstly, the number of customer queries has dramatically increased. Well, we're using the word customer here, representative for customers, employees, or citizens, depending on the context that you're in. And so the number of queries has increased dramatically, which put a lot of pressure on call centers or on contact centers.
And now on the other hand, you had the Corona crisis, and the measures that were put in by the governments emptied out a large part of the contact centers as well, because of course you couldn't have a tightly packed contact center any longer where you had agent sitting next to agent next to agent next to agent. So you had double pressure coming in, and conversational automation, or automation technology in general, I think is looked at as one of the technologies that can really make a difference and really help out with these types of issues that various enterprises around the world are now experiencing.
I understand, Phil, conversational automation, conversation AI is just your standard vocabulary, but sometimes although we have a technically-inclined audience, sometimes it appears the word automation has different meanings for different people. And I think we are seeing the potentiality of it is vastly different now. Can you just unpack it a bit, what it really means and, especially, bring it to the whole conversational aspect of it?
Yeah, you're right, I mean, we're so stuck in this industry that sometimes we use all these new terms without even thinking about them much. So what I actually did was I went to look up official definitions of these various words that we're using, and it was quite interesting what came out. If you look at the word automation, the Merriam Webster, it says the technique of making an apparatus, a process or a system operate automatically. Of course, that brings about the question what does automatic even mean? And automatic means working by itself with little or no direct human control, right? So in our sense here, it's automating a process, so making a process work with little or no direct human intervention. There's many things that are automations. If you look at a car, especially a self-driving car, that's also an automation, right? It works by itself with little or no direct human intervention. You tell it where to go and then it goes there, at least that's the vision.
Now of course in IT, it's a little bit different. I came across a definition by VMware, which is: IT automation is the process of creating software and systems to replace repeatable processes and reduce manual intervention. And I think this is really what it is about. When we're talking about automation in the software space, it is really about identifying repeatable processes, right? Processes that don't require a lot of human creativity and ingenuity, but very repeatable processes that are carried out tens, hundreds, thousands of times in a very similar fashion. And we can automate those using software so that as little human intervention as possible is needed.
Now, most of the time these days when you talk about automation, you talk about what is called robotic process automation or RPA, right, which is a part of automation that evolved from workflow automation. In RPA, you have so-called bots and bots, they are like metaphorical robots, right. They're not physical beings, but they're pieces of software. And I guess they call them bots because they act autonomously and they can do things on their own, kind of like a robot, right? So in RPA you have these software robots and the special thing is that they watch a human perform a task. Like let's say your task is that every week you receive a spreadsheet with a thousand lines, which is your stock inventory in your factories and then you have to take this and enter it manually into an ERP system. Now what the RPA system can do, it can watch you do this once for one row in your spreadsheet, and then you can kick off the robot and it goes off and it does it for all your thousand rows automatically within a matter of seconds, right? It moves the mouse for you, it clicks for you, it copies and pastes.
So it automates a very repeatable process because of course entering the first line is really no different to entering the second line in your ERP. So that's what is called RPA, and this is, nowadays, the most common type of automation. Now, the thing that RPA lacks is really the interface to the customer. So RPA is mostly automation inside the enterprise, right? Whereas, it kind of lacks the interface to the customer and that interface that we have to customers is conversational, right? We are interacting with our customers mostly in a conversational manner, whether that is in a store or whether that is on an email or on the phone. And this is where conversational automation comes in. So conversational automation is really the next step in the automation revolution and the revolution of automating these repeatable tasks.
So we have similar capabilities as in RPA, so we can interface with existing systems, ERP systems, CRM systems, any kind of backend system that holds customer data or data that is relevant to customers. And we add the ability to understand human language and understand human language nowadays at a very, very high precision level. And because we're mixing the automation capability with the ability to understand human language, this type of automation is actually mostly used in service automation. So that can be customer service or employee service or citizen service, where those customers want quick answers wherever they are.
I hope that gives a bit of an overview of where we're coming from with automation and how we are progressing from workflow automation over RPA, to now into the field that is called conversational automation.
It really does, because you kinda laid out the process of how consumers experience some form of automation, how it's progressing now and getting different kinds of interfaces, and how things are getting interesting. So let me pick it up from there, from the end user's point of view. So now that you start having these interfaces that are interacting with a human in a human-like way, what are we really trying to accomplish that we weren't able to do before?
The goal of conversational automation is to really make it fast and frictionless for customers to interact with our enterprise. Because we live in a world now where customers, or really anyone, wants instant answers, right? Back when I was a kid and I wanted to know something about whatever, I had to wait for a day, then go to the library and look it up. If I wanted to know this something now, let's say, who is the president of Zimbabwe or something like this, I can just go on Google and I find it within 20 seconds, right? So we are used to getting answers immediately, right? Anything, hey, has the new season of X been released? Hey, what's the sequel to this and that book that I read? We want answers immediately. We live in a much faster time than we used to live before.
And I guess this expectation of being able to get answers and as such service instantly and in a frictionless fashion, is also now in regards to customer service, right? So if you want customer service, your expectation is that you get that instantly. Amazon is a good example. I don't know if you ever had to deal with Amazon customer service, which I personally consider very good. If I have a problem with anything I bought on Amazon, I can get someone or I can interact with someone pretty much instantly, at least during business hours, right? That is good service, you want service fast and frictionless. Gone are the days where, I don't know, your coffee machine broke and then you had to take it to the city and then try to talk with someone about this and wait in line and it takes half the day to solve this.
Now, the two aspects are fast and frictionless. The frictionless piece is that you can get service wherever you are and in any fashion that you want, right, and nowadays what we all have is applications like WhatsApp or Facebook Messenger and so on and so forth. So I really don't want to figure out the way that the enterprise I'm dealing with is delivering service. Do I have to send them an email? Do I have to call them, or maybe do they have a chat channel or something like this? No, I want this to be frictionless. You know what, I'm using Line messenger, why can't I just contact you on Line messenger?
So it needs to be fast and frictionless, and conversational automation does exactly that, right? It lives on these different channels like WhatsApp, Facebook Messenger, Line messenger, web chat, telephony, wherever, and it understands human language. So for example, if I need a copy of my last mobile bill, the easiest would be to just open up WhatsApp, type "I need a copy of my last bill", send that to my operator, and pretty much instantly I will be provided with that bill. I'm authenticated through the device already, they know who I am because of my phone number, and I'm getting provided a copy of my invoice. Quite honestly, I have no clue how to retrieve a copy of my last mobile bill at the moment. There probably is a service offering that does that, but I don't know how it works.
So again, fast and frictionless is what we're trying to achieve and conversational automation with its ability to understand what humans are saying or writing is what really brings that about.
In essence, in a way we are doing what we always have done, getting information and doing these things and asking something to do something, and all those things. But it sounds like what we’re using is changing, whether it's a chat app like WhatsApp or other apps. And that jibes with how human interactions have changed, not just automation. Although we live in a world where using things like chat apps is just everyday life, it's pretty darn recent, if you think about it. So I see these different technologies that are being developed and they are converging and coming together, but the cue is coming from the end user side: the way they do things, how they communicate, all of this has changed. Do you see this taking off strongly? Is there something about the readiness of both ends that is making this even more compelling?
Very much so. I mean, there are two aspects that really, I guess, influence why this is taking off at the moment. The first one is the one that you mentioned, it's the ubiquity of channels, the availability of channels to actually communicate with a bot, right. Most people don't know that, but chat bots already existed ten years ago. They weren't really good, but they existed. The only way to communicate with them was to go to some website and chat with them, because we didn't really have chat apps on our phones, right. Whereas nowadays, everyone is carrying probably a number of communication channels with them at all times, right. So for example, I use WhatsApp. I have that with me almost 24/7, right, and this is something that just wasn't the case 10 or 12 years ago.
So it is a lot easier to reach customers and also for customers to reach us if they need help. And that, again, that's the whole frictionless component, right? So this is the first aspect, the availability of channels. The second one is, of course, the quality of artificial intelligence. Artificial intelligence technologies have advanced significantly over the past years. Almost every year we're seeing a new breakthrough when it comes to things like natural language understanding. So the ability for machines to actually decipher human language and assign meaning to the words that users have written or said. And this new quality in artificial intelligence actually gives enterprise the confidence that when they deploy this type of technology, that it will increase customer satisfaction rather than be looked at as a negative by the customers.
So you have two things coming together, right? You have the availability of the channels, and you have artificial intelligence software now at a quality where they can perform at a human level, or even sometimes better than a human. Bringing these two things together, I think are the reasons why these types of automations are really taking off nowadays.
I understand Cognigy works with voice as well as text-based chats and things of that nature. So where are we at in terms of the whole availability versus the precision that is required? I noticed my Siri is getting a little smarter, but I don't know if it's there yet, and is text genuinely easier to decipher and get to a certain level? Where are we at here?
When you talk about natural language understanding and the technology that assigns meaning to human words, there is a number of technologies that have to all perform at a high quality level in order for this to work. Now, if you work with chat, the human writes something, and this text is then received by the AI engine and analyzed, and then a meaning is assigned to it. So, for example, if I say, "I need a copy of my last bill", then a meaning is assigned to it that exists in the context of that bot, right? For example, bill request or something like that. So this is relatively easy. Now, the difficulty is that when humans write text, they make typos or we all have fun with autocorrect sometimes, right? Sometimes the system makes a typo for us. And then maybe I don't write, "I need a copy of my last bill", but maybe I write, "I need a puppy of my last fill."
And then the system has to be smart enough to still understand, okay, this could probably still mean that they want a copy of their last bill. So that's where the artificial intelligence portion comes in when it comes to natural language understanding. So this is in text. Now, you deal with voice bots by putting another technology in the front end, which is called speech-to-text or ASR, automatic speech recognition. So this technology is again an artificial intelligence-based technology. It receives a voice stream and it translates that voice stream into text. Now, the good thing is that the text that comes out is grammatically and spelling wise, always correct, you cannot make typos in that text. The bad thing is that the engine could have misunderstood a word. So for example, I could say, "I need to update my account", and maybe what came out was "I need to update my dear sound", or something like this. So grammatically it's correct, it doesn't really make a lot of sense though.
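To make the typo example concrete, here is a minimal sketch of typo-tolerant intent matching. It is not how Cognigy or any particular NLU engine works: production systems use trained statistical models, while this sketch swaps in plain character similarity from Python's standard `difflib` module. The intent names, example utterances, and the 0.75 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical intents with example utterances (illustrative only).
INTENT_EXAMPLES = {
    "bill_request": ["I need a copy of my last bill", "Send me my latest invoice"],
    "address_change": ["I want to change my address", "Update my address"],
    "plan_upgrade": ["I need to upgrade my plan", "I want a bigger plan"],
}


def classify(utterance, threshold=0.75):
    """Return (intent, score) for the closest example, or (None, score) if too weak."""
    text = utterance.lower()
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            # ratio() is a character-level similarity between 0.0 and 1.0.
            score = SequenceMatcher(None, text, example.lower()).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        # Nothing is close enough: the bot should ask again or hand over.
        return None, best_score
    return best_intent, best_score


if __name__ == "__main__":
    print(classify("I need a puppy of my last fill"))
    print(classify("Who was Barack Obama's wife"))
```

The misspelled request still lands on the hypothetical `bill_request` intent because most of its characters line up with a known example, while an out-of-domain question falls below the threshold and returns no intent. The same matching step applies to text coming out of speech recognition, where the errors are wrong words rather than typos.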
So in order for a system to work well, you have to have all the different components that are in the system work well. But again, also with automatic speech recognition or speech-to-text, if you use your Siri now and you dictate something in your native tongue, I can almost guarantee that 99% of what is recognized is actually correct. Right? I'm using this all the time when I quickly want to send a longer message, then I just dictate it and the results are good. Now you could say, well, why isn't Siri quite there yet, as you put it. This is a philosophical question, what does quite there yet mean, right?
There are two types of artificial intelligence from a philosophical perspective. There are the so-called narrow AIs. Those are AIs that can do one thing very well. So let's say I'm the bot that works at the mobile phone operator and the things I can do well are handle invoice requests, handle address change requests and things like that. So like the top 10 requests that customers might have that make up 80% of my customer service volume. Now this bot will not be able to answer who was Barack Obama's wife or when is Donald Trump's birthday, because it doesn't have to, right. It is not there to answer these kinds of things, it is there to help you with your customer service issues that you might have with that mobile phone operator.
So that's a narrow AI, and then you have a more general AI, and I guess Siri and Alexa and Google Assistant and so on, they are attempts at creating that, which is an AI that you can ask anything, that you can talk to about anything, right? And for most of the conversational automations we're seeing in customer service, these are more narrow AIs that can do only what they're built to do in that very closed domain. They cannot have a chat where you tell them something about what happened to you last Sunday, and the next week they'll remember that, because that's not what they're made for, that's not what the type of automation is made for.
Now I understand this, the whole general assistants versus ones that are very task-oriented. And I think there is a whole expectation right now because, as an end user, when it comes to voice, I have experienced general assistants. So that really crystallizes the experience I'm having, at least.
We talked about customer service cases a lot. And I think it's true, inside of enterprises, these technologies are first deployed in many cases in customer service first. Other than the fact, as a business person we talk about reducing costs a lot at a customer service center, but is there any other reasons why it tends to start there?
Well, I think, I mean, you already mentioned the main reason, right, it's ROI. And we now have systems at the quality level where the ROI is actually positive and it is much cheaper to have a conversation with a bot than it is with a human agent. So if you look at the cost of a bot conversation, if it's a text conversation, it probably sits somewhere around 10 to 15 cents per conversation. If it's a voice conversation it probably sits more around 30 cents. Now, if you take the average cost per call in a call center, this can be anything between $5 and $25 or $30 fully loaded. People don't realize what these call centers actually cost, right? Imagine if you have a cost per call of $5 and you have 1 million calls a month, then we're talking about $5 million in costs just for that.
Now, if you take the 30 cents per call for a voice bot call, that is $300,000 at 1 million calls. So you pay $300,000 instead of $5 million, if you could solve all issues. And the reason why customer service use cases are the first use cases to be looked at is because of the sheer volume of interactions that you have in that scenario. There are several scenarios we're seeing, and of course the one leading by a far stretch is customer service. But you also have employee service, like human resources and so on. But again, let's say you are a large company, you might have a million customers, but you only have 10,000 employees. So the volume just can't be that high. The cost of serving these employees through human agents just isn't as high as serving your customers.
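The cost comparison above can be written out as a short calculation. This is a minimal sketch using only the figures mentioned in the conversation ($5 per agent call, 30 cents per voice bot call, 1 million calls a month). The function names and the 80% automation share in the last line are assumptions added for illustration, since in practice not every call can be handled by the bot. Amounts are kept in whole cents to avoid floating-point rounding.

```python
def monthly_cost_cents(calls, cost_per_call_cents):
    """Total monthly cost in cents if every call has the same cost."""
    return calls * cost_per_call_cents


def blended_cost_cents(calls, automated_share_pct, bot_cents, agent_cents):
    """Monthly cost in cents when only part of the calls is handled by the bot."""
    automated = calls * automated_share_pct // 100
    return automated * bot_cents + (calls - automated) * agent_cents


def dollars(cents):
    """Format a cent amount as a dollar string, for example $5,000,000."""
    return f"${cents // 100:,}"


if __name__ == "__main__":
    calls = 1_000_000
    print("Agents only:   ", dollars(monthly_cost_cents(calls, 500)))
    print("Voice bot only:", dollars(monthly_cost_cents(calls, 30)))
    # Hypothetical: the bot resolves 80% of calls, agents take the rest.
    print("80% automated: ", dollars(blended_cost_cents(calls, 80, 30, 500)))
```

The first two lines reproduce the $5 million and $300,000 figures from the interview. The blended figure shows why the qualifier "if you could solve all issues" matters: the calls that still reach a human agent dominate the remaining cost.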
And the second thing is that in customer service, usually you have high repeatability in processes, right? So if you take this mobile phone operator, you can probably identify 10 issues that people call about, and those 10 issues make up 85% of all calls that are coming in. Right? So it's always very similar. I need to upgrade my plan. I don't have reception, and this and that. So it's very similar questions that are coming in, so they are easy to automate. But in general, when customers ask us and say, "Okay, we are really interested in this type of technology, how should we identify what process we should automate with conversational automation?" then we say, "Okay, look at which processes happen the most and are well suited for automation", because there are also processes that are not well suited for automation, and what's the current process cost. Then you just multiply all of that and you come up with a model, and customer service is usually the clear winner in that.
Now I want to give an example of a process that can be automated very easily and a process that cannot be automated very easily, because if you just go like, okay, what are my most expensive processes, it doesn't mean that you can necessarily automate them. So we are working with a large frozen pizza producer, actually in Germany, or baking goods producer, and they have a process where people ask, "Which supermarket can I buy this and that pizza in?" This is easy to automate because you just need to figure out what product it is and which zip code they are in, and then you can give them the supermarkets they can buy the product in, if you have that data from some ERP system. So that's easy to automate.
But usually these conversations also only last one minute on the phone with an agent, so you don't save that much. Now, there are other conversations that last for 20 to 30 minutes, and those are conversations that they're having with old ladies that are calling to talk about recipes for baking. Now, of course, you cannot build a bot to automate those, right? If you automated those with a bot it would not increase customer satisfaction, because they're probably calling to talk about recipes, but they're also calling to have a chat, right. And if that is part of your brand image and how you position your brand, then you of course need to maintain that and not try to automate that away, right?
So again, what are your top queries? How well suited are they for automation? What is the current process cost for them? And what's the volume in general that they are happening and when you calculate it in that way, then most likely you will find use cases in customer service that are most suited for conversational automation.
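The "multiply all of that" model can be sketched as a simple scoring function: volume times suitability times current process cost. Everything below is an assumption for illustration: the class and function names, the 0-to-1 suitability scale, and all of the numbers, which are invented and not taken from any real customer.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    monthly_volume: int          # how often the process happens
    suitability: float           # 0.0 (do not automate) to 1.0 (easy to automate)
    cost_per_interaction: float  # current cost with a human agent, in dollars


def automation_score(process):
    """Rough monthly savings potential: volume x suitability x process cost."""
    return process.monthly_volume * process.suitability * process.cost_per_interaction


def rank(processes):
    """Sort candidate processes from most to least attractive for automation."""
    return sorted(processes, key=automation_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical figures loosely modeled on the examples in the conversation.
    candidates = [
        Process("store locator", 10_000, 0.9, 1.0),  # short, cheap calls
        Process("recipe chat", 2_000, 0.0, 25.0),    # long calls, part of the brand
        Process("bill copy", 40_000, 0.9, 5.0),      # high volume, repeatable
    ]
    for p in rank(candidates):
        print(f"{p.name:15s} {automation_score(p):>12,.0f}")
```

With these made-up inputs the high-volume, repeatable request ranks first, the cheap one-minute call ranks second, and the long recipe conversation scores zero because its suitability is zero, no matter how expensive it is per call.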
That is actually very clear. I mean, to an extent where you can actually run some cost saving per use case, actually comparing and getting to the analysis part where what is really worth bothering with. And I was also curious about whether it's availability of data, perhaps, because customer service happens to have, you know, we all have a CRM or something to make these interactions, smarter, whereas if you're dealing with straightforward or mixed marketing data, it may be more difficult, I wasn't sure. But I think that internally from an enterprise’s point of view, just comparing use cases in terms of the cost related to each makes a lot of sense.
And actually you mentioned some of the intangible things too, whether that is brand wise. Whether it's important for the brand to keep it human-based interaction or not. So the decision making is never easy, but it really helps me to think that, oh, it's not just a one-dimensional ROI calculation, it is about taking different use cases and comparing and choosing where it makes sense. And then, okay, does this make sense in a bigger picture? Is this something we should automate or not. So, that really helps a lot.
And there is a second aspect to that, which we haven't touched on yet. Me, for example, as a customer, if I'm going back to this example that I want a copy of my last mobile bill, I literally don't care if I'm interacting with a human or a bot. Whoever can get me that information quicker, please help me, right, I really don't care. Now, there are customers who want to deal with a human only, like I described earlier, right, for various reasons. Some people just don't want to deal with a machine, they want to deal with a human. So in a good conversational automation solution, we always have the ability to fall back to a human operator. We don't force the human to communicate with that automation. They can, if they feel comfortable with it and just care about quickly getting an answer. But if they want to talk to a human, that ability must exist.
I think that is a very important aspect that is sometimes overlooked. Enterprises have come to us and said, "We're just going to automate all of these human agents away." And we say, "Well, that's not how it works", right? You're augmenting your agent force with a conversational automation that can maybe take the pressure off so that the agents can then focus on the more difficult cases that need human interaction, human intervention, right? And this is why it's super important to always have a live chat, or if it's on the phone, a phone-based operator solution, going hand-in-hand with the automation that usually sits at the front end, capturing those first interactions with a customer, then deciding, can I handle that myself or do we have to hand this over to a human operator?
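The front-end decision described here, handle it myself or hand over to a human, can be sketched as a small routing function. This is an illustrative assumption, not a description of any product: the keyword list, the set of supported intents, and the 0.75 confidence threshold are all hypothetical, and a real system would take the intent and confidence from its NLU engine.

```python
# Hypothetical phrases that signal the customer wants a person.
HUMAN_KEYWORDS = ("human", "agent", "operator", "real person")

# Hypothetical set of requests the bot has been built to handle.
SUPPORTED_INTENTS = {"bill_request", "address_change", "plan_upgrade"}


def route(utterance, intent, confidence, threshold=0.75):
    """Decide who handles the request: returns 'bot' or 'human'."""
    text = utterance.lower()
    # 1. Never force the customer to stay with the automation.
    if any(keyword in text for keyword in HUMAN_KEYWORDS):
        return "human"
    # 2. Hand over anything outside the bot's narrow domain.
    if intent not in SUPPORTED_INTENTS:
        return "human"
    # 3. Hand over when the understanding is too uncertain.
    if confidence < threshold:
        return "human"
    return "bot"


if __name__ == "__main__":
    print(route("I need a copy of my last bill", "bill_request", 0.93))
    print(route("Let me talk to a real person please", "bill_request", 0.93))
    print(route("My router makes a strange noise", None, 0.20))
```

The order of the checks reflects the point made above: an explicit wish to talk to a person wins even when the bot is confident it could handle the request.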
Just a quick follow-up question, because I was thinking about new interactive buttons on chat apps and those different bells and whistles and feature sets that consumers have somehow gotten used to from their web-based experiences, and adding those along with the natural language processing to reduce the friction that you talked about. Sometimes I wonder, when I try to type something, talk with a chat bot, I sometimes say, "Can you just give me specific options that you understand? Then I'll pick one." Sometimes I get into that mode because sometimes I need help, some kind of help to simplify my choice, because I'm obviously not talking to a human being. So how do you think about augmenting conversational interfaces with those different tools that we created for intuitive UI?
I think it goes hand in hand with what we discussed earlier about general artificial intelligence. There's somehow this romantic sci-fi kind of view that bots have to be like us. They're like artificial beings that I can talk to and they must understand everything I'm saying and so on. But let's be realistic here, we're talking about software, we're talking about statistical algorithms that are working their magic under the hood, which is what we call artificial intelligence these days. Now these bots do not and will not understand everything a human says at this point which can, of course, lead to frustration on the side of the user if we are forcing the user to express everything by a natural language.
Now what I'm telling the customers I'm speaking to is, if you can guide the user, why not? Because again, for the user, it's about the experience being fast and frictionless. Quite honestly, if I get a menu at the start which says, "Hey, how can I help you?" and one of the buttons says "mobile bill request", well, I'm just going to click that. I'm not going to type in "I need a copy of my mobile bill", or maybe I'll type it in a strange way and the bot doesn't get it and asks me again. Well, if I can use a button, I'm going to use it.
So on the channels that are giving us those abilities, like web chat, Facebook messenger, I really think that we should use those types of little widgets that can help guide the user experience. Now, when it comes to voice interactions, it of course becomes a lot more difficult, right? The whole thing that we call conversational design changes when you're moving to voice interactions. I'll give an example. So say we're talking to a bot for a clothing retailer, and I'll say something like, "Hey, show me your t-shirt specials". Now on a WhatsApp or Facebook messenger, I might get pictures of the newest t-shirts and maybe on messenger, I can then click, I want to buy this and that, okay.
Now, if I have a voice interaction with that same bot, and I'm calling on the phone line, and I'm saying, "What are the t-shirt specials", of course, it can't show me anything. Right. I somehow need to present that in a different way, or maybe I segue onto a conversational text channel where I say, "Hey, I'll send you an MMS" or whatever you can do, or, "I can send you an email with the specials", right. So the way that you interact with customers on a voice channel and a text channel can be very different. But if the text channel gives you these kinds of little aids that can help users be guided around the experience, then I think those should most definitely be used.
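The difference between channels can be sketched as one piece of bot logic that renders the same menu differently depending on where the conversation happens. This is a minimal illustration under assumptions: the channel names, the output dictionaries, and the idea that a channel either supports buttons or not are all simplifications, and real channel APIs each have their own message formats.

```python
# Hypothetical set of channels that can display clickable buttons.
RICH_CHANNELS = {"webchat", "facebook_messenger", "whatsapp"}


def spoken_list(options):
    """Join options into a phrase that reads naturally when spoken aloud."""
    if len(options) == 1:
        return options[0]
    return ", ".join(options[:-1]) + ", or " + options[-1]


def render_menu(channel, prompt, options):
    """Render the same menu as buttons, as speech, or as plain numbered text."""
    if not options:
        raise ValueError("a menu needs at least one option")
    if channel in RICH_CHANNELS:
        # Guided experience: the user can simply click an option.
        return {"type": "buttons", "text": prompt, "buttons": list(options)}
    if channel == "voice":
        # Nothing can be shown on a phone line, so the options are read out.
        return {"type": "speech", "text": f"{prompt} You can say: {spoken_list(options)}."}
    # Fallback for text channels without widgets: a numbered list.
    lines = [f"{i}. {option}" for i, option in enumerate(options, start=1)]
    return {"type": "text", "text": prompt + "\n" + "\n".join(lines)}


if __name__ == "__main__":
    menu = ["Mobile bill request", "Change address"]
    print(render_menu("webchat", "How can I help you?", menu))
    print(render_menu("voice", "How can I help you?", menu))
    print(render_menu("sms", "How can I help you?", menu))
```

Keeping the rendering separate from the conversation logic is one way to reuse the same bot across text and voice channels while still using buttons wherever the channel offers them.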
Yeah. I think more and more interesting things will happen now because, you know, even WhatsApp is offering those interactive buttons now, and all of this coming together to reduce friction, I think that’s definitely the direction to go.
Part 2 of the interview with Philipp Heltewig will be released in two weeks, following this release of Part 1.