Command and Response Symbiosis in AI Dialogue Systems

ChatGPT is set to become a more interactive generative AI experience. OpenAI revealed that the world’s leading AI chatbot will be able to listen to user queries and respond using a synthesized, presumably AI-generated, voice.

Along with its newfound voice, ChatGPT will also be able to respond to and discuss specific images uploaded to it or snapped while using the ChatGPT Android or iOS app. The image recognition feature sounds similar to Google Lens and other apps that use neural networks to identify objects, text, and other information in images.

OpenAI Gives ChatGPT a Voice

On September 25, 2023, ChatGPT developer OpenAI revealed it would give its generative AI chatbot a voice. ChatGPT users can speak directly to the chatbot and have it speak back, allowing them to hold spoken conversations with ChatGPT for the first time.

OpenAI’s example clip features a woman asking ChatGPT to create a unique bedtime story, to which ChatGPT duly responds with a female synthesized voice.

According to Wired, the new text-to-speech model was developed in-house. It can generate “human-like” audio from text and a few seconds of sample speech, and it can speak in various tones and styles. On the listening side, OpenAI’s Whisper speech recognition model transcribes what you say into text. You can find a range of voice samples on OpenAI’s blog.

Some companies are already putting OpenAI’s new voice model to use. For example, Spotify is using OpenAI’s text-to-speech model to translate podcasts into different languages, combining ChatGPT’s language translation prowess with its new speaking ability.

ChatGPT’s new voice feature is only available to Plus and Enterprise subscribers using the official Android and iOS apps, and it is expected to roll out over the two weeks following September 25, 2023. The voice feature is also limited to English to begin with, though we would expect this to change rapidly.

ChatGPT Can Recognize and Analyze Images and Photographs

The second part of OpenAI’s ChatGPT update is the ability to analyze and talk about images uploaded to the tool. Image analysis was featured in the GPT-4 update videos but hasn’t been discussed much since then (ChatGPT Code Interpreter aside).

Now, ChatGPT gains functionality similar to Google Lens. You can upload an image to ChatGPT or take a photograph with your smartphone camera in the ChatGPT app, and it will describe the image, adding more context where required.

Calling it “similar to Google Lens” does it an injustice, really. The ability to chat back and forth about the image to gain more information and context makes it extremely useful for a broad range of settings. However, it’s important to note the fine print, with OpenAI making it clear that it has limited ChatGPT’s “ability to analyze and make direct statements about people” for privacy and accuracy reasons. Still, could an OpenAI-powered “Who Is This” tool be in the works for the future? (Let’s hope not!)

As with voice chat, OpenAI will roll out image recognition over the next two weeks, though it will be available on all platforms, not just the ChatGPT mobile apps.

Privacy, Security, and Other Issues

The implications of a voice-powered ChatGPT are stark. Sure, it’s exciting. However, the ability to create a uniquely synthesized voice from just a short sample of speech raises considerable privacy and security issues. The potential for malicious actors to exploit these tools is enormous, and as with any generative AI tool, once the genie is out of the bottle, it absolutely will not go back in. No amount of AI regulation from governments or thought leaders can turn back the tide.

Even OpenAI’s warning on the topic seems to skirt around the obvious despite mentioning the issues:

However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud. This is why we are using this technology to power a specific use case—voice chat.

Given that this is only the tip of the iceberg, expect pushback against ChatGPT’s newfound voice, especially once the predictable wave of unsavory headlines about ChatGPT being used to commit fraud arrives.

OpenAI Is Making ChatGPT the Go-To AI App

The more OpenAI adds user-friendly features to ChatGPT, the more it becomes the go-to generative AI app. As the first to reach widespread fame during the initial generative AI boom, ChatGPT still leads the way and is the only app some use, despite competition from the likes of Google Bard (and potentially Google Gemini) and Anthropic’s Claude.

So long as OpenAI can continue to add features that make ChatGPT easier to use, it’ll keep people hooked and push ever closer to its goal of a truly multi-modal AI tool.