ChatGPT now supports voice chats and image queries.

The Rise of ChatGPT: Voice Commands and Image-based Queries

OpenAI’s popular chatbot, ChatGPT, is getting two major updates: users will be able to hold voice conversations with it and ask it questions about images. The features are rolling out now, with Plus and Enterprise subscribers getting access first.

To use voice conversations, users must opt in by opening Settings in the ChatGPT app and selecting New Features. Once the feature is enabled, they can tap the microphone button and choose from five distinct voices.

Voice conversations are powered by a new text-to-speech model from OpenAI that can generate “human-like audio from just text and a few seconds of sample speech.” OpenAI worked with professional voice actors to create the voices. In the other direction, the user’s speech is transcribed into text by Whisper, OpenAI’s speech recognition system.

The image features are just as notable. In OpenAI’s examples, users upload a photo of a malfunctioning grill and ask for troubleshooting advice, or photograph the contents of their fridge and get meal-planning suggestions. Users can also snap a picture of a math problem and ask ChatGPT to solve it. Microsoft recently showed off similar math problem-solving abilities for its Copilot AI at its Surface event.

According to OpenAI, the image recognition features are powered by GPT-3.5 and GPT-4. Users reach them by tapping the photo button; in the iOS and Android apps, they first tap the plus button and then take a photo or choose an existing one from their device. Users can discuss multiple photos, and a drawing tool lets them highlight specific areas of an image.

In a blog post announcing the updates, OpenAI acknowledges the potential for misuse: bad actors could use the voice technology to impersonate public figures or ordinary people and commit fraud. To reduce that risk, OpenAI is restricting the technology to voice chat in ChatGPT and to specific use cases with selected partners. The company says it has paid particular attention to privacy in the image features, and it has published a paper on their safety properties; OpenAI refers to these features as GPT-4 with vision.

To refine the image features, OpenAI worked with Be My Eyes, a free app that connects blind and low-vision people with sighted volunteers over video calls. The collaboration helped ensure that users can have useful conversations about images even when people appear in the background. To protect privacy, OpenAI has also placed strict limits on ChatGPT’s ability to analyze and make direct statements about people in images.

ChatGPT reads English text in images well, but its performance in other languages is limited. OpenAI says the model “performs poorly” on languages that use non-Roman scripts and advises against using ChatGPT to read non-English text in images for now.

Spotify is also working with OpenAI, using its voice technology in a pilot tool for podcasters called Voice Translation. The tool translates podcasts into other languages in the original speakers’ own voices, preserving their distinctive speech characteristics in every language. For now, select English-language shows are being translated into Spanish, with French and German versions planned.

With voice and image input, ChatGPT is becoming a more natural assistant to talk to. From troubleshooting a grill to planning a meal, and with partners like Spotify applying the same voice technology to podcast translation, these updates offer a glimpse of where AI-driven conversational interfaces are headed.

Click here to watch a demo of ChatGPT in action!