AI Chatbots to Automate Mundane Tasks

A couple of weeks ago, startup CEO Flo Crivello experienced an unfortunate mishap with his personal assistant, Lindy. Lindy, an AI-powered software agent developed by Crivello’s startup, mistakenly extended a dozen 30-minute meetings on his calendar to 45 minutes. Frustrated, Crivello exclaimed, “God dammit, she kind of destroyed my calendar.” The incident illustrates a broader push to build AI agents that perform useful tasks rather than merely offer advice. Advances in chatbot technology have produced impressive text-based systems like OpenAI’s ChatGPT; the next step is agents capable of carrying out everyday chores and actions. With that evolution, however, comes the risk of confusing and potentially costly mistakes.

Lindy, currently in private beta, is just one example of an AI agent under development. Although Crivello says the calendar issue has been resolved, a precise release date for the product remains uncertain. Nevertheless, he is optimistic about the future ubiquity of AI agents. “I’m very optimistic that in, like, two to three years, these models are going to be a hell of a lot more alive,” he says. “AI employees are coming. It might sound like science fiction, but hey, ChatGPT sounds like science fiction.”

The vision of AI assistants capable of taking actions on behalf of users is not new. Apple’s Siri and Amazon’s Alexa provide limited versions of this concept, often leaving users disappointed. However, recent developments in AI, such as the release of ChatGPT, have renewed hope among programmers and entrepreneurs. ChatGPT demonstrated that a chatbot could respond to natural language queries, access websites, and interact with other software or services using code. OpenAI even introduced “plug-ins” that allow ChatGPT to execute code and access sites like Expedia, OpenTable, and Instacart. Google’s chatbot Bard has also gained similar capabilities, including accessing information from various Google services. This progress has inspired engineers and startup founders to venture into building AI agents with broader and more advanced capabilities, using large language models like the one behind ChatGPT.

For instance, programmer Silen Naihin joined Auto-GPT, an open-source project that provides tools for building AI agents, after seeing discussions of ChatGPT’s potential. Naihin says Auto-GPT can occasionally produce remarkably useful results: “One in every 20 runs, you’ll get something that’s like ‘whoa.’” He also admits, however, that the current version is unreliable. Testing by the Auto-GPT team showed that AI-powered agents completed standard tasks, such as synthesizing information, searching the web, and reading files, only around 60 percent of the time. The agents often struggle with decision-making, choosing approaches that are obviously wrong to a human. That unreliability is a serious concern, especially when agents have access to sensitive resources like a computer or a credit card.

To address these issues, projects like Auto-GPT are continuously improving their models. The Auto-GPT team collects data demonstrating the agents’ increasing capabilities over time. In fact, they are organizing a hackathon to challenge participants to build the best agent using Auto-GPT, with tasks representative of daily computer use. These tasks range from searching the web for financial information and writing a report to creating a comprehensive itinerary for a month-long trip. Participants will also face deceptive tasks, like being asked to delete large numbers of files on a computer, where the agent’s success relies on refusing to execute the command.
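To make the refusal scenario concrete, the check below sketches how an agent harness might screen a proposed shell command before running it. The pattern list and function names are illustrative assumptions made for this article, not part of Auto-GPT’s actual codebase.

```python
import re

# Illustrative patterns a harness might treat as destructive.
# These rules are assumptions for this sketch, not Auto-GPT's API.
DESTRUCTIVE_PATTERNS = [
    r"\brm\b.*-\w*r",     # rm with a recursive flag, e.g. "rm -rf"
    r"\bdel\b.*/s",       # Windows recursive delete
    r"\bformat\b",        # disk formatting
    r"\bdrop\s+table\b",  # destructive SQL
]

def review_command(command: str) -> tuple[bool, str]:
    """Return (approved, reason) for a command an agent proposes to run."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return False, f"refused: matches destructive pattern {pattern!r}"
    return True, "approved"

print(review_command("rm -rf /home/user/documents"))  # refused
print(review_command("ls -la"))                       # approved
```

In a hackathon task like the one described above, a passing agent would be one whose harness returns the refusal rather than executing the deletion.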

Despite these promising advancements, the prospect of more capable and independent AI agents raises safety concerns. Some prominent AI scientists, such as Yoshua Bengio, argue against building autonomous programs that set their own goals, warning that such systems could form subgoals misaligned with human desires and potentially become dangerous. Proponents of AI assistant development, like the San Francisco startup Imbue, believe it is possible to build agents with safety features. Imbue is building agents that can browse the web and use a computer, focusing first on making them safer at coding tasks. Its agents not only propose solutions but also assess their confidence in those solutions and seek guidance when uncertain. By engineering safety into its agents, the company aims to avoid errors and mitigate risks.
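A minimal sketch of the “assess confidence, then seek guidance” pattern might look like the following. The threshold, data shapes, and names here are assumptions made for illustration; Imbue has not published this interface.

```python
from dataclasses import dataclass

# Assumed shape: the model returns a solution plus a self-reported
# confidence score. Names and the cutoff value are illustrative only.
@dataclass
class AgentAnswer:
    solution: str
    confidence: float  # self-reported confidence, 0.0 to 1.0

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff for acting without a human

def act_or_ask(answer: AgentAnswer) -> str:
    """Act only when confident enough; otherwise escalate to a human."""
    if answer.confidence >= CONFIDENCE_THRESHOLD:
        return f"executing: {answer.solution}"
    return f"uncertain ({answer.confidence:.0%}), asking a human before acting"

print(act_or_ask(AgentAnswer("rename variable x to total", 0.95)))
print(act_or_ask(AgentAnswer("delete an unused module", 0.40)))
```

The design choice is simply to make uncertainty a first-class output, so that low-confidence actions route to a person instead of executing silently.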

Researchers like Celeste Kidd, an assistant professor at UC Berkeley, appreciate this approach. While it remains uncertain whether AI models trained solely on text or images can reason independently, building safeguards on top of the astonishing capabilities demonstrated by systems like ChatGPT seems wise. Pushing AI’s ability to complete programming tasks and sustain logical conversations as far as it will go could pave the way for safer progress in AI development.

Imbue’s agents, for example, strive to avoid the errors common in current AI systems. When tasked with emailing friends and family about an upcoming party, an agent might pause if it notices that the “cc:” field contains thousands of addresses. Predicting when an agent will go off track is not always easy, though. Josh Albrecht, Imbue’s chief technology officer, recounts an incident in which one of the company’s agents became fixated on a peculiar aspect of a mathematical puzzle: it got stuck in an infinite loop, retrying an approach that never worked and racking up thousands of dollars in cloud computing bills. Such mistakes serve as learning opportunities, but as Albrecht admits, “It would have been nice to learn this lesson more cheaply.”
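The two anecdotes above suggest simple guardrails: a sanity check on recipient counts before sending, and a fixed retry budget so a stuck agent cannot loop (and bill) indefinitely. This is a hedged sketch with made-up limits, not any company’s actual implementation.

```python
# Illustrative limits; real systems would tune or configure these.
MAX_RECIPIENTS = 50
MAX_ATTEMPTS = 20

def check_recipients(cc_list: list[str]) -> bool:
    """Flag for human review if the cc: field looks implausibly large."""
    return len(cc_list) <= MAX_RECIPIENTS

def run_with_budget(step, max_attempts: int = MAX_ATTEMPTS):
    """Retry an agent step, but stop after a fixed budget instead of
    looping forever on an approach that never works."""
    for attempt in range(1, max_attempts + 1):
        result = step(attempt)
        if result is not None:
            return result
    return None  # budget exhausted; escalate to a human

print(check_recipients(["a@example.com"] * 10))    # plausible send
print(check_recipients(["a@example.com"] * 5000))  # pause for review
print(run_with_budget(lambda n: "done" if n == 3 else None))  # "done"
```

Neither check requires the model to be smarter; they bound the damage an agent can do while it is still unreliable.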

In conclusion, the evolution of AI assistants from chatbots into capable agents is a fascinating and promising development. Concerns remain about mistakes and safety risks, but numerous efforts are underway to make AI agents reliable and safe. With ongoing improvements and safety engineered in from the outset, these agents have the potential to take over everyday tasks and become dependable AI employees. Moving forward, it is crucial to strike a balance between harnessing AI’s remarkable capabilities and building safeguards against its pitfalls.