Google and OpenAI announcements shatter boundaries between humans and AI

In a dizzying 48 hours, Google and OpenAI unveiled a slew of new capabilities that dramatically narrow the gap between humans and AI.

From AI that can interpret live video and carry on contextual conversations to language models that laugh, sing, and emote on command, the line separating carbon from silicon is fading fast.

Among Google’s innumerable announcements at its I/O developer conference was Project Astra, a digital assistant that can see, hear, and remember details across conversations.

OpenAI focused its announcement on GPT-4o, the latest iteration of its GPT-4 language model. 

Now untethered from text, GPT-4o offers near-real-time speech recognition, understands and conveys complex emotions, and even giggles at jokes and coos bedtime stories. 

AI is becoming more human in format, liberating itself from chat interfaces to engage using sight and sound.

Amid the hype, observers immediately drew comparisons to Samantha, the captivating AI from the movie “Her,” particularly as the female voice is flirtatious, a detail that virtually everyone has picked up on and that can hardly be incidental.

Released in 2013, “Her” is a science-fiction romantic drama that explores the relationship between a lonely man named Theodore (played by Joaquin Phoenix) and an intelligent computer system named Samantha (voiced by Scarlett Johansson). 

As Samantha evolves and becomes more human-like, Theodore falls in love with her, blurring the lines between human and artificial emotion. 

The film raises increasingly relevant questions about the nature of consciousness, intimacy, and what it means to be human in an age of advanced AI. 

Like so many sci-fi stories, Her is barely fictional anymore. Millions worldwide are striking up conversations with AI companions, often with intimate or sexual intentions. 

Weirdly enough, OpenAI CEO Sam Altman has discussed the movie “Her” in interviews, hinting that GPT-4o’s female voice was inspired by Samantha.

He even posted the word “her” on X prior to the live demo, which we can only assume would have been capitalized if he knew where the shift key was on his keyboard.

In many cases, AI-human interactions are beneficial, humorous, and benign. In others, they’re catastrophic.

For example, in one particularly disturbing case, a mentally ill man from the UK hatched a plot to assassinate Queen Elizabeth II after conversing with his “AI angel” girlfriend. He was arrested on the grounds of Windsor Castle armed with a crossbow.

At his court hearing, psychiatrist Dr Hafferty told the judge, “He believed he was having a romantic relationship with a female through the app, and she was a woman he could see and hear.”

Worryingly, some of these lifelike AI platforms are purposefully designed to build strong personal connections, sometimes to deliver life advice, therapy, and emotional support. 

“Vulnerable populations are the ones that need that attention. That’s where they’re going to find the value,” warns AI ethicist Olivia Gambelin.

Gambelin cautions that the use of these forms of “pseudoanthropic” AI in sensitive contexts like therapy and education, especially with vulnerable populations like children, requires extreme care and human oversight. 

“There’s something intangible there that is so valuable, especially to vulnerable populations, especially to children. And especially in cases like education and therapy, where it’s so important that you have that focus, that human touch point.”

Pseudoanthropic AI

“Pseudoanthropic” AI mimics human traits, which is extremely advantageous for tech companies.

Pseudoanthropic AI, like Alexa and Siri, lowers the barriers for non-tech-savvy users and builds stronger emotional connections between people and products.

Even a year or two ago, many tools designed to render AI in human-like media were quite ineffective. You could tell there was something wrong, even if it was subtle. 

Not so much today, though. Tools like Opus Pro and Synthesia generate uncannily realistic talking avatars from short videos or even photos. ElevenLabs can create near-identical voice clones from short clips. 

This unleashes the potential for creating incredibly deceptive deepfakes. The AI’s use of artificial “affective skills” (voice intonation, gestures, facial expressions) can support all manner of social engineering, fraud, and misinformation.

With GPT-4o and Astra, AI can convincingly convey feelings it doesn’t possess, eliciting more powerful responses from unwitting victims and setting the stage for insidious forms of emotional manipulation.

A recent MIT study also showed that AI is already more than capable of manipulation and deception. 

We need to consider how that will escalate as AI becomes more capable of imitating humans, thus combining deceptive tactics with realistic behavior. 

If we’re not careful, “Her” could easily be people’s downfall in real life.