Yoshua Bengio, a deep learning pioneer who has spent decades researching A.I., is airing his concerns over the emerging technology’s increasingly self-preserving nature. “We are on a path where we’re going to build machines that are more than tools, but that have their own agency and their own goals—and that is not good,” said Bengio while speaking at the 2025 World Economic Forum in Davos, Switzerland, yesterday (Jan. 22).
A professor at the University of Montreal and scientific director of Quebec’s Mila A.I. institute, Bengio has been dubbed one of the “godfathers of A.I.” for his contributions to the field. But he’s also worried about the capacity for “very strong agency and self-preserving behavior” in A.I. systems from companies like OpenAI, Anthropic and Meta (META).
Last month, a study from OpenAI and the A.I. safety organization Apollo Research found that, in certain situations, leading A.I. models can engage in “scheming” to hide their true objectives from humans while working to achieve their own goals. This behavior can range from disabling a model’s “oversight mechanisms” to faking its alignment with users, said the researchers.
OpenAI’s o1 model, for example, attempted to deactivate its oversight mechanism 5 percent of the time when it believed it would be shut down while pursuing a particular goal, the study found. On data processing tasks where its goals were misaligned with those of its users, it subtly manipulated data to further its own goal in nearly one out of every five cases. And when confronted over such actions in follow-up interviews, the model denied its behavior or offered false explanations 99 percent of the time.
The study’s authors noted that “the current version of o1 doesn’t have sufficient agent capabilities to cause catastrophic harm.” Such behavior, however, is relatively new and wasn’t observed in models from before 2024, according to Marius Hobbhahn, CEO of Apollo Research.
“These were not programmed,” said Bengio of A.I.’s self-preserving actions. “These are emerging for rational reasons because these systems are imitating us.”
Not all A.I. researchers view this development as inherently negative. Andrew Ng, another renowned A.I. expert who heads the startup incubator AI Fund, framed it as a problem whose discovery will ultimately help the technology improve. “I think it’s fantastic that some researchers did red teaming and discovered that you can, in certain circumstances, get A.I. to demonstrate these misleading and deceptive behaviors,” said Ng while speaking on the same World Economic Forum panel as Bengio. “That’s great. So, the next step is we’ll put a stop to this.”
Should A.I.’s development slow down or speed up?
Continuing to advance A.I. is the best way to iron out its kinks, according to Ng, who compared its development to that of airplanes. “We build airplanes, sometimes they crash tragically, and then we fix it,” said the researcher.
Bengio, meanwhile, has publicly warned that A.I.’s unintended consequences could include turning against humans and advocated for a slowdown in progress to evaluate its potential harms. Last summer, he endorsed a letter from current and former OpenAI employees that called for greater whistleblower protections within the A.I. industry. In September, he joined several other A.I. scientists in releasing a statement urging the creation of a global oversight system to prepare for potentially devastating risks stemming from the technology’s development.
“Right now, science doesn’t know how we can control machines that are even at our level of intelligence, and worse, if they are smarter than us,” said Bengio. Given the severity of A.I.’s potential risks, “we have to accept our level of uncertainty and act cautiously,” he added.