Photo: Mark Zuckerberg. Meta (META) wants more neutrality in its A.I. outputs. Cooper Neill/Zuffa LLC
As Meta continues to embrace a commitment to free expression, neutrality and fact-based responses are a priority for its A.I. models, according to Ella Irwin, Meta’s head of generative A.I. safety. “It’s not a free-for-all, but we do want to move more in the direction of enabling freedom of expression,” said Irwin while speaking at SXSW yesterday (March 10). “That’s one of the reasons why you’re seeing many companies realize and start to kind of roll back some of the guardrails that were a little too much.”
Such guardrails, which typically filter or remove content deemed toxic, biased or inaccurate, are used to ensure A.I. systems behave safely and ethically. But Irwin, who formerly led X’s trust and safety team and served as senior vice president of integrity at Stability AI, claimed that many tech players are reconsidering their effectiveness.
“If you think about the last however many years, there have been more and more guardrails that have been put in place at many organizations—almost an overcorrection,” said Irwin. “What you’re starting to see is companies really evaluate how much of an impact those guardrails have had on the helpfulness and reliability of the products that they provide,” she added.
In that vein, Meta is working on making the responses of its A.I. models more neutral and devoid of bias. A system that provides a litany of opinions in response to sensitive topics like immigration, for example, is “not what we’re looking for,” according to Irwin. “We’re looking for facts, we’re looking for information. We’re not looking for opinions,” she said.
Other instances of biased outputs include models that answer a question when it is framed positively or negatively but refuse to provide information on alternative viewpoints. “Nobody using our products really wants to feel like they’re trying to steer you in one direction or another in terms of your opinion on matters,” said Irwin.
But guardrails are still needed for explicit or illegal content, such as non-consensual nudity and child sexual abuse material, which remain prohibited, she added.
A company-wide shift
Earlier this year, Meta cited freedom of expression and bias prevention as motivating factors behind its decision to end its fact-checking policy after nine years. In January, CEO Mark Zuckerberg announced the company would scrap the program, which relied in part on third-party organizations, in favor of a “Community Notes” crowdsourcing model—reminiscent of the one used by X—that counts on users to flag disinformation. Meta additionally unveiled plans to decrease censorship on platforms like Facebook and Instagram and cut many of its DEI programs.
Irwin, who worked at X when Community Notes first launched but didn’t work on the program, described herself as a “huge supporter” of the approach. “It helps with bias tremendously, because you’ve just got a more diverse group of people evaluating and providing feedback,” said Irwin, who left X in 2023 after clashing with Elon Musk on content moderation principles.
Musk has long been a proponent of loosening content moderation on social media. Grok, an A.I. chatbot developed by his xAI, is positioned as an alternative to other “woke” A.I. products. In February, his company took this strategy a step further by releasing a new voice mode for Grok with personalities like “unhinged” mode, which provides edgier responses.
Other A.I. developers, too, are taking a closer look at testing potential bias in model outputs, said Irwin. “It’s not just Meta,” she noted, adding that “everybody’s sort of moving in this direction.” For example, last month OpenAI announced that its models will increasingly engage with controversial topics in part to avoid perceptions of promoting any one agenda.
“Sometimes, the things that you see put in place as ‘guardrails’ can actually significantly impact freedom of expression,” said Irwin. “So, striking that right balance is really hard.”