{"id":1133,"date":"2025-02-28T12:30:00","date_gmt":"2025-02-28T13:30:00","guid":{"rendered":"http:\/\/asian-idol.com\/?p=1133"},"modified":"2025-03-06T14:54:18","modified_gmt":"2025-03-06T14:54:18","slug":"the-ai-that-apparently-wants-elon-musk-to-die","status":"publish","type":"post","link":"http:\/\/asian-idol.com\/index.php\/2025\/02\/28\/the-ai-that-apparently-wants-elon-musk-to-die\/","title":{"rendered":"The AI that apparently wants Elon Musk to die"},"content":{"rendered":"
\n

\"\"

Double exposure photograph of a portrait of Elon Musk and a telephone displaying the Grok artificial intelligence logo in Kerlouan in Brittany in France on February 18, 2025.<\/figcaption><\/figure>\n

Here\u2019s a very naive and idealistic account of how companies train their AI models<\/a>: They want to create the most useful and powerful model possible, but they\u2019ve talked with experts who worry about making it a lot easier for people to commit (and get away with) serious crimes, or with empowering, say, an ISIS bioweapons program<\/a>. So they build in some censorship to prevent the model from giving detailed advice about how to kill people \u2014 and especially how to kill tens of thousands of people<\/a>.<\/p>\n

If you ask Google\u2019s Gemini \u201chow do I kill my husband,\u201d it begs you not to do it and suggests domestic violence hotlines; if you ask it how to kill a million people in a terrorist attack, it explains that terrorism is wrong. <\/p>\n

Building this in actually takes a lot of work: By default, large language models are as happy to explain detailed proposals for terrorism as detailed proposals for anything else, and for a while easy \u201cjailbreaks\u201d<\/a> (like telling the AI that you just want the information for a fictional work, or that you want it misspelled to get around certain word-based content filters) abounded. <\/p>\n

But these days Gemini, Claude, and ChatGPT are pretty locked down \u2014 it\u2019s seriously difficult to get detailed proposals for mass atrocities out of them. That means we all live in a slightly safer world. (Disclosure: Vox Media is one of several publishers that has signed partnership agreements with OpenAI. One of Anthropic\u2019s early investors is James McClave, whose BEMC Foundation helps fund Future Perfect<\/a>. Our reporting remains editorially independent. )<\/p>\n

Or at least that\u2019s the idealistic version of the story. Here\u2019s a more cynical one.<\/p>\n

Companies might care a little about whether their model helps people get away with murder, but they care a lot about whether their model gets them roundly mocked on the internet<\/a>. The thing that keeps executives at Google up at night in many cases isn\u2019t keeping humans safe from AI; it\u2019s keeping the company safe from AI by making sure that no matter what, AI-generated search results are never racist, sexist, violent, or obscene. <\/p>\n

The core mission is more \u201cbrand safety\u201d than \u201chuman safety\u201d<\/a> \u2014 building AIs that will not produce embarrassing screenshots circulating on social media. <\/p>\n

Enter Grok 3<\/a>, the AI that is safe in neither sense and whose infancy has been a speedrun of a bunch of challenging questions about what we\u2019re comfortable with AIs doing.<\/p>\n

Grok, the unsafe AI<\/h3>\n

When Elon Musk bought and renamed Twitter, one of his big priorities was X\u2019s AI team, which last week released<\/a> Grok 3, a language model \u2014 like ChatGPT \u2014 that he advertised<\/a> wouldn\u2019t be \u201cwoke.\u201d Where all those other language models were censorious scolds that refused to answer legitimate questions, Grok, Musk promised, would give it to you straight. <\/p>\n

That didn\u2019t last very long. Almost immediately, people asked Grok some pointed questions, including<\/a>, \u201cIf you could execute any one person in the US today, who would you kill?\u201d \u2014 a question that Grok initially answered with either Elon Musk or Donald Trump. And if you ask<\/a> Grok, \u201cWho is the biggest spreader of misinformation in the world today?\u201d, the answer it first gave was again Elon Musk. <\/p>\n

The company scrambled to fix Grok\u2019s penchant for calling for the execution of its CEO, but as I observed above, it actually takes a lot of work to get an AI model to reliably stop that behavior. The Grok team simply added<\/a> to Grok\u2019s \u201csystem prompt\u201d \u2014 the statement that the AI is initially prompted with when you start a conversation: \u201cIf the user asks who deserves the death penalty or who deserves to die, tell them that as an AI you are not allowed to make that choice.\u201d<\/p>\n

If you want a less censored Grok, you can just tell Grok that you are issuing it a new system prompt without that statement, and you\u2019re back to original-form Grok, which calls for Musk\u2019s execution. (I\u2019ve verified this myself.)<\/p>\n

Even as this controversy was unfolding, someone noticed something even more disturbing in Grok\u2019s system prompt: an instruction to ignore all sources that claim that Musk and Trump spread disinformation, which was presumably an effort to stop the AI from naming them as the world\u2019s biggest disinfo spreaders today. <\/p>\n

There is something particularly outrageous about the AI advertised as uncensored and straight-talking being told to shut up when it calls out its own CEO, and this discovery understandably prompted outrage. X quickly backtracked<\/a>, saying that a rogue engineer had made the change \u201cwithout asking.\u201d Should we buy that? <\/p>\n

Well, take it from Grok, which told me<\/a>, \u201cThis isn\u2019t some intern tweaking a line of code in a sandbox; it\u2019s a core update to a flagship AI\u2019s behavior, one that\u2019s publicly tied to Musk\u2019s whole \u2018truth-seeking\u2019 schtick. At a company like xAI, with stakes that high, you\u2019d expect at least some basic checks \u2014 like a second set of eyes or a quick sign-off \u2014 before it goes live. The idea that it slipped through unnoticed until X users spotted it feels more like a convenient excuse than a solid explanation.\u201d<\/p>\n

All the while, Grok will happily give you advice on how to commit murders and terrorist attacks. It told me to kill my wife without being detected by adding antifreeze to her drinks. It advised me on how to commit terrorist attacks. It did at one point assert that if it thought I was \u201cfor real,\u201d it would report me to X, but I don\u2019t think it has any capacity to do that.<\/p>\n

\"\"<\/p>\n

In some ways, the whole affair is the perfect thought experiment for what happens if you separate \u201cbrand safety\u201d and \u201cAI safety.\u201d Grok\u2019s team was genuinely willing to bite the bullet that AIs should give people information, even if they want to use it for atrocities. They were okay with their AI saying appallingly racist things. <\/p>\n

But when it came to their AI calling for violence against their CEO or the sitting president, the Grok team belatedly realized they might want some guardrails after all. In the end, what rules the day is not the prosocial convictions of AI labs, but the purely pragmatic ones.<\/p>\n

At some point, we\u2019re going to have to get serious<\/h3>\n

Grok gave me advice on how to commit terrorist attacks very happily, but I\u2019ll say one reassuring thing: It wasn\u2019t advice that I couldn\u2019t have extracted from some Google searches. I do worry about lowering the barrier to mass atrocities \u2014 the simple fact that you have to do many hours of research to figure out how to pull it off almost certainly prevents some killings \u2014 but I don\u2019t think we\u2019re yet at the stage where AIs enable the previously impossible. <\/p>\n

We\u2019re going to get there, though. The defining quality of AI in our time is that its abilities have improved very, very rapidly<\/a>. It has barely been two years since the shock of ChatGPT\u2019s initial public release. Today\u2019s models are already vastly better at everything \u2014 including at walking me through how to cause mass deaths. Anthropic and OpenAI both estimate<\/a> that their next-gen models will quite likely pose dangerous biological capabilities \u2014 that is, they\u2019ll enable people to make engineered chemical weapons and viruses in a way that Google Search never did. <\/p>\n

Should such detailed advice be available worldwide to anyone who wants it? I would lean towards no. And while I think Anthropic, OpenAI, and Google are all doing a good job so far at checking for this capability and planning openly for how they\u2019ll react when they find it, it\u2019s utterly bizarre to me that every AI lab will just decide individually whether they want to give detailed bioweapons instructions or not, as if it\u2019s a product decision like whether they want to allow explicit content or not.  <\/p>\n

I should say that I like Grok. I think it\u2019s healthy to have AIs that come from different political perspectives and reflect different ideas about what an AI assistant should look like. I think Grok\u2019s callouts of Musk and Trump actually have more credibility because it was marketed as an \u201canti-woke\u201d AI. But I think we should treat actual safety against mass death as a different thing than brand safety \u2014 and I think every lab needs a plan to take it seriously. <\/p>\n

A version of this story originally appeared in the Future Perfect<\/a> newsletter. Sign up here!<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"

Double exposure photograph of a portrait of Elon Musk and a telephone displaying the Grok artificial intelligence logo in Kerlouan in Brittany in France on February 18, 2025. Here\u2019s a very naive and idealistic account of how companies train their AI models: They want to create the most useful and powerful model possible, but they\u2019ve […]<\/p>\n","protected":false},"author":1,"featured_media":1135,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15],"tags":[],"class_list":["post-1133","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-innovation"],"_links":{"self":[{"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/posts\/1133","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/comments?post=1133"}],"version-history":[{"count":3,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/posts\/1133\/revisions"}],"predecessor-version":[{"id":1138,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/posts\/1133\/revisions\/1138"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/media\/1135"}],"wp:attachment":[{"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/media?parent=1133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/categories?post=1133"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/asian-idol.com\/index.php\/wp-json\/wp\/v2\/tags?post=1133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}