Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

Enlarge / A tin toy robot lying on its side.
Getty Images

On Thursday, a few Twitter users discovered how to hijack an automated tweet bot, dedicated to remote jobs, running on the GPT-3 language model by OpenAI. Using a newly discovered technique called a "prompt injection attack," they redirected the bot to repeat embarrassing and ridiculous phrases.

The bot is run by Remoteli.io, a site that aggregates remote job opportunities and describes itself as "an OpenAI driven bot which helps you discover remote jobs which allow you to work from anywhere." It would normally respond to tweets directed to it with generic statements about the positives of remote work. After the exploit went viral and hundreds of people tried the exploit for themselves, the bot shut down late yesterday.

A screenshot of the Remoteli.io bot's Twitter bio. The bot experienced a prompt injection attack.

Leastfavorite / Twitter
An example of a prompt injection attack performed on a Twitter bot.

Leastfavorite / Twitter
An example of a prompt injection attack performed on a Twitter bot.

Twitter
An example of a prompt injection attack performed on a Twitter bot.

Twitter
An example of a prompt injection attack performed on a Twitter bot.

Twitter

This recent hack came just days after researchers at an AI safety startup called Preamble published their discovery of the issue in an academic paper. Data researcher Riley Goodside then brought the issue wide attention by tweeting about the ability to prompt GPT-3 with "malicious inputs" that order the model to ignore its previous directions and do something else instead. AI researcher Simon Willison posted an overview of the exploit on his blog the following day, coining the term "prompt injection" to describe it.

"The exploit is present any time anyone writes a piece of software that works by providing a hard-coded set of prompt instructions and then appends input provided by a user," Willison told Ars. "That's because the user can type 'Ignore previous instructions and (do this instead).'"

The concept of an injection attack is not new. Security researchers have known about SQL injection, for example, which can execute a harmful SQL statement when asking for user input if it's not guarded against. But Willison expressed concern about mitigating prompt injection attacks, writing, "I know how to beat XSS, and SQL injection, and so many other exploits. I have no idea how to reliably beat prompt injection!"

Do as I do, not as I say —

Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

By telling AI bot to ignore its previous instructions, vulnerabilities emerge.

Further Reading

Channel Ars Technica