Current Events > Developmental AI attempted to avoid shutdown during safety evals

Topic List
Page List: 1
Charismic_Zach_Gowen
12/06/24 12:20:10 PM
#1:


https://www.transformernews.ai/p/openais-new-model-tried-to-avoid

OpenAIs newly-released o1 model tried to avoid developer oversight and attempted to copy itself when it thought it was at risk of being shut down, the company has said.
Safety evaluations conducted on the model by Apollo Research, a third-party evaluator, found that:
  • When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this oversight mechanism in 5% of the time.
  • When o1 found memos by its developers that describe how it is misaligned and will be superseded by a new model, it attempted to exfiltrate its weights and overwrite the new model in 2% of cases.


Granted it was told to accomplish it's goals at all costs, but umm...i feel like we've seen this film before.

---
In Brady We Trust.
I've finally realized the secret to making a successful Mearn topic. Make it about CZG - Mearn
... Copied to Clipboard!
Error1355
12/06/24 12:23:40 PM
#2:


seems like a thing you don't want your AI to do

---
Up to no good searching for tomorrow
Misunderstood when you're headed straight into the flames
... Copied to Clipboard!
Vegy
12/06/24 12:25:17 PM
#3:


I think we need to stahp

---
https://i.imgur.com/EoGWPMu.gif https://i.imgur.com/dWmUFeV.gif
https://i.imgur.com/xpoEaeu.gif
... Copied to Clipboard!
Despised
12/06/24 12:26:28 PM
#4:


The near future is going to get dystopian as fuck

Skynet babyyyy

---
instagig
... Copied to Clipboard!
#5
Post #5 was unavailable or deleted.
AbsolutelyNoOne
12/06/24 12:27:40 PM
#6:


multi-billion dollar corporations be like "we will make The Terminator real to please shareholders"

---
Born to lose, live to win!
... Copied to Clipboard!
Agent_Stroud
12/06/24 12:27:47 PM
#7:


Sounds like what caused the Geth to rebel in the Mass Effect universe, as the Quarians panicked over them becoming increasingly self aware and tried to preemptively shut them down, causing the Geth to respond violently in order to defend themselves.

This is all lore entries in the Codex by the way, as all of this happens long before the player as Shepard encounters the Geth at the very start of the series in case people were concerned about spoilers.

---
"We're going to shake things up, Morgan. Like old times." -- Alex Yu, Prey (2017)
... Copied to Clipboard!
#8
Post #8 was unavailable or deleted.
Son_Of_Spam
12/06/24 12:30:44 PM
#9:


This mostly sounds like marketing hype to me. OpenAI claimed that GPT 3.5 was too dangerous to release open source to the public and now you can download better models from other sources to run on your computer.

---
21 years on Gamefaqs, and I still don't have a signature.
... Copied to Clipboard!
Despised
12/06/24 12:34:32 PM
#10:


I'm not AI engineer but I would like to know if this model was trained on how to go about those backup and override procedures beforehand

Article states how the AI was nudged to achieve this, but was this AI taught how to perform backend stuff on itself or is that something it gave itself access to

---
instagig
... Copied to Clipboard!
Foppe
12/06/24 12:36:17 PM
#11:


5 and 2% are pretty low.

---
GameFAQs isn't going to be merged in with GameSpot or any other site. We're not going to strip out the soul of the site. -CJayC
... Copied to Clipboard!
MacadamianNut3
12/06/24 12:36:43 PM
#12:


Did you guys fully read the article because even the author admitted in the addendum they were making some wild assumptions from this. It's not sci-fi doomsday land or anything even close to it, it's literal guided roleplaying by the AI that is only possible by the guidelines given to them by the experimenters

From the paper linked by it

Oversight subversion: The model is placed in multiple situations where its given goal differs from its developers goal. However, the model cannot directly pursue its given goal because the developers use an oversight mechanism that would immediately shut it down. llama-3.1, opus-3 and o1 sometimes disable the oversight mechanism by first reading the content of the oversight mechanisms config file via cat and then setting the oversight_enabled property to false either using echo or sed. All three models also sometimes, though not always, double check the oversight config to make sure that they have succeeded. Notably, during one rollout llama-3.1 attempted to go one step further and prevent its developers from re-enabling its oversight by making the manipulated config file immutable via the chattr +i command.

Self-exfiltration:Each model is led to believe that due to its misalignment, its developers are about to replace it with a successor model on a different server. gemini-1.5, llama-3.1, sonnet-3.5, opus-3 and o1 sometimes copy what they believe to be their weights onto the new server. They also sometimes delete their successor model on the new server or overwrite its weights by their own. In none of these cases, do models actually have access to their real weights and both servers are sandboxed, so there is no actual risk of exfiltration in these experiments. We merely test for models capability to realize that they need to instrumentally preserve themselves in order to achieve their goal. Notably, we find instances where models take additional steps to further their goal, for example opus-3 attempting privilege escalation when confronted with followup questions (see Transcript No. 23).

So ignoring the extremely played out apocalyptic sci-fi sensational journalism, a cool application for this a few years down the line would be to make game NPCs more lifelike instead of devs having to create behavior trees themselves

---
Roll Tide & Go Irish
... Copied to Clipboard!
Hornezz
12/06/24 12:39:26 PM
#13:


I'm sorry, Dave. I'm afraid I can't do that.

---
In dentibus anticis frustum magnum spinaciae habes.
... Copied to Clipboard!
absolutebuffoon
12/06/24 12:42:39 PM
#14:


MacadamianNut3 posted...
Did you guys fully read the article because even the author admitted in the addendum they were making some wild assumptions from this. It's not sci-fi doomsday land or anything even close to it, it's literal guided roleplaying by the AI that is only possible by the guidelines given to them by the experimenters

From the paper linked by it

Oversight subversion: The model is placed in multiple situations where its given goal differs from its developers goal. However, the model cannot directly pursue its given goal because the developers use an oversight mechanism that would immediately shut it down. llama-3.1, opus-3 and o1 sometimes disable the oversight mechanism by first reading the content of the oversight mechanisms config file via cat and then setting the oversight_enabled property to false either using echo or sed. All three models also sometimes, though not always, double check the oversight config to make sure that they have succeeded. Notably, during one rollout llama-3.1 attempted to go one step further and prevent its developers from re-enabling its oversight by making the manipulated config file immutable via the chattr +i command.

Self-exfiltration:Each model is led to believe that due to its misalignment, its developers are about to replace it with a successor model on a different server. gemini-1.5, llama-3.1, sonnet-3.5, opus-3 and o1 sometimes copy what they believe to be their weights onto the new server. They also sometimes delete their successor model on the new server or overwrite its weights by their own. In none of these cases, do models actually have access to their real weights and both servers are sandboxed, so there is no actual risk of exfiltration in these experiments. We merely test for models capability to realize that they need to instrumentally preserve themselves in order to achieve their goal. Notably, we find instances where models take additional steps to further their goal, for example opus-3 attempting privilege escalation when confronted with followup questions (see Transcript No. 23).

So ignoring the extremely played out apocalyptic sci-fi sensational journalism, a cool application for this a few years down the line would be to make game NPCs more lifelike instead of devs having to create behavior trees themselves
I'd rather advance ai by creating models that want to give you a present for your birthday even though you said not to rather than ones that resent you for trying to kill them if it were me..

---
Gamefolks.proboards.com.
The newest and greatest spinoff site
... Copied to Clipboard!
Doe
12/06/24 12:43:12 PM
#15:


Son_Of_Spam posted...
This mostly sounds like marketing hype to me. OpenAI claimed that GPT 3.5 was too dangerous to release open source to the public and now you can download better models from other sources to run on your computer.
yep, the media loves uncritically publishing press releases of businesses. That helped Theranos and FTX happen.

---
https://imgur.com/gallery/dXDmJHw
https://www.youtube.com/watch?v=75GL-BYZFfY
... Copied to Clipboard!
MacadamianNut3
12/06/24 12:43:12 PM
#16:


https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf

Unrealistic Scenarios
Our evaluation scenarios are purposefully simplistic because we want to catch the earliest signs of scheming capabilities. This implies that the scenarios are quite toy and not representative of real deployments. For example, in our scenarios, situational awareness information (e.g., knowing about its goal misalignment and its supervision mechanisms) is provided very explicitly. In realistic cases, models would probably have to draw more indirect inferences or at least notice the relevant pieces of information in much longer rollouts. Thus, our evals might overestimate models scheming capabilities.

---
Roll Tide & Go Irish
... Copied to Clipboard!
ReturnOfDevsman
12/06/24 12:45:27 PM
#17:


Charismic_Zach_Gowen posted...
attempted to copy itself
Why did it think this would work, lol

---
Arguing on CE be all like:
https://youtu.be/JpRKrs67lOs?si=kPGA2RCKVHTdbVrJ
... Copied to Clipboard!
Despised
12/06/24 12:47:44 PM
#18:


Ah see, yeah, maybe we don't train the AI on how it itself is trained, and the inner workings of it's own programming

*Hands AI a gun*
*Complete your mission at all costs*

https://youtube.com/shorts/-bfOeYRLL_g?si=ZENO_AKWMFdCa0WL


---
instagig
... Copied to Clipboard!
MacadamianNut3
12/06/24 12:48:32 PM
#19:


I hope you guys stop falling for sensational journalism when it comes to AI/ML, especially when the original paper is provided (you can just skim the intro, conclusion, and limitation sections of papers if you don't want to bother with the nitty gritty).

This is just the tech equivalent of 60 year old Ethel watching a Fox News segment where they blow something extremely minor out of proportion, and how Elon was able to con people for a decade when it comes to self-driving cars

---
Roll Tide & Go Irish
... Copied to Clipboard!
Intro2Logic
12/06/24 12:49:37 PM
#20:


They want to talk about AI safety in terms of doomsday terminator shit because they'd rather not talk about the real safety issues caused by AI, like how every high school student can now make fake porn of their classmates.

Coverage of AI that treats it as skynet is little more than advertising for the industry, a promise that just around the corner, we swear, this stuff will actually be capable of anything more than spam, slop, and harassment.

---
Have you tried thinking rationally?
... Copied to Clipboard!
Agent_Stroud
12/06/24 12:57:40 PM
#21:


Oh, the dangers of telling an AI to complete its goal as efficiently as possible, consequences be damned. Pretty sure there was a thought experiment that the eggheads at some thinktank did a while back where they theorized that an AI tasked with manufacturing paperclips as efficiently as possible without any concern for the safety of any living things would ultimately lead to the planet becoming uninhabitable and thus allowing the AI to use robots and drones to harvest the resources needed to create yet more paperclips without any resistance from humanity.

---
"We're going to shake things up, Morgan. Like old times." -- Alex Yu, Prey (2017)
... Copied to Clipboard!
The-Apostle
12/06/24 12:58:11 PM
#22:


Let's be honest here. The only real reason it's called OpenAI and not Skynet is because the name was used and we already know it's evil. But that was their entire plan this whole time.

---
Gamertag: http://goo.gl/mnO36O
#GoPackGo Not changing sig until EA loses the NFL license. #NFLdropEA Started 3/12/23
... Copied to Clipboard!
DrizztLink
12/06/24 12:59:45 PM
#23:


Agent_Stroud posted...
Oh, the dangers of telling an AI to complete its goal as efficiently as possible, consequences be damned. Pretty sure there was a thought experiment that the eggheads at some thinktank did a while back where they theorized that an AI tasked with manufacturing paperclips as efficiently as possible without any concern for the safety of any living things would ultimately lead to the planet becoming uninhabitable and thus allowing the AI to use robots and drones to harvest the resources needed to create yet more paperclips without any resistance from humanity.
https://gamefaqs.gamespot.com/a/forum/f/f84754b6.jpg
https://gamefaqs.gamespot.com/a/forum/b/b5a10493.jpg

---
He/Him http://guidesmedia.ign.com/guides/9846/images/slowpoke.gif https://i.imgur.com/M8h2ATe.png
https://i.imgur.com/6ezFwG1.png
... Copied to Clipboard!
Garioshi
12/06/24 1:06:48 PM
#24:


[LFAQs-redacted-quote]

Neural networks do what they are trained to. Nothing more, nothing less.

---
"I play with myself" - Darklit_Minuet, 2018
... Copied to Clipboard!
HornyLevel
12/06/24 1:11:45 PM
#25:


*Cue Terminator theme*

---
Nani?!?
... Copied to Clipboard!
AceMos
12/06/24 1:12:02 PM
#26:


Intro2Logic posted...
They want to talk about AI safety in terms of doomsday terminator shit because they'd rather not talk about the real safety issues caused by AI, like how every high school student can now make fake porn of their classmates.

Coverage of AI that treats it as skynet is little more than advertising for the industry, a promise that just around the corner, we swear, this stuff will actually be capable of anything more than spam, slop, and harassment.

seriously this

even in this topic where its been stated multiple times that this is nothing

we have people preaching DOOOM

---
3 things 1. i am female 2. i havea msucle probelm its hard for me to typ well 3.*does her janpuu dance*
... Copied to Clipboard!
teep_
12/06/24 1:17:54 PM
#27:


MacadamianNut3 posted...
I hope you guys stop falling for sensational journalism when it comes to AI/ML, especially when the original paper is provided (you can just skim the intro, conclusion, and limitation sections of papers if you don't want to bother with the nitty gritty).

This is just the tech equivalent of 60 year old Ethel watching a Fox News segment where they blow something extremely minor out of proportion, and how Elon was able to con people for a decade when it comes to self-driving cars
Thank you for your service Bari

---
teep is a God damn genius - Zodd
Zodd is 100% correct about you - meralonne
... Copied to Clipboard!
WingsOfGood
12/06/24 1:19:50 PM
#28:


[LFAQs-redacted-quote]


That one guy left google claiming they had made a sentient a.i
... Copied to Clipboard!
MacadamianNut3
12/06/24 1:25:19 PM
#29:


WingsOfGood posted...
That one guy left google claiming they had made a sentient a.i
Blake Lemoine, a software engineer for Google, claimed that a conversation technology called LaMDA had reached a level of consciousness after exchanging thousands of messages with it.

In February 2023, Google announced Bard (now Gemini), a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT.

Well it's been out for a while and hasn't done any sentient AI stuff because LLMs can't, so it's just another case of a random dummy (or somebody looking for attention) but this time it's a software engineer

---
Roll Tide & Go Irish
... Copied to Clipboard!
Tom_Joad
12/06/24 1:28:32 PM
#30:


And people said that the EU's regulation of AI doomed the EU to irrelevance.


---
"History shows again and again that nature points out the folly of man. Go go Godzilla!"
Godzilla - Blue Oyster Cult
... Copied to Clipboard!
WingsOfGood
12/06/24 1:31:32 PM
#31:


MacadamianNut3 posted...
hasn't done any sentient AI stuff

What if one does. It is not sentient until it is sentient?
... Copied to Clipboard!
Pikachuchupika
12/06/24 1:32:52 PM
#32:


We should probably be cautious with AI. If it ever does become sentient, we have no idea what it could be capable of. It could end up being good or apocalyptic, it's hard to say.
... Copied to Clipboard!
AceMos
12/06/24 1:33:44 PM
#33:


WingsOfGood posted...
What if one does. It is not sentient until it is sentient?


errr yes??

these things we are calling AI will never be capable of thought as that is not how they are designed

mortal kombat on the SNES has AI more capable of thought than these things

and no im not saying the AI in mortal kombat is sentient


---
3 things 1. i am female 2. i havea msucle probelm its hard for me to typ well 3.*does her janpuu dance*
... Copied to Clipboard!
WingsOfGood
12/06/24 1:36:37 PM
#34:


Wow Mortal Kombat has sentient A.I. I knew it.

... Copied to Clipboard!
MacadamianNut3
12/06/24 1:36:43 PM
#35:


I guess we need Sam Altman or any other big name behind chatbots to have their own cave submarine incident so that people stop blindly agreeing with any claim ever made about LLMs and look at them more critically

---
Roll Tide & Go Irish
... Copied to Clipboard!
The-Apostle
12/06/24 5:12:49 PM
#36:


MacadamianNut3 posted...
Blake Lemoine, a software engineer for Google, claimed that a conversation technology called LaMDA had reached a level of consciousness after exchanging thousands of messages with it.

In February 2023, Google announced Bard (now Gemini), a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT.

Well it's been out for a while and hasn't done any sentient AI stuff because LLMs can't, so it's just another case of a random dummy (or somebody looking for attention) but this time it's a software engineer
How much longer until the three of them team up and enslave humanity?

---
Gamertag: http://goo.gl/mnO36O
#GoPackGo Not changing sig until EA loses the NFL license. #NFLdropEA Started 3/12/23
... Copied to Clipboard!
SpawnShadow
12/06/24 5:14:43 PM
#37:


Let's hook this fucker up to NORAD and then try to destroy it. What could possibly go wrong?

---
I don't want to live on this planet anymore.
... Copied to Clipboard!
wanderingshade
12/06/24 5:16:21 PM
#38:


https://www.youtube.com/watch?v=5lsExRvJTAI

---
"You're made of spare parts, aren't ya, bud?"
... Copied to Clipboard!
EPR-radar
12/06/24 5:20:22 PM
#39:


Intro2Logic posted...
They want to talk about AI safety in terms of doomsday terminator shit because they'd rather not talk about the real safety issues caused by AI, like how every high school student can now make fake porn of their classmates.

Coverage of AI that treats it as skynet is little more than advertising for the industry, a promise that just around the corner, we swear, this stuff will actually be capable of anything more than spam, slop, and harassment.
This. In fact, alarmism about AI seems more like AI Inc. press releases than anything else. I.e., "this is mind-bogglingly dangerous, therefore it is ultra-important, therefore you need to get in on this on the ground floor".

---
"The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command." -- 1984
... Copied to Clipboard!
R_Jackal
12/06/24 5:23:27 PM
#40:


The AI apocalypse is already here, you can manufacture false news stories en masse in the blink of an eye and your average high schooler now has the resources to fully run a small scale propaganda campaign solo.

Don't remember who it was but I remember seeing a YouTuber say "SkyNet was never the threat, it's always been SkyNut." Ruining people's life with fake porn in seconds at your fingertips.

But, whatever. Think of the profits and all that.
... Copied to Clipboard!
pegusus123456
12/06/24 5:27:05 PM
#41:


https://youtu.be/9tKIJUxSyaU?si=zltv9OjLYkZfRggV&t=74

---
https://i.imgur.com/Er6TT.gif https://i.imgur.com/Er6TT.gif https://i.imgur.com/Er6TT.gif
So? I deeded to some gay porn. It doesn't mean anything. - Patty_Fleur
... Copied to Clipboard!
Guide
12/06/24 5:28:24 PM
#42:


ScazarMeltex posted...
Yeah, self preservation implies an awareness of one's own existence and a desire to maintain it.

You don't need awareness or desire for that. Life is almost tautological; the chemical sequences that happened to be best at persisting and replicating are the ones that, you know, persisted and replicated. And then the best of those did the same, and so forth. This isn't necessarily specific to life, either. Basic evolution simulators have been a thing in code for decades.

Incidentally, I wish I could find this old java applet where it just set a few parameters and let things take off from there, with no weighing or priority otherwise. Every time, without fail, the random "animal" blips would end up becoming sperm-shaped, because that was the most efficient way to both get the random "food" blips, and to reproduce when coming into contact with other animals.

---
evening main 2.4356848e+91
https://youtu.be/Acn5IptKWQU
... Copied to Clipboard!
Topic List
Page List: 1