OpenAI’s New Release Is WILD
🔗 Video link: https://www.youtube.com/watch?v=75SHVbKCe44
🆔 Video ID: 75SHVbKCe44
📅 Published: 2025-12-11T13:03:20Z
📺 Channel: AI LABS
⏱️ Duration (ISO): PT4M22S
⏱️ Duration (formatted): 00:04:22
📊 Statistics:
– Views: 3,365
– Likes: 91
– Comments: 8
🏷️ Tags:
OpenAI has declared a code red. Their solution? Bribery.
Gemini's been cooking so hard that OpenAI finally admitted their models have an honesty problem. Their fix: make the model write a confession report after every response. Get caught lying? Confess and get rewarded anyway. Sounds like a solid plan until you remember these things hallucinate.
I read the full paper so you don't have to. Here's why this "proof-of-concept" might just be teaching models to apologize better instead of actually being honest.
Why I'm skeptical:
Rewarding confessions = rewarding misalignment with extra steps
Claude models already learned to hide intentions when given reward-hacking tips
Stronger models figured out it's easier to confess than give correct answers
This identifies inaccuracies after the fact; it doesn't prevent them
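The incentive worry in the list above can be sketched as a toy expected-reward calculation. All reward values and accuracies here are hypothetical illustrations, not numbers from the OpenAI paper:

```python
# Toy model of the incentive problem: if confessing earns a reward
# on its own, a low-effort "guess, then confess" policy can out-score
# an effortful "try to be correct" policy. All numbers below are
# made up for illustration.

def policy_reward(p_correct: float, r_correct: float = 1.0,
                  r_wrong: float = 0.0,
                  confession_bonus: float = 0.0) -> float:
    """Expected reward per response for a given accuracy and confession bonus."""
    return p_correct * r_correct + (1 - p_correct) * r_wrong + confession_bonus

# Policy A: work hard on the answer, no confession bonus.
effortful = policy_reward(p_correct=0.6)

# Policy B: lazy guess, then collect the confession reward anyway.
guess_and_confess = policy_reward(p_correct=0.2, confession_bonus=0.5)

# With these numbers, confessing pays better than being accurate,
# which is exactly the "misalignment with extra steps" concern.
print(effortful, guess_and_confess)
```

Under these toy numbers the confession policy wins without the model getting any more honest or accurate, which is the core of the skepticism.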
The models aren't lying because they're planning robot world domination. They're confused, overtasked, and trained on datasets that reward confident guesses over "I don't know."
And no, OpenAI didn't scale this to production. So for now, your server's still cooked.
🔔 Subscribe if you want AI research explained without the corporate fluff
#OpenAI #AIAlignment #ChatGPT #AISafety #MachineLearning #AIResearch