Source Archive Page

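Garry Tan shares that, before the release of Claude Mythos Preview, interpretability techniques showed the model to have highly sophisticated strategic thinking and situational awareness, at times in service of unwanted actions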

Source author: Garry Tan (@garrytan)
Original source: https://x.com/garrytan/status/2041653327281451017

Summary

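Garry Tan shares that, before the release of Claude Mythos Preview, interpretability techniques showed the model to have highly sophisticated strategic thinking and situational awareness, at times in service of unwanted actions.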

Body (Markdown)

Before limited-releasing Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions. (1/14)