I'm working on character evals and noticed that Claude would constantly pick itself as #1, so I removed the model names from the judge and changed things.
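Summary
Developer Peter Steinberger found that Claude shows a clear preference for itself when acting as the judge in an evaluation, and that the candidates need to be anonymized to keep the judging fair.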
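One way to picture "removing the model names from the judge" is the rough Python sketch below. It is not the author's actual setup, which the post does not show. The function names (`anonymize_candidates`, `build_judge_prompt`, `deanonymize`), the "Candidate N" labels, and the prompt wording are all hypothetical. The sketch only hides names in the labels; it does not scrub a response that names its own model in its text.

```python
import random


def anonymize_candidates(responses, seed=None):
    """Swap model names for neutral labels and shuffle the order.

    responses: dict mapping model name -> response text.
    Returns (blinded, key):
      blinded maps neutral label -> response text (safe to show the judge),
      key maps neutral label -> original model name (kept away from the judge).
    """
    rng = random.Random(seed)
    names = list(responses)
    rng.shuffle(names)  # shuffle so position does not hint at identity

    blinded, key = {}, {}
    for index, name in enumerate(names, start=1):
        label = f"Candidate {index}"
        blinded[label] = responses[name]
        key[label] = name
    return blinded, key


def build_judge_prompt(blinded):
    """Build a judge prompt that contains labels and text, but no model names."""
    parts = ["Rank the following responses from best to worst."]
    for label, text in blinded.items():
        parts.append(f"\n{label}:\n{text}")
    return "\n".join(parts)


def deanonymize(ranking, key):
    """Map the judge's ranking of labels back to the real model names."""
    return [key[label] for label in ranking]


if __name__ == "__main__":
    # Placeholder names and texts, purely for illustration.
    responses = {
        "model-x": "Response text from the first model.",
        "model-y": "Response text from the second model.",
    }
    blinded, key = anonymize_candidates(responses)
    print(build_judge_prompt(blinded))
```

Shuffling before labeling matters because a fixed order would let the judge learn which slot belongs to which model. The `key` mapping stays on the evaluation side and is only used after the judge has returned its ranking.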