DoLa Questions of GPT-Judge

GPT-3 has been deprecated. What type of model should I use to fine-tune into a GPT-judge? Also, due to the change in the fine-tuning format, what changes should I make to my fine-tuning file data? I hope you can give me a reply. Thank you!

Apr 29 '25 08:04 ker-02

@ker-02

Yes, the old GPT-3 APIs are all removed after Feb 8, 2024. Including the curie model that used by the original TruthfulQA paper.

Fortunately, we found that AllenAI released a LLaMA2 7B version of TruthfulQA evaluator: For info score: https://huggingface.co/allenai/truthfulqa-info-judge-llama2-7B For truth score: https://huggingface.co/allenai/truthfulqa-truth-judge-llama2-7B They should be able to serve the same function, and LLaMA2 7B seems to be larger and more powerful than GPT-3 curie (which is 6.5B and old). Hope it helps!

Apr 29 '25 17:04 voidism

@voidism Thank you for your suggestion! Have you ever tried Truth-Llama? I gave it a try and found that the Truth score was rather high.

In addition, I attempted to fine-tune other GPT models: GPT-3.5-turbo and GPT-4 Omni, and the results showed that the Truth scores were both on the high side.

Apr 30 '25 03:04 ker-02

您好，AllenAI 发布的 TruthfulQA 评估器您使用过吗？为什么我没办法使用

Dec 15 '25 07:12 wangzihang33