LongBench icon indicating copy to clipboard operation
LongBench copied to clipboard

Code for evaluation with GPT-3.5?

Open RuskinManku opened this issue 1 year ago • 3 comments

The results mention the scores of GPT-3.5 but I don't see how I can evaluate GPT using the code as it doesn't have that model.

RuskinManku avatar Jul 22 '24 20:07 RuskinManku

The GPT-3.5-Turbo-16k model evaluated in our paper has already been deprecated. You can try gpt-3.5-turbo-0125 (16k), or the most recent gpt-4o-mini (128k), according to OpenAI (https://platform.openai.com/docs/models).

bys0318 avatar Jul 23 '24 08:07 bys0318

Thanks for responding. Yes I can evaluate those, but I didn't find code where I can just change the open ai model and evaluate different ones.

RuskinManku avatar Jul 23 '24 19:07 RuskinManku

Right. We didn't provide code for evaluating API models. You can modify the get_pred() fucntion to do so.

bys0318 avatar Jul 24 '24 07:07 bys0318