freq comments

Results: 3 comments by freq

How you evaluate reasoning models like QwQ-32B, since the response time and token length is very long?

I wonder whether you need to add an additional hyperparameter, "timeout", at the following place during evaluation:

    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_new_tokens,
        timeout=40000  # or...
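The timeout suggestion in this comment can be sketched as follows. This is a minimal illustration, not the repository's actual evaluation code. It assumes `client` is an `openai` Python SDK (v1 or later) client, where `timeout` is a per-request option given in seconds. The helper names `build_request_kwargs` and `query_model` and the `timeout_s` parameter are hypothetical and exist only for this example.

```python
from typing import Any, Dict, Optional


def build_request_kwargs(
    model: str,
    prompt: str,
    temperature: float,
    max_new_tokens: int,
    timeout_s: Optional[float] = None,  # hypothetical parameter name
) -> Dict[str, Any]:
    """Assemble the keyword arguments for a chat completion request."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_new_tokens,
    }
    # Only override the client's default timeout when one is given, so
    # short-answer models keep their normal behaviour.
    if timeout_s is not None:
        kwargs["timeout"] = timeout_s
    return kwargs


def query_model(
    client: Any,
    model: str,
    prompt: str,
    temperature: float,
    max_new_tokens: int,
    timeout_s: Optional[float] = None,
) -> str:
    """Send one prompt and return the text of the first choice."""
    completion = client.chat.completions.create(
        **build_request_kwargs(model, prompt, temperature, max_new_tokens, timeout_s)
    )
    return completion.choices[0].message.content


# Example call (assumes a reachable OpenAI-compatible endpoint):
#   from openai import OpenAI
#   client = OpenAI()
#   text = query_model(client, "QwQ-32B", "Hello", 0.1, 1024, timeout_s=40000)
```

Keeping the timeout optional lets the same function serve both fast models and long-running reasoning models. The value 40000 is the one quoted in the comment; whether it suits a given deployment is a separate question.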

How you evaluate reasoning models like QwQ-32B, since the response time and token length is very long?

Thanks for your reply!

How you evaluate reasoning models like QwQ-32B, since the response time and token length is very long?

Have you evaluated QwQ-32B on LongBench v1? If so, are there any adjustments to the hyperparameters in pred.py?