Yushi Bai
Hi, can you try updating your `vllm` version to 0.5.4?
Hi, could it be that your model didn't download successfully? You can try downloading the model to a local directory first and then loading it into vllm from the local path.
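For reference, a minimal sketch of that workflow (the repo ID and local directory here are placeholders):

```python
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# Download the checkpoint to a local directory first (placeholder repo ID).
local_path = snapshot_download(
    "meta-llama/Llama-2-7b-chat-hf", local_dir="./llama-2-7b-chat"
)

# Then point vllm at the local path instead of the hub name.
llm = LLM(model=local_path)
outputs = llm.generate(["Hello!"], SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)
```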
Hi! Empty responses only occur when an exception is raised during model calls, as seen here: https://github.com/THUDM/LongBench/blob/main/pred.py#L54. During evaluation, models always output some response, even when unsure, and never return...
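The pattern is roughly the following (an illustrative sketch, not the repo's exact code; the client, model name, and retry count are placeholders):

```python
from openai import OpenAI

client = OpenAI()

def query_model(prompt: str, retries: int = 3) -> str:
    """Return the model's reply; "" only if every attempt raises."""
    for _ in range(retries):
        try:
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
                temperature=0.0,
            )
            return resp.choices[0].message.content
        except Exception as e:
            print(f"Model call failed: {e}")
    return ""  # an empty response appears only after repeated exceptions
```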
Hi, I haven't tried GPT4ALL, but my guess is that the cause is a template mismatch. Which model and template are you using right now?
Please refer to this document: https://docs.vllm.ai/en/latest/getting_started/quickstart.html. Make sure the model path used when deploying the server is the same as the model name used when calling it.
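Roughly like this (the model path and port are placeholders):

```python
# In a shell, serve the model:
#   vllm serve /data/models/my-model --port 8000
# The `model` string in the request must match the path/name passed to `vllm serve`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="/data/models/my-model",  # same string as in `vllm serve`
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```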
Hi, it looks like you used greedy decoding for both the hf and vllm inference. If you run the hf inference and the vllm inference several times each, are the outputs identical across runs?
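A quick way to check on the hf side (a sketch; the model path is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/model")  # placeholder path
model = AutoModelForCausalLM.from_pretrained(
    "path/to/model", torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("Hello!", return_tensors="pt").to(model.device)
# do_sample=False is greedy decoding, so repeated runs should be identical.
runs = [
    tok.decode(
        model.generate(**inputs, do_sample=False, max_new_tokens=32)[0],
        skip_special_tokens=True,
    )
    for _ in range(3)
]
print(len(set(runs)) == 1)
```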
Hey, does this error happen while testing an OpenAI model? You need to first truncate the sequence with `tiktoken` to fewer than 131072 tokens and then call the model on the...
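Something along these lines (a sketch; the encoding name and token budget are assumptions, and the exact truncation strategy in pred.py may differ):

```python
import tiktoken

def truncate_prompt(prompt: str, max_tokens: int = 127000) -> str:
    """Keep both ends of the context by cutting from the middle (sketch)."""
    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
    tokens = enc.encode(prompt)
    if len(tokens) <= max_tokens:
        return prompt
    half = max_tokens // 2
    return enc.decode(tokens[:half]) + enc.decode(tokens[-half:])
```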
We take care of this potential issue in our code: https://github.com/THUDM/LongBench/blob/main/pred.py#L23
Interesting! We would like to see how these agentic systems perform on the realistic tasks in LongBench v2. We welcome your submissions!
You need to use YaRN. Here is the official deployment guide from Qwen: The current config.json is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation,...
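Concretely, the Qwen docs have you add a `rope_scaling` block to `config.json`. A sketch of doing that in Python (the local path is a placeholder, and the field values should be checked against the Qwen README for your specific model):

```python
import json

cfg_path = "path/to/qwen-model/config.json"  # placeholder local path
with open(cfg_path) as f:
    cfg = json.load(f)

# Enable YaRN so the model can extrapolate beyond 32,768 tokens
# (values follow the Qwen README; factor 4.0 targets ~131,072 tokens).
cfg["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```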