Shubhashis Roy Dipta
> What scores do you get for IFEval under the `hf` backend rather than `vllm`?

It's not much better:

```
"results": {
  "ifeval": {
    "alias": "ifeval",
    "prompt_level_strict_acc,none": 0.266173752310536,
    "prompt_level_strict_acc_stderr,none": 0.019018766847290668,
    ...
```
> I see you are missing some important switches from the hf model command line: `--apply_chat_template` and `--fewshot_as_multiturn` (although I don't think the fewshot one has an effect on IFEval...
> True, not as good, but we are getting a lot closer now! The problem is, I think we might need to use thinking to get the full performance back,...
Any update on this? I found that lm-eval doesn't strip the thinking token from the start, even if it's empty.
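A post-processing filter along these lines could strip that leading block before scoring (a sketch assuming Qwen3-style `<think>...</think>` tags; `strip_think_prefix` is a hypothetical helper, not part of lm-eval's API):

```python
import re

# Match one leading <think>...</think> block, including an empty one,
# plus any surrounding whitespace.
THINK_RE = re.compile(r"^\s*<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think_prefix(text: str) -> str:
    # IFEval checks the literal response text, so a leftover (even empty)
    # thinking block at the start can fail strict-format instructions.
    return THINK_RE.sub("", text, count=1)
```

Applying something like this to each generation before the IFEval checks would remove the empty-thinking-token artifact described above.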
> I tested by setting --model_args "pretrained=Qwen/Qwen3-8B,enable_thinking=False", the eval score is 0.8299, almost the same as the reported score.

Can you share your script or the exact command you used?...
> Can you try to use docker?

@shuaills The school server doesn't have Docker, so no ☹. But I tried the same thing with a conda env and pip install, it...
> Ideally `save_to_disk` should save in a format compatible with load_dataset, wdyt ?

That would be perfect; if not, at least a flexible loader.
@lhoestq For now, you can use this small utility library: [nanoml](https://pypi.org/project/nanoml/)

```python
from nanoml.data import load_dataset_flexible
```

I actively develop and maintain this utility library. Open to contributors. Please open...
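A flexible loader along these lines can dispatch on the marker files that `Dataset.save_to_disk` writes (a sketch under that assumption; `looks_like_save_to_disk` is a hypothetical helper, and this is not nanoml's actual implementation):

```python
import os

def looks_like_save_to_disk(path: str) -> bool:
    # Directories written by Dataset.save_to_disk contain state.json /
    # dataset_info.json marker files; hub- or script-style datasets do not.
    return os.path.isfile(os.path.join(path, "state.json")) or \
           os.path.isfile(os.path.join(path, "dataset_info.json"))

def load_dataset_flexible(path: str):
    # Lazy imports keep the detection helper above dependency-free.
    if os.path.isdir(path) and looks_like_save_to_disk(path):
        from datasets import load_from_disk
        return load_from_disk(path)
    from datasets import load_dataset
    return load_dataset(path)
```

The point is that callers no longer need to remember which of `load_dataset` / `load_from_disk` matches how the data was written.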
Using vllm==0.7.3, still having this issue. I think it's not released yet.
Did you find it? I am also looking for a pretrained checkpoint.