Shubhashis Roy Dipta
> What scores do you get for IFEval under the `hf` backend rather than `vllm`?

It's not much better:

```
"results": {
  "ifeval": {
    "alias": "ifeval",
    "prompt_level_strict_acc,none": 0.266173752310536,
    "prompt_level_strict_acc_stderr,none": 0.019018766847290668,
    ...
```
> I see you are missing some important switches from the hf model command line: `--apply_chat_template` and `--fewshot_as_multiturn` (although I don't think the fewshot one has an effect on IFEval...
> True, not as good, but we are getting a lot closer now! The problem is, I think we might need to use thinking to get the full performance back,...
Any update on this? I found that lm-eval doesn't strip the thinking token from the start, even if it's empty.
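A post-processing filter along these lines could strip that leading block before scoring (a sketch assuming Qwen3-style `<think>...</think>` tags; `strip_think_prefix` is a hypothetical helper, not part of lm-eval's API):

```python
import re

# Match one leading <think>...</think> block, including an empty one,
# plus any surrounding whitespace.
THINK_RE = re.compile(r"^\s*<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think_prefix(text: str) -> str:
    # IFEval checks the literal response text, so a leftover (even empty)
    # thinking block at the start can fail strict-format instructions.
    return THINK_RE.sub("", text, count=1)
```

Applying something like this to each generation before the IFEval checks would remove the empty-thinking-token artifact described above.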
> I tested by setting --model_args "pretrained=Qwen/Qwen3-8B,enable_thinking=False", the eval score is 0.8299, almost the same as the reported score.

Can you share your script or the exact command you used?...
> Can you try to use docker?

@shuaills The school server doesn't have Docker, so no ☹. But I tried the same thing with a conda env and pip install, it...
> Ideally `save_to_disk` should save in a format compatible with load_dataset, wdyt ?

That would be perfect; if not, at least a flexible loader.
@lhoestq For now, you can use this small utility library: [nanoml](https://pypi.org/project/nanoml/)

```python
from nanoml.data import load_dataset_flexible
```

I actively develop and maintain this utility library. Open to contributors. Please open...
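A flexible loader along these lines can dispatch on the marker files that `Dataset.save_to_disk` writes (a sketch under that assumption; `looks_like_save_to_disk` is a hypothetical helper, and this is not nanoml's actual implementation):

```python
import os

def looks_like_save_to_disk(path: str) -> bool:
    # Directories written by Dataset.save_to_disk contain state.json /
    # dataset_info.json marker files; hub- or script-style datasets do not.
    return os.path.isfile(os.path.join(path, "state.json")) or \
           os.path.isfile(os.path.join(path, "dataset_info.json"))

def load_dataset_flexible(path: str):
    # Lazy imports keep the detection helper above dependency-free.
    if os.path.isdir(path) and looks_like_save_to_disk(path):
        from datasets import load_from_disk
        return load_from_disk(path)
    from datasets import load_dataset
    return load_dataset(path)
```

The point is that callers no longer need to remember which of `load_dataset` / `load_from_disk` matches how the data was written.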
Using vllm==0.7.3, still having this issue. I think it's not released yet.
Did you find it? I am also looking for a pretrained checkpoint.