Bytes-Lin

4 comments by Bytes-Lin

> I suspect it's because most large models that do grounding use 0-1000: Gemini, InternVL, Florence-2... so it was changed back 😂. The earlier Qwen2-VL also used 0-1000. My shallow take is that scaling-law-style data accumulation for the model's perception ability probably matters more than the coordinate transformation (and Qwen2.5-VL doesn't seem to use truly absolute pixel coordinates either). For detection in the LLM paradigm, if you define an output coordinate format that differs from every other model's, the data-cleaning cost is likely much higher (just my humble opinion.jpg). Hi, I'd like to ask how the grounding outputs of InternVL-3 and LLaVA-OneVision are quantized these days: also 0-1000, or normalized?
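For readers unfamiliar with the two conventions being compared, here is a minimal sketch of the difference between 0-1000 integer binning (the Qwen2-VL / InternVL style mentioned above) and plain [0, 1] normalization. The function names, the bin count, and the example box are illustrative, not any model's actual API.

```python
# Two common grounding-coordinate conventions for VLMs, sketched side by side.

def to_binned(box, width, height, bins=1000):
    """Quantize an absolute-pixel box (x1, y1, x2, y2) to 0-1000 integer bins."""
    x1, y1, x2, y2 = box
    return (
        round(x1 / width * bins),
        round(y1 / height * bins),
        round(x2 / width * bins),
        round(y2 / height * bins),
    )

def to_normalized(box, width, height):
    """Normalize the same box to floats in [0, 1]."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)

box = (128, 64, 512, 480)            # absolute pixels on a 640x480 image
print(to_binned(box, 640, 480))      # (200, 133, 800, 1000)
print(to_normalized(box, 640, 480))  # (0.2, 0.1333..., 0.8, 1.0)
```

The practical point in the comment above: the binned format is image-size independent and tokenizes as short integers, so reusing the convention that most existing grounding data already follows keeps data-cleaning costs down.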

> Can you perhaps supply a small reproducer? To clarify, the `qkv` module wouldn't receive gradients as it is not trained; it is the adapter that is trained. Can you confirm that the...
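As context for the `qkv` point: in a LoRA setup only the injected adapter matrices are trainable, so the frozen base projection is *expected* to have no gradients. A minimal sketch of how to verify this with PEFT, using a toy module as a stand-in for the real architecture in the issue:

```python
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyAttention(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)  # fused q/k/v projection, as in many ViTs

    def forward(self, x):
        return self.qkv(x).sum()

# Inject a LoRA adapter into the `qkv` projection; the base layer stays frozen.
model = get_peft_model(TinyAttention(), LoraConfig(target_modules=["qkv"], r=4))
loss = model(torch.randn(2, 32))
loss.backward()

for name, param in model.named_parameters():
    # Expected: the base qkv weight has requires_grad=False and grad=None,
    # while lora_A / lora_B have requires_grad=True and a populated .grad.
    print(f"{name}: requires_grad={param.requires_grad}, has_grad={param.grad is not None}")
```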

> Thanks for providing the reproducer. When I tested it locally, the issue stemmed from gradient checkpointing. When setting `gradient_checkpointing=False`, there were gradients. Could you please give this a try?...
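The failure mode described here is a known interaction between frozen inputs and (reentrant) gradient checkpointing: if nothing entering the checkpointed segment requires grad, autograd never re-enters the segment, and the weights inside it silently receive no gradients. A distilled toy case, not the reporter's actual reproducer:

```python
import torch
from torch.utils.checkpoint import checkpoint

inner = torch.nn.Linear(8, 8)  # sits inside the checkpointed segment
head = torch.nn.Linear(8, 1)   # sits outside the checkpoint
x = torch.randn(2, 8)          # e.g. output of a frozen embedding: requires_grad=False

out = checkpoint(inner, x, use_reentrant=True)
head(out).sum().backward()
print(inner.weight.grad)             # None -> the "no gradients" symptom
print(head.weight.grad is not None)  # True -> the rest of the model still trains

# Fix 1: make the segment's input require grad, which is what
# model.enable_input_require_grads() automates via a forward hook.
inner.weight.grad = head.weight.grad = None
out = checkpoint(inner, x.requires_grad_(True), use_reentrant=True)
head(out).sum().backward()
print(inner.weight.grad is not None)  # True

# Fix 2 (alternative): use the non-reentrant implementation,
# checkpoint(..., use_reentrant=False), which handles this case.
```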

> This is indeed a problem with transformers, thanks [@BenjaminBossan](https://github.com/BenjaminBossan) for narrowing it down.
>
> The problem is with `model.enable_input_require_grads()`: it doesn't seem to support visual language models...
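For reference, `enable_input_require_grads()` in transformers registers a forward hook only on the text input embeddings, which is why the vision branch of a VLM can still feed `requires_grad=False` activations into checkpointed blocks. A sketch of the manual workaround; the checkpoint name is a placeholder and `model.visual.patch_embed` is a hypothetical module path that must be adapted to the actual architecture:

```python
from transformers import AutoModelForVision2Seq

# Placeholder checkpoint; substitute the actual VLM from the issue.
model = AutoModelForVision2Seq.from_pretrained("...")

def make_inputs_require_grads(module, inputs, output):
    # Same hook body that transformers installs internally.
    output.requires_grad_(True)

# What enable_input_require_grads() already covers (the text side):
model.get_input_embeddings().register_forward_hook(make_inputs_require_grads)

# The missing piece for the vision side. `model.visual.patch_embed` is an
# assumed path; point this at your model's first vision layer.
model.visual.patch_embed.register_forward_hook(make_inputs_require_grads)
```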