torin
[sample.zip](https://github.com/open-mmlab/labelbee-client/files/8668890/sample.zip) If you want to reproduce the bug, just repeat the sample file multiple times.
The methodology doesn't work well; the negative points don't work this way :)
> noooop, when diving into the code, I found that InternVL2 uses xformers' attention, not the naive one, so the slow speed may come from another part.
[test.zip](https://github.com/user-attachments/files/17552824/test.zip) I tested the online and offline modes, and the gap is significant (under LMDeploy the gap is zero, btw). mail: [email protected]
Yes, my model is just an SFT version of InternVL2-8B.
> @torinchen @luohao123
>
> I can't reproduce this issue.
>
> Code: https://github.com/noooop/light-vllm/tree/main/benchmarks/InternVL2
>
> Image preprocessing time is not included.
>
> transformers 4.37.2 + flash_attn 2.6.3
> ...
> How many adapters do you need? Turbomind will only support lora without the "s-" in the future.

ok~, typically more than 2 adapters in deployment; S-LoRA can save...
internvl2_5 and the QwenVL series
Same problem; just updating to accelerate==0.33.0 fixed it for me 👍
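For anyone hitting the same issue, a minimal sketch of the version pin suggested above (assuming a standard pip environment; the exact package index and environment setup are up to you):

```shell
# Pin accelerate to the version reported to fix the problem,
# then verify the installed version.
pip install accelerate==0.33.0
python -c "import accelerate; print(accelerate.__version__)"
```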