quanfeifan
Hi, have you solved it? I'm running into the same problem.
Waiting for a demo of Qwen2-VL inference with the TurboMind backend.
> Waiting for a demo of Qwen2-VL inference with the TurboMind backend.

What's more, is there any plan to support Qwen2-VL quantized with AWQ (W4A16) on TurboMind?
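For context, here is a minimal sketch of the flow being asked for, assuming Qwen2-VL eventually gains TurboMind support and that the existing AWQ path applies to it; it is modeled on lmdeploy's current VLM pipeline API, and the model path, work directory, and image URL are placeholders rather than confirmed behavior:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Hypothetical flow, assuming Qwen2-VL + TurboMind + AWQ (W4A16) support lands:
#   1) quantize offline, e.g.:
#        lmdeploy lite auto_awq Qwen/Qwen2-VL-7B-Instruct --work-dir ./qwen2-vl-awq
#   2) load the quantized weights with the TurboMind backend.
pipe = pipeline(
    './qwen2-vl-awq',  # placeholder path to the AWQ-quantized weights
    backend_config=TurbomindEngineConfig(model_format='awq'),
)

image = load_image('https://example.com/demo.jpg')  # placeholder image
print(pipe(('Describe this image.', image)))
```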
> > > Download `nv-tensorrt-local-repo-cross-aarch64-l4t-10.0.1-cuda-12.4_1.0-1_all.deb` locally, then
> > > `sudo dpkg -i nv-tensorrt-local-repo-cross-aarch64-l4t-10.0.1-cuda-12.4_1.0-1_all.deb`
> >
> > No, `cross-aarch64` is for cross compilation, TRT is...
> Thanks for your feedback. It would be better to support running on the NVIDIA Orin, but we do not have the device. If you are interested, we will provide...
Maybe this PR will help: #271
This seems similar to https://github.com/InternLM/lmdeploy/issues/3006 to me.