quanfeifan

8 comments by quanfeifan

Waiting for a demo of Qwen2-VL inference with the TurboMind backend.
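
For context, a minimal sketch of what such a demo might look like with LMDeploy's Python pipeline, assuming TurboMind support for Qwen2-VL lands (the model path and image URL below are placeholders, not confirmed working setups):

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Assumption: TurboMind can load Qwen2-VL; until that support exists,
# constructing this pipeline would fail.
pipe = pipeline('Qwen/Qwen2-VL-7B-Instruct',
                backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://example.com/demo.jpg')  # placeholder image URL
response = pipe(('Describe this image.', image))
print(response.text)
```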

> Waiting for a demo of Qwen2-VL inference with the TurboMind backend.

What's more, is there any plan to support Qwen2-VL quantized with AWQ (W4A16) on TurboMind?
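
If that support arrives, the W4A16 flow would presumably mirror the one LMDeploy already uses for supported models: quantize with `lmdeploy lite auto_awq`, then point TurboMind at the AWQ weights. A hedged sketch (the work directory is illustrative, and Qwen2-VL compatibility is exactly the open question here):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Assumption: the quantized weights were produced beforehand with e.g.
#   lmdeploy lite auto_awq Qwen/Qwen2-VL-7B-Instruct --work-dir ./qwen2-vl-7b-awq
# model_format='awq' tells TurboMind to expect W4A16 AWQ weights.
pipe = pipeline('./qwen2-vl-7b-awq',
                backend_config=TurbomindEngineConfig(model_format='awq'))
print(pipe('Hello!').text)
```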

> > > Download `nv-tensorrt-local-repo-cross-aarch64-l4t-10.0.1-cuda-12.4_1.0-1_all.deb` locally, then
> > > `sudo dpkg -i nv-tensorrt-local-repo-cross-aarch64-l4t-10.0.1-cuda-12.4_1.0-1_all.deb`
> > >
> > > No, `cross-aarch64` is for cross compilation, TRT is...

> Thanks for your feedback. It would be better to support running on the NVIDIA Orin, but we do not have the device. If you are interested, we will provide...

This seems similar to me: https://github.com/InternLM/lmdeploy/issues/3006