AmazDeng

Results 49 comments of AmazDeng

> I recommend using multiple threads to do batched inference with the `pipeline.stream_infer` API, with each thread handling one request; the engine will automatically batch the LLM part if the above conditions...

> In my opinion, using multithreading or a thread pool amounts to the same thing: each thread processes one request. > > It is worth noting...
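The per-request threading pattern discussed in the two comments above can be sketched as follows. This is a minimal illustration, not lmdeploy's actual code: `fake_stream_infer` is a hypothetical stand-in for a call like `pipeline.stream_infer`, and the real batching happens inside the serving engine, not in this client code.

```python
# Sketch: one thread per in-flight request; the serving engine is assumed
# to batch concurrent requests internally on the LLM side.
from concurrent.futures import ThreadPoolExecutor

def fake_stream_infer(prompt: str) -> str:
    # Hypothetical placeholder for pipeline.stream_infer(prompt).
    return f"response to: {prompt}"

def batch_infer(prompts, max_workers=8):
    # Each worker thread submits exactly one request, as the comment suggests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_stream_infer, prompts))

results = batch_infer(["q1", "q2", "q3"])
```

Whether a raw thread per request or a bounded pool is used, the client-side structure is the same; the pool just caps concurrency.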

> > 我想使用 InternVL2-40B-AWQ+lmdeploy 进行离线视频批量推理,类似于 。根据您的对话上下文,我是否可以假设,如果不使用流式处理功能,使用多线程并行发送请求的速度接近本机批量推理的速度? > > 我想问一下, InternVL2-40B-AWQ启动大概需要多大的显存? 需要A100 80G版本,可以启动

@rajeevsrao @ttyio @pranavm-nvidia @aaronp24 @ilyasher Could you please take a look at this issue?

> The problem is that `trtexec` will use random scaling factors for `int8` mode. If you replace `--best` with `--fp16` (i.e. disable `--int8`), that should improve the accuracy. @pranavm-nvidia Thanks...
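The fix described above amounts to dropping `--int8` from the build flags. A hedged sketch of the invocation (the ONNX and engine file names are placeholders, not from the original thread):

```shell
# --best enables --int8, which uses random scaling factors when no
# calibration data is supplied; build with --fp16 only instead.
trtexec --onnx=model.onnx --fp16 --saveEngine=model_fp16.engine
```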

> Same issue; you can set `flash_attn` to false and use bf16 to compile, which works for me. @seanxcwang I followed the method you provided for testing. In the hf...
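The workaround in the comment above (disable flash attention, compile in bf16) can be sketched as a config patch. The key names `use_flash_attn` and `torch_dtype` are assumptions modeled on common Hugging Face config fields, not confirmed from the original thread.

```python
# Sketch: patch a model config dict to disable flash attention and
# switch the compute dtype to bfloat16 before export/compilation.
def patch_config(config: dict) -> dict:
    patched = dict(config)
    patched["use_flash_attn"] = False    # fall back to the non-flash attention path
    patched["torch_dtype"] = "bfloat16"  # compile in bf16 rather than fp16
    return patched

cfg = patch_config({"use_flash_attn": True, "torch_dtype": "float16"})
```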

@seanxcwang I found that the following section of code in the Hugging Face model caused my TRT engine export to be in float32 format, which ensures that the...

@flybird11111 Thanks for your answer. 1. So which models, specifically, does ColossalAI support? I haven't seen a list of supported models. 2. The Open-Sora project (https://github.com/hpcaitech/Open-Sora) uses the ColossalAI engine. I've noticed...

@flybird11111 I noticed that Open-Sora and ColossalAI come from the same team. Does this mean that if I were to adapt Flux myself to a training paradigm of LoRA +...