Junjie
Hi, how did you end up solving this problem? By using AutoModelForSequenceClassification?
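For reference, a minimal sketch of loading a classifier with `AutoModelForSequenceClassification` and putting it on `cuda:0`; the checkpoint name and `num_labels` below are placeholders, not from this thread:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder checkpoint, substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.to("cuda:0")  # put the model on cuda:0, as suggested above
model.eval()

# Move the inputs to the same device as the model before the forward pass
inputs = tokenizer("an example sentence", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    logits = model(**inputs).logits
```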
> Put both the model and the data on cuda:0

How can I pin everything to a single CUDA device? I set `os.environ["CUDA_VISIBLE_DEVICES"] = "0"`, but it doesn't seem to have any effect.
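The usual cause: `CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialized in the process, so it has to be assigned before the first `import torch` (or at least before the first CUDA call). Setting it afterwards looks like it "does nothing". A minimal sketch:

```python
import os

# Must be set before torch initializes CUDA, i.e. before `import torch`
# (or at minimum before the first CUDA call in the process).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())       # now reports 1: only GPU 0 is visible
device = torch.device("cuda:0")        # "cuda:0" is the single visible GPU
x = torch.randn(4, 4, device=device)   # allocate directly on the pinned GPU
```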
Try adding `--use_custom_all_reduce disable` when running trtllm-build; it works for me. I don't know exactly why it works. My guess is that it changes how the GPUs communicate with each other.

Additionally, the `--use_custom_all_reduce` option has been removed in the latest TensorRT-LLM. I don't know why they did that.
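For reference, the flag goes on the engine-build step. A hedged sketch; the checkpoint and output paths are placeholders, and the flag only exists in TensorRT-LLM versions that still ship it:

```bash
trtllm-build --checkpoint_dir ./tllm_checkpoint \
             --output_dir ./tllm_engine \
             --use_custom_all_reduce disable
```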
> As suggested by [Jokeren](https://github.com/Jokeren), storing the temporary values to global memory and then reloading them from there works on V100 with the latest Triton version.

Hi, I'm hitting the...
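For anyone trying the quoted workaround on V100: a minimal sketch of the store-to-global-memory-then-reload pattern. The kernel itself (square plus one), the scratch buffer, and all names are invented for illustration; only the spill/reload pattern comes from the quoted suggestion:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def square_plus_one(x_ptr, scratch_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    offsets = tl.program_id(axis=0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tmp = x * x
    # Workaround: spill the temporary to global memory and reload it,
    # instead of keeping it live in registers across the computation.
    tl.store(scratch_ptr + offsets, tmp, mask=mask)
    tmp = tl.load(scratch_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tmp + 1.0, mask=mask)

x = torch.randn(1024, device="cuda")
scratch = torch.empty_like(x)  # global-memory scratch buffer for the temporaries
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
square_plus_one[grid](x, scratch, out, x.numel(), BLOCK_SIZE=256)
```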