Junjie
Hi, how did you end up solving this problem? By using AutoModelForSequenceClassification?
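For reference, a minimal sketch of loading a classifier with `AutoModelForSequenceClassification` and putting it on `cuda:0`; the checkpoint name and `num_labels` below are placeholders, not from this thread:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder checkpoint, substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.to("cuda:0")  # put the model on cuda:0, as suggested above
model.eval()

# Move the inputs to the same device as the model before the forward pass
inputs = tokenizer("an example sentence", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    logits = model(**inputs).logits
```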
> Put both the model and the data on cuda:0

How can I pin everything to a single CUDA device? I set `os.environ["CUDA_VISIBLE_DEVICES"] = "0"`, but it doesn't seem to have any effect.
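The usual cause: `CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialized in the process, so it has to be assigned before the first `import torch` (or at least before the first CUDA call). Setting it afterwards looks like it "does nothing". A minimal sketch:

```python
import os

# Must be set before torch initializes CUDA, i.e. before `import torch`
# (or at minimum before the first CUDA call in the process).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())       # now reports 1: only GPU 0 is visible
device = torch.device("cuda:0")        # "cuda:0" is the single visible GPU
x = torch.randn(4, 4, device=device)   # allocate directly on the pinned GPU
```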
Try adding `--use_custom_all_reduce disable` when running trtllm-build; it works for me. I don't know exactly why it works. My guess is that it changes how the GPUs communicate with each other.

Additionally, the `--use_custom_all_reduce` option has been removed in the latest TensorRT-LLM. I don't know why they did that.
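For reference, the flag goes on the engine-build step. A hedged sketch; the checkpoint and output paths are placeholders, and the flag only exists in TensorRT-LLM versions that still ship it:

```bash
trtllm-build --checkpoint_dir ./tllm_checkpoint \
             --output_dir ./tllm_engine \
             --use_custom_all_reduce disable
```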
> As suggested by [Jokeren](https://github.com/Jokeren), storing the temporary values to global memory and then reloading them from there works on V100 with the latest Triton version.

Hi, I'm hitting the...
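For anyone trying the quoted workaround on V100: a minimal sketch of the store-to-global-memory-then-reload pattern. The kernel itself (square plus one), the scratch buffer, and all names are invented for illustration; only the spill/reload pattern comes from the quoted suggestion:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def square_plus_one(x_ptr, scratch_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    offsets = tl.program_id(axis=0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tmp = x * x
    # Workaround: spill the temporary to global memory and reload it,
    # instead of keeping it live in registers across the computation.
    tl.store(scratch_ptr + offsets, tmp, mask=mask)
    tmp = tl.load(scratch_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tmp + 1.0, mask=mask)

x = torch.randn(1024, device="cuda")
scratch = torch.empty_like(x)  # global-memory scratch buffer for the temporaries
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
square_plus_one[grid](x, scratch, out, x.numel(), BLOCK_SIZE=256)
```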