Zang Yuchen
> Hi > Could you please provide the model, data and test script? Hi grimoire, I investigated this problem and found that there are some GPU to...
> Ok, I will let you know when I find something. I am not an expert in NLP, so this might take some time. > And... It is better to add `torch.cuda.synchronize()`...
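The point of the `torch.cuda.synchronize()` advice is that GPU work is launched asynchronously, so a timer stopped before synchronizing may measure only the launch overhead. A minimal sketch of that timing pattern follows. The helper name `time_call` and its parameters are hypothetical and do not come from the thread; the synchronization function is passed in, so that on a CUDA setup one would pass `torch.cuda.synchronize`.

```python
import time


def time_call(fn, sync, warmup=3, iters=10):
    """Return the average wall-clock seconds per call of fn.

    fn   -- zero-argument callable that launches the work to be timed
    sync -- zero-argument callable that blocks until queued work is done
            (assumption: torch.cuda.synchronize on a CUDA setup)
    """
    # Warm-up runs keep one-time setup costs out of the measurement.
    for _ in range(warmup):
        fn()
    sync()  # drain pending work before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for all timed work to finish before stopping the clock
    return (time.perf_counter() - start) / iters
```

With PyTorch, a call might look like `time_call(lambda: model(x), torch.cuda.synchronize)`, where `model` and `x` are placeholders for the user's own network and input.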
> Are you using this repo to accelerate the decoder? > FP16 mode gives a 2.7x speedup, but the results are different. I want to dig into it. Where...
> The difference between the fp16 and fp32 outputs might be inevitable. The fp16 significand has 10 bits of precision and the exponent has 5 bits. That would limit the precision of...
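To make the 10-bit significand limit concrete, here is a small sketch that rounds Python floats to the nearest fp16 value using only the standard library (`struct` format code `e` is IEEE 754 binary16). The helper name `to_fp16` is hypothetical, and the sample values are illustrative rather than taken from the model discussed in the thread.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 0.1 is not representable in fp16; the nearest value is slightly smaller.
print(to_fp16(0.1))            # 0.0999755859375

# Near 1.0 the spacing between fp16 values is 2**-10, so a smaller
# increment is lost entirely.
print(to_fp16(1.0 + 2**-12))   # 1.0

# Between 2048 and 4096 the spacing is 2, so 2051 cannot be stored
# and is rounded to 2052.
print(to_fp16(2051.0))         # 2052.0
```

Rounding errors of this size at each layer can accumulate through a deep network, which is one plausible reason the fp16 and fp32 outputs diverge.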
> And, by the way, have you tried increasing `max_workspace_size`? Some tactics need more workspace to run faster. Hello grimoire, I ran some tests; increasing workspace_size gives some speedup...