Zang Yuchen

6 comments by Zang Yuchen

> Hi > Could you please provide the model, data and test script? Hi grimoire, I investigated this problem and found that there are some GPU to...

> OK, I will let you know when I find something. I am not a pro at NLP, so this might take some time. > And... It is better to add `torch.cuda.synchronize()`...

> OK, I will let you know when I find something. I am not a pro at NLP, so this might take some time. > And... It is better to add `torch.cuda.synchronize()`...
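
The reason for `torch.cuda.synchronize()` is that CUDA kernels launch asynchronously, so a timer can stop before the GPU has finished. A minimal timing sketch, assuming a hypothetical `benchmark` helper that takes the work as a callable and the synchronization function as a parameter (so it also runs without a GPU):

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object],
              sync: Callable[[], None] = lambda: None,
              warmup: int = 3,
              iters: int = 10) -> float:
    """Return the mean wall-clock seconds per call of fn.

    Hypothetical helper. Pass sync=torch.cuda.synchronize when fn launches
    GPU work; the default no-op sync is only correct for CPU-only work.
    """
    for _ in range(warmup):  # warm-up calls absorb one-time setup costs
        fn()
    sync()  # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for every queued kernel to finish before stopping the clock
    return (time.perf_counter() - start) / iters
```

Usage would look like `benchmark(lambda: trt_model(x), sync=torch.cuda.synchronize)`, where `trt_model` and `x` are placeholders for the converted model and its input.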

> Are you using this repo to accelerate the decoder? > FP16 mode can boost speed by 2.7 times, but the results are different. I want to dig into it. Where...
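
To quantify how different the FP16 results are, one option is to compare them element-wise against the FP32 output. A minimal sketch, assuming both outputs are flattened to lists of floats; `compare_outputs` is a hypothetical helper, and the default `atol`/`rtol` are illustrative values, not recommended thresholds. The tolerance test has the same form as `numpy.allclose`, with the FP32 output as the reference:

```python
from typing import Sequence, Tuple


def compare_outputs(ref: Sequence[float], test: Sequence[float],
                    atol: float = 1e-3, rtol: float = 1e-2
                    ) -> Tuple[float, float, bool]:
    """Compare a reference (e.g. fp32) output with a test (e.g. fp16) output.

    Returns (max absolute error, max relative error, all within tolerance),
    where an element passes if |test - ref| <= atol + rtol * |ref|.
    """
    if len(ref) != len(test):
        raise ValueError("outputs must have the same length")
    max_abs = 0.0
    max_rel = 0.0
    ok = True
    for r, t in zip(ref, test):
        diff = abs(t - r)
        max_abs = max(max_abs, diff)
        if r != 0:  # relative error is undefined for a zero reference value
            max_rel = max(max_rel, diff / abs(r))
        if diff > atol + rtol * abs(r):
            ok = False
    return max_abs, max_rel, ok
```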

> The difference between the outputs of fp16 and fp32 might be inevitable. The significand precision of fp16 is 10 bits and its exponent is 5 bits. That would limit the precision of...
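
The effect of the 10-bit significand can be seen without a GPU: Python's `struct` module packs IEEE 754 half precision with the `'e'` format. A small sketch, assuming a hypothetical `to_fp16` helper that rounds a value to the nearest fp16 number and back:

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (round half to even).

    binary16 has 1 sign bit, 5 exponent bits and 10 stored significand bits;
    values outside the fp16 range make struct.pack raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Gap between 1.0 and the next larger fp16 value: 2**-10.
FP16_EPS = 2.0 ** -10

# For normal fp16 values, rounding changes x by at most half that gap,
# i.e. a relative error of at most 2**-11 (about 0.05%).
for value in (0.1, 3.14159, 1234.567):
    rounded = to_fp16(value)
    print(value, rounded, abs(rounded - value) / value)
```

For example, `0.1` becomes `0.0999755859375`, and `2049.0` rounds to `2048.0` because integers above 2048 are no longer all representable. Errors of this size in every intermediate result can accumulate through a deep network, which is consistent with fp16 and fp32 outputs drifting apart.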

> And, by the way, have you tried increasing `max_workspace_size`? Some tactics need more workspace to run. Hello grimoire, I ran some tests; increasing workspace_size gained some acceleration...
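
One way to find a good workspace size is to rebuild the engine for a few limits and keep the fastest. A minimal sketch, assuming a hypothetical `build_and_time(workspace_bytes)` callback that builds the engine with that limit (for example via `max_workspace_size`; the exact setting depends on the TensorRT and converter version) and returns the measured latency in seconds:

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_workspace(build_and_time: Callable[[int], float],
                    sizes_mib: Iterable[int] = (256, 512, 1024, 2048)
                    ) -> Tuple[int, Dict[int, float]]:
    """Return (fastest workspace size in MiB, latency for every size tried).

    build_and_time is a hypothetical callback supplied by the caller; the
    candidate sizes are illustrative, not recommended values.
    """
    timings: Dict[int, float] = {}
    for mib in sizes_mib:
        timings[mib] = build_and_time(mib << 20)  # MiB -> bytes
    best = min(timings, key=timings.get)  # ties keep the smaller, earlier size
    return best, timings
```

Because larger workspaces only let the builder consider more tactics, latency usually stops improving past some size; the sweep makes that point visible instead of guessing.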