Zang Yuchen comments

Results: 6 comments by Zang Yuchen

converted bart model is slower than the original one during inference time

> Hi > Could you please provide the model, data, and test script? Hi grimoire, I investigated this problem and found that there are some GPU to...

converted bart model is slower than the original one during inference time

> OK, I will let you know when I find something. I am not an expert in NLP, so this might take some time. > And... It is better to add `torch.cuda.synchronize()`...
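
The advice quoted above matters because GPU work is queued asynchronously, so a timer that stops before the queue drains can under-report latency. Below is a minimal sketch of a timing helper, not the script used in the thread. The names `time_inference`, `run_once`, and `sync` are hypothetical. With PyTorch on a CUDA device one would pass `sync=torch.cuda.synchronize`. The helper itself uses only the standard library.

```python
import time


def time_inference(run_once, sync=None, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of run_once.

    sync is an optional barrier (hypothetical parameter); for CUDA
    workloads it would be torch.cuda.synchronize.
    """
    def barrier():
        if sync is not None:
            sync()

    # Warm-up calls absorb one-time costs such as lazy initialisation.
    for _ in range(warmup):
        run_once()
    barrier()  # drain warm-up work before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    barrier()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / iters
```

Timing both the original and the converted model with the same helper keeps the comparison fair, since both measurements then include the same barriers.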

converted bart model is slower than the original one during inference time

> OK, I will let you know when I find something. I am not an expert in NLP, so this might take some time. > And... It is better to add `torch.cuda.synchronize()`...

converted bart model is slower than the original one during inference time

> Are you using this repo to accelerate the decoder? > FP16 mode gives a 2.7x speedup, but the results are different. I want to dig into it. Where...

converted bart model is slower than the original one during inference time

> The difference between the fp16 and fp32 outputs might be inevitable. The significand precision of fp16 is 10 bits and the exponent is 5 bits. That would limit the precision of...
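
To make the precision limit in the quoted reply concrete, here is a small illustrative sketch using only the Python standard library. The helper name `to_fp16` and the sample values are assumptions chosen for illustration and are not taken from the thread. It rounds a Python float to IEEE 754 half precision with the `struct` format code `e` and shows the rounding error that a 10-bit significand introduces.

```python
import struct


def to_fp16(x):
    """Round a Python float to half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# With 10 stored significand bits, the gap between 1.0 and the next
# representable fp16 value is 2**-10.
FP16_EPS = 2.0 ** -10

for value in (0.1, 1.0 + FP16_EPS / 4, 2049.0):
    rounded = to_fp16(value)
    print(f"{value!r} -> {rounded!r} (abs error {abs(value - rounded):.3e})")
```

Here 0.1 becomes 0.0999755859375, the small increment above 1.0 is lost entirely, and 2049.0 rounds to 2048.0 because representable values between 2048 and 4096 are spaced 2 apart. Errors of this size accumulate across layers, which is consistent with fp16 and fp32 outputs differing.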

converted bart model is slower than the original one during inference time

> And, by the way, have you tried increasing `max_workspace_size`? Some tactics need more workspace to run faster. Hello grimoire, I ran some tests; increasing workspace_size gives some speedup...