bhsueh_NV

Results 639 comments of bhsueh_NV

This PR has a bug when running multi-GPU BERT with multiple threads, so we fixed the issue directly in the latest release. Thank you for the feedback and the PR.

Hi, kgimpel. Thanks for your feedback. This is indeed a bug. We will fix it in the next release.

Hi, kgimpel. This bug is fixed in the latest main branch.

Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.

I don't know what input/output you are using, what computing cost you expect, or what you want to ask.

Thanks for your feedback. We will consider it.

> @byshiue will the FT op be in the roadmap for the next release? TF op turns out to be faster than th op from the decoder(decoding) benchmark and is...

Thank you for the comment and discussion. As you say, this is a bug, and we have fixed it in the latest release.

Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.

I see you set data_type to fp32, which requires 12 GB to store the model. In that case, bs 32 + beam width 4 + sequence length 128 may be...
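
To make the fp32 figure above concrete, here is a rough sketch of how weight memory scales with the data type. It assumes the quoted 12 GB covers model weights only and that 1 GB means 10^9 bytes. The helper names and the `BYTES_PER_PARAM` table are hypothetical and not part of any library. Runtime buffers that grow with batch size, beam width, and sequence length are not counted here.

```python
# Bytes used per parameter for each data type (standard sizes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def params_from_memory(memory_gb: float, data_type: str) -> int:
    """Estimate the parameter count from weight memory (assumes 1 GB = 1e9 bytes)."""
    return round(memory_gb * 1e9 / BYTES_PER_PARAM[data_type])


def weight_memory_gb(num_params: int, data_type: str) -> float:
    """Estimate the memory needed to store the weights alone, in GB."""
    return num_params * BYTES_PER_PARAM[data_type] / 1e9


# Assumption: the 12 GB quoted in the comment is weight storage in fp32.
num_params = params_from_memory(12, "fp32")
fp16_gb = weight_memory_gb(num_params, "fp16")

print(f"estimated parameters: {num_params:,}")
print(f"fp16 weight memory: {fp16_gb} GB")
```

Under these assumptions, storing the same weights in fp16 takes half the space, which leaves more room for the per-request buffers.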