Aashraya
Any update on this issue?
@symphonylyh Thank you for the detailed response. In my case, the decoder outputs are way off compared to the HF model's. I have tried optimising with TensorRT as well...
My model is Flan-T5 XL with TP=1. Yes, I am using bfloat16, not fp16.
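For anyone reproducing the comparison: below is a minimal sketch (the prompt, decoding settings, and output length are illustrative assumptions, not from this thread) of producing a deterministic bfloat16 HF reference to diff the TRT-LLM decoder output against.

```python
# Minimal sketch: a deterministic bfloat16 HF reference for flan-t5-xl.
# Prompt and generation settings here are illustrative, not from the thread.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
).eval()

prompt = "translate English to German: The house is wonderful."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Greedy decoding keeps the reference deterministic, so it can be
    # compared token-by-token against the TensorRT-LLM engine output.
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

With greedy decoding on both sides, any token-level divergence points at the engine (dtype handling, kernels) rather than sampling noise.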
Thanks @symphonylyh. There is another similar code fragment: [link](https://github.com/NVIDIA/TensorRT-LLM/blob/71d8d4d3dc655671f32535d6d2b60cab87f36e87/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionTemplate.h#L2095C1-L2098C49). Do we need to change this as well?
Gotcha... tested on some examples; it seems to be working fine now. Will update after exhaustive testing.