0xd8b

Results: 13 comments by 0xd8b

We converted the T5 model using the files in example/enc_dec/. The data type used for conversion is float16 (batch_size=1, strongly_typed=True, use_bert_plugin=True). Additionally, we truncated the output of hidden_states. However, during...
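For reference, a minimal sketch of how the HF-side hidden states we compare against can be produced; the checkpoint name and prompt here are placeholders for our fine-tuned model and data:

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = T5EncoderModel.from_pretrained(MODEL_ID, torch_dtype=torch.float16).cuda().eval()

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt").to("cuda")

with torch.no_grad():
    # last_hidden_state: [batch, seq_len, d_model]. This is the reference
    # tensor that the converted encoder's output (truncated to the real,
    # un-padded sequence length) should match.
    ref_hidden = encoder(**inputs).last_hidden_state

print(ref_hidden.shape, ref_hidden.dtype)
```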

@pommedeterresautee Thanks for your reply! Are you referring to converting the model using the bfloat16 data type?

@pommedeterresautee Ok, thanks for the suggestion, I will give it a try. However, I'm still curious why the model (float16 type) works fine at low GPU utilization.

We attempted to convert the model to bfloat16 and conduct inference, yet the issue persists even under high GPU utilization. It seems that there's a problem occurring in the computation...
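A minimal HF-side bfloat16 baseline, useful to rule out the checkpoint itself; the checkpoint name and prompt are placeholders, and bfloat16 requires an Ampere or newer GPU:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
).cuda().eval()

inputs = tokenizer("translate English to German: Good morning.",
                   return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```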

The issue is also caused by the encoder_input_length problem described in https://github.com/NVIDIA/TensorRT-LLM/issues/1847. This issue can be closed.

We have encountered a similar issue where the T5-large model fails to align with the HF model. We used the test set provided by HF (question-answer pairs), and found that...
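A sketch of the comparison harness we use for such alignment checks; the checkpoint, the QA pair, and the `trt_answer` wrapper are all placeholders (`trt_answer` stands in for a call into the built engine and is not a real API):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
hf_model = T5ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).cuda().eval()

def hf_answer(question: str) -> str:
    inputs = tokenizer(question, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = hf_model.generate(**inputs, max_new_tokens=64, num_beams=1)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def trt_answer(question: str) -> str:
    # Placeholder stub: replace with a call into the TensorRT-LLM runtime
    # (e.g. the examples/enc_dec run script). Hypothetical, not a real API.
    return hf_answer(question)

qa_pairs = [  # stand-in for the HF-provided question-answer test set
    ("question: Who wrote Hamlet? context: Hamlet is a play by "
     "William Shakespeare.", "William Shakespeare"),
]

for question, expected in qa_pairs:
    hf_out, trt_out = hf_answer(question), trt_answer(question)
    if hf_out != trt_out:
        print(f"MISMATCH on {question!r}\n  HF : {hf_out}\n  TRT: {trt_out}")
```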

Thank you for your response! We are using the T5-Large model, but we have fine-tuned it, which makes sharing the model and test samples difficult. However, I will try my...

@sc-gr We're using the latest code.

@symphonylyh Thank you very much for your thorough analysis and resolution of the issue. We have modified the code and conducted tests. Currently, with the GPT plugin, the T5 model...

@symphonylyh I will submit a new issue. This is an interesting phenomenon:

1. Using float32 type:
   - GPU initial usage is 0%, model inference works correctly, and the inference results...
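To localize where the reduced-precision path diverges, one check that can be run purely on the HF side is to diff fp32 and fp16 logits on the same teacher-forced input; the checkpoint and prompts below are placeholders:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
inputs = tokenizer("translate English to German: Good morning.",
                   return_tensors="pt").to("cuda")
labels = tokenizer("Guten Morgen.", return_tensors="pt").input_ids.to("cuda")

logits = {}
for dtype in (torch.float32, torch.float16):
    model = T5ForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=dtype
    ).cuda().eval()
    with torch.no_grad():
        # Teacher-forced forward pass; logits: [batch, tgt_len, vocab]
        logits[dtype] = model(**inputs, labels=labels).logits.float()
    del model
    torch.cuda.empty_cache()

diff = (logits[torch.float32] - logits[torch.float16]).abs()
print(f"max abs diff: {diff.max().item():.3e}, mean: {diff.mean().item():.3e}")
```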