0xd8b

Results: 13 comments by 0xd8b

We converted the T5 model using the files in example/enc_dec/. The data type used for conversion is float16 (batch_size=1, strongly_typed=True, use_bert_plugin=True). Additionally, we truncated the output of hidden_states. However, during...
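For reference, a minimal sketch of how the HF-side hidden states we compare against can be produced; the checkpoint name and prompt here are placeholders for our fine-tuned model and data:

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = T5EncoderModel.from_pretrained(MODEL_ID, torch_dtype=torch.float16).cuda().eval()

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt").to("cuda")

with torch.no_grad():
    # last_hidden_state: [batch, seq_len, d_model]. This is the reference
    # tensor that the converted encoder's output (truncated to the real,
    # un-padded sequence length) should match.
    ref_hidden = encoder(**inputs).last_hidden_state

print(ref_hidden.shape, ref_hidden.dtype)
```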

@pommedeterresautee Thanks for your reply! Are you referring to converting the model using the bfloat16 data type?

@pommedeterresautee Ok, thanks for the suggestion, I will give it a try. However, I'm still curious why the model (float16 type) works fine at low GPU utilization.

We attempted to convert the model to bfloat16 and conduct inference, yet the issue persists even under high GPU utilization. It seems that there's a problem occurring in the computation...
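A minimal HF-side bfloat16 baseline, useful to rule out the checkpoint itself; the checkpoint name and prompt are placeholders, and bfloat16 requires an Ampere or newer GPU:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
).cuda().eval()

inputs = tokenizer("translate English to German: Good morning.",
                   return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```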

The issue is also caused by the encoder_input_length problem described in https://github.com/NVIDIA/TensorRT-LLM/issues/1847. This issue can be closed.

We have encountered a similar issue where the T5-large model fails to align with the HF model. We used the test set provided by HF (question-answer pairs), and found that...
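A sketch of the comparison harness we use for such alignment checks; the checkpoint, the QA pair, and the `trt_answer` wrapper are all placeholders (`trt_answer` stands in for a call into the built engine and is not a real API):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
hf_model = T5ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).cuda().eval()

def hf_answer(question: str) -> str:
    inputs = tokenizer(question, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = hf_model.generate(**inputs, max_new_tokens=64, num_beams=1)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def trt_answer(question: str) -> str:
    # Placeholder stub: replace with a call into the TensorRT-LLM runtime
    # (e.g. the examples/enc_dec run script). Hypothetical, not a real API.
    return hf_answer(question)

qa_pairs = [  # stand-in for the HF-provided question-answer test set
    ("question: Who wrote Hamlet? context: Hamlet is a play by "
     "William Shakespeare.", "William Shakespeare"),
]

for question, expected in qa_pairs:
    hf_out, trt_out = hf_answer(question), trt_answer(question)
    if hf_out != trt_out:
        print(f"MISMATCH on {question!r}\n  HF : {hf_out}\n  TRT: {trt_out}")
```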

Thank you for your response! We are using the T5-Large model, but we have fine-tuned it, which makes sharing the model and test samples difficult. However, I will try my...

@sc-gr We're using the latest code.

@symphonylyh Thank you very much for your thorough analysis and resolution of the issue. We have modified the code and conducted tests. Currently, with the GPT plugin, the T5 model...

@symphonylyh I will submit a new issue. This is an interesting phenomenon:

1. Using float32 type:
   - GPU initial usage is 0%, model inference works correctly, and the inference results...
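To localize where the reduced-precision path diverges, one check that can be run purely on the HF side is to diff fp32 and fp16 logits on the same teacher-forced input; the checkpoint and prompts below are placeholders:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "t5-large"  # placeholder; we actually use a fine-tuned T5

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
inputs = tokenizer("translate English to German: Good morning.",
                   return_tensors="pt").to("cuda")
labels = tokenizer("Guten Morgen.", return_tensors="pt").input_ids.to("cuda")

logits = {}
for dtype in (torch.float32, torch.float16):
    model = T5ForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=dtype
    ).cuda().eval()
    with torch.no_grad():
        # Teacher-forced forward pass; logits: [batch, tgt_len, vocab]
        logits[dtype] = model(**inputs, labels=labels).logits.float()
    del model
    torch.cuda.empty_cache()

diff = (logits[torch.float32] - logits[torch.float16]).abs()
print(f"max abs diff: {diff.max().item():.3e}, mean: {diff.mean().item():.3e}")
```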