Comments of Ashish Kumar (3 results)
Is there a way I could get TensorRT-LLM to work with a Llama model on my V100 GPU?
The export_to_trt.py script in the nemo-inference container from NGC has a check that rejects any dtype other than bfloat16 as unsupported by TensorRT-LLM, and as you said, bfloat16...
NeMo's export_to_trt does not support this argument, i.e. "--gpt_attention_plugin=disable". Any recommendation on how I can set the "gpt_attention_plugin" parameter in nemo-inference?
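For context, in standalone TensorRT-LLM this setting is normally chosen at engine-build time rather than through the NeMo export script. A rough sketch of how the flag is typically passed to trtllm-build follows; the checkpoint and output paths are hypothetical, and exact flag names and accepted values vary across TensorRT-LLM versions, so treat this as an illustration rather than a verified recipe:

```shell
# Sketch (assumptions, not a verified command): building a TensorRT-LLM engine
# directly, with the GPT attention plugin controlled at build time.
# V100 (sm70) has no bfloat16 support, so float16 is used for the other plugins.
# --gpt_attention_plugin accepts a dtype (e.g. float16) or "disable" in the
# TensorRT-LLM versions I have seen; check trtllm-build --help for yours.
trtllm-build \
  --checkpoint_dir ./llama_ckpt \
  --output_dir ./llama_engine \
  --gemm_plugin float16 \
  --gpt_attention_plugin disable
```

If the NeMo export path does not expose this knob, converting the checkpoint and running trtllm-build manually may be the only way to control it.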