Comments of Ashish Kumar (3 results)
Is there a way I could get TensorRT-LLM to work with a Llama model on my V100 GPU?
The export_to_trt.py script in the nemo-inference container from NGC has a check that rejects any dtype other than bfloat16 as unsupported by TensorRT-LLM, and as you said, bfloat16...
NeMo's export_to_trt does not support this argument, i.e. "--gpt_attention_plugin=disable". Any recommendation on how I can set the "gpt_attention_plugin" parameter in nemo-inference?
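For context, in standalone TensorRT-LLM this setting is normally chosen at engine-build time rather than through the NeMo export script. A rough sketch of how the flag is typically passed to trtllm-build follows; the checkpoint and output paths are hypothetical, and exact flag names and accepted values vary across TensorRT-LLM versions, so treat this as an illustration rather than a verified recipe:

```shell
# Sketch (assumptions, not a verified command): building a TensorRT-LLM engine
# directly, with the GPT attention plugin controlled at build time.
# V100 (sm70) has no bfloat16 support, so float16 is used for the other plugins.
# --gpt_attention_plugin accepts a dtype (e.g. float16) or "disable" in the
# TensorRT-LLM versions I have seen; check trtllm-build --help for yours.
trtllm-build \
  --checkpoint_dir ./llama_ckpt \
  --output_dir ./llama_engine \
  --gemm_plugin float16 \
  --gpt_attention_plugin disable
```

If the NeMo export path does not expose this knob, converting the checkpoint and running trtllm-build manually may be the only way to control it.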