nv-guomingz

27 comments by nv-guomingz

Our latest main branch no longer contains build.py under the examples/llama path. Are you using a legacy version of the code base? Please refer to the [new workflow doc](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/new_workflow.md) for details on our latest code.
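For reference, the new workflow replaces the per-model build.py with a two-step convert-then-build flow; the paths and flag values below are illustrative placeholders, not taken from this thread:

```shell
# Step 1: convert the Hugging Face checkpoint into a TensorRT-LLM checkpoint
# (convert_checkpoint.py lives under examples/llama in the repo)
python examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-2-7b/ \
    --output_dir ./tllm_checkpoint/ \
    --dtype float16

# Step 2: build the engine from the converted checkpoint
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint/ \
    --output_dir ./engine_out/
```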

Please try the main branch if possible, since our upcoming release will also use the new build workflow.

Could you please run the ls command under your ./Llama-2-7b/ path?
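For example (the file list in the comments below is the typical layout of a Hugging Face Llama-2-7b download, shown only as an illustration of what a healthy checkpoint directory looks like):

```shell
ls ./Llama-2-7b/
# a valid Hugging Face checkpoint directory typically contains, e.g.:
#   config.json  generation_config.json
#   tokenizer.model  tokenizer_config.json
#   model weight shards: *.safetensors (or pytorch_model-*.bin)
```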

> g that version, but now when I try to use multiple GPUs, I encounter this specific issue with the same m

May I know the full command that...

Just want to double-confirm: you're using 2 x NVIDIA Tesla V100 (16 GB vRAM) to run Llama with tp_size 8? The tensor-parallel size must match the number of GPU ranks you launch, so tp_size 8 cannot run on 2 GPUs.
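For context, a sketch of a setup where the tensor-parallel degree matches the available GPU count (flag names are from the main-branch convert script; paths are placeholders):

```shell
# convert with tp_size equal to the number of GPUs actually available
# (2 x V100 in this case, so tp_size 2 rather than 8)
python examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-2-7b/ \
    --output_dir ./tllm_checkpoint_tp2/ \
    --dtype float16 \
    --tp_size 2
```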

As the error message says: `Assertion failed: Unsupported data type`. Pre-SM80 GPUs (anything older than Ampere) do not support bfloat16, so this is expected behavior rather than a bug.
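On pre-SM80 GPUs such as the V100 (SM 70), a sketch of the usual workaround is to convert and build with float16 instead of bfloat16 (paths below are placeholders):

```shell
# request float16 weights at conversion time;
# bfloat16 requires SM 80 (Ampere) or newer
python examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-2-7b/ \
    --output_dir ./tllm_checkpoint/ \
    --dtype float16
```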

Another workaround is to disable the GPT attention plugin via --gpt_attention_plugin=disable.
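A sketch of that option on the trtllm-build command line (other flags omitted; directory names are placeholders):

```shell
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint/ \
    --gpt_attention_plugin=disable \
    --output_dir ./engine_out/
```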

This option applies when building the engine via the trtllm-build interface; if it is not supported by the NeMo script, we don't have any recommendation for enabling model conversion...

It seems there's a bug here (I assume you're using the main branch). A quick workaround is to comment out line 1035, where `del groupwise_qweight_safetensors` is called; the root cause is that you're using a pt format file...