Junxi He

Results 2 comments of Junxi He

I have the same problem too; here's the full log:

```
(tensorrt) onatter@Onatter:~/TensorRT-LLM/examples/chatglm$ trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b_32k/ --gemm_plugin float16 \
    --output_dir trt_engines/chatglm3_6b/fp16/1-gpu
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024041600
[04/17/2024-13:22:01] [TRT-LLM] [I] Set bert_attention_plugin to float16....
```

I guess it might just be that I don't have enough CUDA memory. I worked around it with INT8 weight-only quantization; you could give that a try.
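
A rough sketch of what that looks like, based on the TensorRT-LLM chatglm example workflow; the checkpoint paths and output directories here are hypothetical, and the exact flags may differ between TensorRT-LLM versions, so check the chatglm example README for your release:

```shell
# Hypothetical paths; adjust to your local checkout.
cd ~/TensorRT-LLM/examples/chatglm

# Convert the HF checkpoint with INT8 weight-only quantization,
# which roughly halves the weight memory compared to FP16.
python3 convert_checkpoint.py --model_dir chatglm3_6b_32k \
    --use_weight_only --weight_only_precision int8 \
    --output_dir trt_ckpt/chatglm3_6b_32k_int8/

# Build the engine from the quantized checkpoint.
trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b_32k_int8/ \
    --gemm_plugin float16 \
    --output_dir trt_engines/chatglm3_6b/int8/1-gpu
```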