TensorRT-LLM: Error Code 9: Internal Error (Builder was created on device different than current device.) qwen-7B INT4-GPTQ on A100

Open wujingbo-web opened this issue 1 year ago • 4 comments

I tried to run the qwen-7B INT4-GPTQ example and got this error: Error Code 9: Internal Error (Builder was created on device different than current device.) My machine has two A100 80G GPUs. Please help.

! python3 /root/TensorRT-LLM/examples/qwen/build.py --hf_model_dir Qwen-7B-Chat-Int4 \
    --quant_ckpt_path Qwen-7B-Chat-Int4 \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4_gptq \
    --per_group \
    --world_size 1 \
    --tp_size 1 \
    --output_dir ./tmp/Qwen/7B/trt_engines/int4-gptq/1-gpu

[01/29/2024-03:10:12] [TRT] [E] 9: [backendBuilderConfig.cpp::validateBasicAssumptions::28] Error Code 9: Internal Error (Builder was created on device different than current device.)
[01/29/2024-03:10:12] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/29/2024-03:10:12] [TRT-LLM] [I] Config saved to ./tmp/Qwen/7B/trt_engines/int4-gptq/1-gpu/config.json.
Traceback (most recent call last):
  File "/root/TensorRT-LLM/examples/qwen/build.py", line 710, in <module>
    build(0, args)
  File "/root/TensorRT-LLM/examples/qwen/build.py", line 682, in build
    assert engine is not None, f'Failed to build engine for rank {cur_rank}'
AssertionError: Failed to build engine for rank 0

Jan 29 '24 03:01 wujingbo-web

Please follow the issue template when filing an issue. Thank you for your cooperation.

Jan 30 '24 07:01 byshiue

I faced the same issue.

Feb 07 '24 07:02 zeeshanvision

@wujingbo-web @zeeshanvision Try specifying the CUDA device.

CUDA_VISIBLE_DEVICES=0 python build.py --...
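
When the build is launched from a notebook or another Python process rather than a shell, the same pinning can be done by passing a modified environment to the child process. The following is a minimal sketch, not part of TensorRT-LLM: `pinned_env` and `run_pinned` are hypothetical helper names, and it assumes the error comes from the process seeing more than one GPU, which this thread does not confirm.

```python
import os
import subprocess
import sys


def pinned_env(device_index, base_env=None):
    """Return a copy of the environment with only one GPU visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads this variable at initialization; the child process will
    # see the selected GPU as its device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def run_pinned(cmd, device_index=0):
    """Run cmd (a list of arguments) in a child that sees only one GPU."""
    return subprocess.run(cmd, env=pinned_env(device_index), check=False)


if __name__ == "__main__":
    # Hypothetical usage: everything after the script name is the command
    # to run, for example the build.py invocation from the original post.
    if len(sys.argv) > 1:
        sys.exit(run_pinned(sys.argv[1:]).returncode)
```

The variable has to be set before the child process initializes CUDA, which is why it is passed through `env` instead of being changed after the build script has started.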

Feb 17 '24 14:02 lajiyuan

Are you using MIG? Also, please follow the template and share the reproduction steps and the full error logs.
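
One way to answer the MIG question is to query the current MIG mode of each GPU with `nvidia-smi`. The sketch below is only illustrative: `parse_mig_modes` and `query_mig_modes` are hypothetical helper names, and the parser assumes each output line has the form `index, mode` (for example `0, Disabled`), which may differ across driver versions.

```python
import shutil
import subprocess


def parse_mig_modes(csv_text):
    """Map GPU index -> MIG mode string from 'index, mode' CSV lines."""
    modes = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode
    return modes


def query_mig_modes():
    """Return MIG mode per GPU, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,mig.mode.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return parse_mig_modes(result.stdout)


if __name__ == "__main__":
    print(query_mig_modes())
```

Including this output in the issue, together with the full build log, gives the maintainers the device information they ask for.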

Mar 27 '24 07:03 byshiue
