TensorRT-LLM
Issue with Llama when trying to create the checkpoint: config.json does not appear
System Info
2 x NVIDIA Tesla V100, 16GB vRAM
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I downloaded the Llama model from Hugging Face to test it. I'm using the Llama-2-7b model. Then I tried to execute the command:
```
python3 convert_checkpoint.py --model_dir ./tmp/llama/7B/ \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2
```
Expected behavior
That the command finishes without errors and creates the tllm_checkpoint_2gpu_tp2 folder correctly.
Actual behavior
```
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama# python3 convert_checkpoint.py --model_dir ./Llama-2-7b/ \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
Traceback (most recent call last):
  File "/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 1547, in
```
Additional notes

My files in the container:

```
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama# ls
Llama-2-7b  README.md  convert_checkpoint.py  requirements.txt  summarize_long.py  tllm_checkpoint_2gpu_tp2
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama#
```
If it is a Meta checkpoint, you may need to try --meta_ckpt_dir instead of --model_dir; see the sketch below.
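For example, a Meta-format conversion might look like the following. This is only a sketch, assuming the original Meta download lives under ./tmp/llama/7B/, with the other flags carried over from the command above:

```
# Hypothetical sketch: convert from a Meta-format checkpoint rather than a HF model directory.
python3 convert_checkpoint.py --meta_ckpt_dir ./tmp/llama/7B/ \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2
```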
Thanks @nvluxiaoz. In the end I downloaded the Llama HF version from Hugging Face, which solved the initial problem with the config.json file.
I'm not sure if it's because I'm using that version, but now when I try to use multiple GPUs, I encounter this specific issue with the same model, llama2-7b-hf.
```
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
Traceback (most recent call last):
  File "/TensorRT-LLM/examples/llama/../run.py", line 538, in <module>
    main(args)
  File "/TensorRT-LLM/examples/llama/../run.py", line 408, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 142, in from_dir
    world_config = WorldConfig.mpi(tensor_parallelism=tp_size,
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/worldConfig.cpp:94)
```
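The assertion mpiSize == tp * pp means the MPI world size must equal the engine's tensor-parallel size times its pipeline-parallel size, so an engine built with --tp_size 2 has to be launched with two ranks. A minimal launch sketch under that assumption (the engine_dir and tokenizer_dir paths are illustrative placeholders, not the exact paths from this setup):

```
# Sketch: MPI world size (2) matches tp_size * pp_size for a 2-way tensor-parallel engine.
mpirun -n 2 --allow-run-as-root \
    python3 ../run.py --max_output_len=50 \
        --engine_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
        --tokenizer_dir ./Llama-2-7b-hf \
        --input_text "To tell a story"
```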
Could you please run the ls command under your ./Llama-2-7b/ path?
```
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama# ls
Llama-2-7b-hf  convert_checkpoint.py  summarize_long.py  tllm_checkpoint_2gpu_tp2
README.md  requirements.txt  tmp
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama# cd Llama-2-7b-hf/
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama/Llama-2-7b-hf# ls
LICENSE.txt  config.json  model.safetensors.index.json  special_tokens_map.json
README.md  generation_config.json  pytorch_model-00001-of-00002.bin  tokenizer.json
Responsible-Use-Guide.pdf  model-00001-of-00002.safetensors  pytorch_model-00002-of-00002.bin  tokenizer.model
USE_POLICY.md  model-00002-of-00002.safetensors  pytorch_model.bin.index.json  tokenizer_config.json
root@b45ee7ad85b4:/TensorRT-LLM/examples/llama/Llama-2-7b-hf#
```
May I know the full command that triggers this issue?
Now I have this issue:
`root@c6fc756c94d5:/TensorRT-LLM/examples/llama# mpirun -n 8 --allow-run-as-root python3 ../run.py --max_output_len=50 --engine_dir ./tmp/llama/7B/trt_engines/fp16/8-gpu/ --input_text "To tell a story" [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 6 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 3 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 5 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 4 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. 
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 2 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 7 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 1 [TensorRT-LLM][INFO] Engine version 0.9.0.dev2024020600 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][INFO] MPI size: 8, rank: 0 [TensorRT-LLM][WARNING] Device 6 peer access Device 1 is not available. [TensorRT-LLM][WARNING] Device 6 peer access Device 2 is not available. [TensorRT-LLM][WARNING] Device 6 peer access Device 3 is not available. [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][WARNING] Device 2 peer access Device 5 is not available. [TensorRT-LLM][WARNING] Device 2 peer access Device 6 is not available. [TensorRT-LLM][WARNING] Device 2 peer access Device 7 is not available. [TensorRT-LLM][WARNING] Device 4 peer access Device 0 is not available. [TensorRT-LLM][WARNING] Device 4 peer access Device 1 is not available. [TensorRT-LLM][WARNING] Device 4 peer access Device 3 is not available. [TensorRT-LLM][WARNING] Device 3 peer access Device 4 is not available. [TensorRT-LLM][WARNING] Device 3 peer access Device 6 is not available. [TensorRT-LLM][WARNING] Device 3 peer access Device 7 is not available. [TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available. [TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available. [TensorRT-LLM][WARNING] Device 5 peer access Device 0 is not available. [TensorRT-LLM][WARNING] Device 5 peer access Device 1 is not available. [TensorRT-LLM][WARNING] Device 5 peer access Device 2 is not available. 
[TensorRT-LLM][WARNING] Device 0 peer access Device 7 is not available. [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][WARNING] Device 7 peer access Device 0 is not available. [TensorRT-LLM][WARNING] Device 7 peer access Device 2 is not available. [TensorRT-LLM][WARNING] Device 7 peer access Device 3 is not available. [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][WARNING] Device 1 peer access Device 4 is not available. [TensorRT-LLM][WARNING] Device 1 peer access Device 5 is not available. [TensorRT-LLM][WARNING] Device 1 peer access Device 6 is not available. [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][INFO] Loaded engine size: 1828 MiB [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1979, GPU 2157 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1981, GPU 2167 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 NCCL version 2.18.1+cuda12.0 [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in 
engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1825, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2731 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2763 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2691 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2807 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2803 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2711 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2827 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2219, GPU 2755 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2739 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2771 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2699 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2815 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2811 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2719 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2835 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 2763 (MiB) [TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2 [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed 
allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1825 (MiB) [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM][INFO] Allocate 27992784896 bytes for k/v cache. [TensorRT-LLM][INFO] Using 427136 tokens in paged KV cache. [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in
main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7f1d528dc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7f1d528dc0f7] 2 0x7f1d545facf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7f1d545e50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7f1d545e5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7f1d545e6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7f1ea51c4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7f1ea51c4c94] 7 0x7f1ea519d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7f1ea519d6c9] 8 0x7f1ea5187210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7f1ea5187210] 9 0x55d2b752610e python3(+0x15a10e) [0x55d2b752610e] 10 0x55d2b751ca7b _PyObject_MakeTpCall + 603 11 0x55d2b7534acb python3(+0x168acb) [0x55d2b7534acb] 12 0x55d2b7535635 _PyObject_Call + 277 13 0x55d2b7531087 python3(+0x165087) [0x55d2b7531087] 14 0x55d2b751ce2b python3(+0x150e2b) [0x55d2b751ce2b] 15 0x7f1ea51868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7f1ea51868c9] 16 0x55d2b751ca7b _PyObject_MakeTpCall + 603 17 0x55d2b7516150 _PyEval_EvalFrameDefault + 30112 18 0x55d2b75347f1 python3(+0x1687f1) [0x55d2b75347f1] 19 0x55d2b7535492 PyObject_Call + 290 20 0x55d2b75115d7 _PyEval_EvalFrameDefault + 10791 21 0x55d2b75269fc _PyFunction_Vectorcall + 124 22 0x55d2b750f26d _PyEval_EvalFrameDefault + 1725 23 0x55d2b750b9c6 python3(+0x13f9c6) [0x55d2b750b9c6] 24 0x55d2b7601256 PyEval_EvalCode + 134 25 0x55d2b762c108 python3(+0x260108) [0x55d2b762c108] 26 0x55d2b76259cb python3(+0x2599cb) [0x55d2b76259cb] 27 0x55d2b762be55 python3(+0x25fe55) [0x55d2b762be55] 28 0x55d2b762b338 _PyRun_SimpleFileObject + 424 29 0x55d2b762af83 _PyRun_AnyFileObject + 67 30 0x55d2b761da5e Py_RunMain + 702 31 0x55d2b75f402d Py_BytesMain + 45 32 0x7f1f5a00ad90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1f5a00ad90] 33 0x7f1f5a00ae40 __libc_start_main + 128 34 0x55d2b75f3f25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices 
(/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7f5f1aedc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7f5f1aedc0f7] 2 0x7f5f1cbfacf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7f5f1cbe50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7f5f1cbe5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7f5f1cbe6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7f606d7c4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7f606d7c4c94] 7 0x7f606d79d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7f606d79d6c9] 8 0x7f606d787210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7f606d787210] 9 0x55e271ee110e python3(+0x15a10e) [0x55e271ee110e] 10 0x55e271ed7a7b _PyObject_MakeTpCall + 603 11 0x55e271eefacb python3(+0x168acb) [0x55e271eefacb] 12 0x55e271ef0635 _PyObject_Call + 277 13 0x55e271eec087 python3(+0x165087) [0x55e271eec087] 14 0x55e271ed7e2b python3(+0x150e2b) [0x55e271ed7e2b] 15 0x7f606d7868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7f606d7868c9] 16 0x55e271ed7a7b _PyObject_MakeTpCall + 603 17 0x55e271ed1150 _PyEval_EvalFrameDefault + 30112 18 0x55e271eef7f1 python3(+0x1687f1) [0x55e271eef7f1] 19 0x55e271ef0492 PyObject_Call + 290 20 0x55e271ecc5d7 _PyEval_EvalFrameDefault + 10791 21 0x55e271ee19fc _PyFunction_Vectorcall + 124 22 0x55e271eca26d _PyEval_EvalFrameDefault + 1725 23 0x55e271ec69c6 python3(+0x13f9c6) [0x55e271ec69c6] 24 0x55e271fbc256 PyEval_EvalCode + 134 25 0x55e271fe7108 python3(+0x260108) [0x55e271fe7108] 26 0x55e271fe09cb python3(+0x2599cb) [0x55e271fe09cb] 27 0x55e271fe6e55 python3(+0x25fe55) [0x55e271fe6e55] 28 0x55e271fe6338 _PyRun_SimpleFileObject + 424 29 0x55e271fe5f83 _PyRun_AnyFileObject + 67 30 0x55e271fd8a5e Py_RunMain + 702 31 0x55e271faf02d Py_BytesMain + 45 32 0x7f61225e5d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f61225e5d90] 33 0x7f61225e5e40 __libc_start_main + 128 34 0x55e271faef25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7fe208cdc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7fe208cdc0f7] 2 0x7fe20a9facf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7fe20a9e50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 
202 4 0x7fe20a9e5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7fe20a9e6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7fe35b5c4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7fe35b5c4c94] 7 0x7fe35b59d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7fe35b59d6c9] 8 0x7fe35b587210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7fe35b587210] 9 0x561a96ce110e python3(+0x15a10e) [0x561a96ce110e] 10 0x561a96cd7a7b _PyObject_MakeTpCall + 603 11 0x561a96cefacb python3(+0x168acb) [0x561a96cefacb] 12 0x561a96cf0635 _PyObject_Call + 277 13 0x561a96cec087 python3(+0x165087) [0x561a96cec087] 14 0x561a96cd7e2b python3(+0x150e2b) [0x561a96cd7e2b] 15 0x7fe35b5868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7fe35b5868c9] 16 0x561a96cd7a7b _PyObject_MakeTpCall + 603 17 0x561a96cd1150 _PyEval_EvalFrameDefault + 30112 18 0x561a96cef7f1 python3(+0x1687f1) [0x561a96cef7f1] 19 0x561a96cf0492 PyObject_Call + 290 20 0x561a96ccc5d7 _PyEval_EvalFrameDefault + 10791 21 0x561a96ce19fc _PyFunction_Vectorcall + 124 22 0x561a96cca26d _PyEval_EvalFrameDefault + 1725 23 0x561a96cc69c6 python3(+0x13f9c6) [0x561a96cc69c6] 24 0x561a96dbc256 PyEval_EvalCode + 134 25 0x561a96de7108 python3(+0x260108) [0x561a96de7108] 26 0x561a96de09cb python3(+0x2599cb) [0x561a96de09cb] 27 0x561a96de6e55 python3(+0x25fe55) [0x561a96de6e55] 28 0x561a96de6338 _PyRun_SimpleFileObject + 424 29 0x561a96de5f83 _PyRun_AnyFileObject + 67 30 0x561a96dd8a5e Py_RunMain + 702 31 0x561a96daf02d Py_BytesMain + 45 32 0x7fe41041cd90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe41041cd90] 33 0x7fe41041ce40 __libc_start_main + 128 34 0x561a96daef25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7f78214dc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7f78214dc0f7] 2 0x7f78231facf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7f78231e50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7f78231e5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7f78231e6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7f7973dc4c94 
/usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7f7973dc4c94] 7 0x7f7973d9d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7f7973d9d6c9] 8 0x7f7973d87210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7f7973d87210] 9 0x55c859c7110e python3(+0x15a10e) [0x55c859c7110e] 10 0x55c859c67a7b _PyObject_MakeTpCall + 603 11 0x55c859c7facb python3(+0x168acb) [0x55c859c7facb] 12 0x55c859c80635 _PyObject_Call + 277 13 0x55c859c7c087 python3(+0x165087) [0x55c859c7c087] 14 0x55c859c67e2b python3(+0x150e2b) [0x55c859c67e2b] 15 0x7f7973d868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7f7973d868c9] 16 0x55c859c67a7b _PyObject_MakeTpCall + 603 17 0x55c859c61150 _PyEval_EvalFrameDefault + 30112 18 0x55c859c7f7f1 python3(+0x1687f1) [0x55c859c7f7f1] 19 0x55c859c80492 PyObject_Call + 290 20 0x55c859c5c5d7 _PyEval_EvalFrameDefault + 10791 21 0x55c859c719fc _PyFunction_Vectorcall + 124 22 0x55c859c5a26d _PyEval_EvalFrameDefault + 1725 23 0x55c859c569c6 python3(+0x13f9c6) [0x55c859c569c6] 24 0x55c859d4c256 PyEval_EvalCode + 134 25 0x55c859d77108 python3(+0x260108) [0x55c859d77108] 26 0x55c859d709cb python3(+0x2599cb) [0x55c859d709cb] 27 0x55c859d76e55 python3(+0x25fe55) [0x55c859d76e55] 28 0x55c859d76338 _PyRun_SimpleFileObject + 424 29 0x55c859d75f83 _PyRun_AnyFileObject + 67 30 0x55c859d68a5e Py_RunMain + 702 31 0x55c859d3f02d Py_BytesMain + 45 32 0x7f7a28b98d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f7a28b98d90] 33 0x7f7a28b98e40 __libc_start_main + 128 34 0x55c859d3ef25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7fc0fd4dc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7fc0fd4dc0f7] 2 0x7fc0ff1facf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7fc0ff1e50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7fc0ff1e5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7fc0ff1e6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7fc24fdc4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7fc24fdc4c94] 7 0x7fc24fd9d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7fc24fd9d6c9] 8 0x7fc24fd87210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7fc24fd87210] 9 0x55eea150010e 
python3(+0x15a10e) [0x55eea150010e] 10 0x55eea14f6a7b _PyObject_MakeTpCall + 603 11 0x55eea150eacb python3(+0x168acb) [0x55eea150eacb] 12 0x55eea150f635 _PyObject_Call + 277 13 0x55eea150b087 python3(+0x165087) [0x55eea150b087] 14 0x55eea14f6e2b python3(+0x150e2b) [0x55eea14f6e2b] 15 0x7fc24fd868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7fc24fd868c9] 16 0x55eea14f6a7b _PyObject_MakeTpCall + 603 17 0x55eea14f0150 _PyEval_EvalFrameDefault + 30112 18 0x55eea150e7f1 python3(+0x1687f1) [0x55eea150e7f1] 19 0x55eea150f492 PyObject_Call + 290 20 0x55eea14eb5d7 _PyEval_EvalFrameDefault + 10791 21 0x55eea15009fc _PyFunction_Vectorcall + 124 22 0x55eea14e926d _PyEval_EvalFrameDefault + 1725 23 0x55eea14e59c6 python3(+0x13f9c6) [0x55eea14e59c6] 24 0x55eea15db256 PyEval_EvalCode + 134 25 0x55eea1606108 python3(+0x260108) [0x55eea1606108] 26 0x55eea15ff9cb python3(+0x2599cb) [0x55eea15ff9cb] 27 0x55eea1605e55 python3(+0x25fe55) [0x55eea1605e55] 28 0x55eea1605338 _PyRun_SimpleFileObject + 424 29 0x55eea1604f83 _PyRun_AnyFileObject + 67 30 0x55eea15f7a5e Py_RunMain + 702 31 0x55eea15ce02d Py_BytesMain + 45 32 0x7fc304ae4d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc304ae4d90] 33 0x7fc304ae4e40 __libc_start_main + 128 34 0x55eea15cdf25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7fa4550dc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7fa4550dc0f7] 2 0x7fa456dfacf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7fa456de50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7fa456de5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7fa456de6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7fa5a79c4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7fa5a79c4c94] 7 0x7fa5a799d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7fa5a799d6c9] 8 0x7fa5a7987210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7fa5a7987210] 9 0x559af154a10e python3(+0x15a10e) [0x559af154a10e] 10 0x559af1540a7b _PyObject_MakeTpCall + 603 11 0x559af1558acb python3(+0x168acb) [0x559af1558acb] 12 0x559af1559635 _PyObject_Call + 277 13 0x559af1555087 python3(+0x165087) [0x559af1555087] 14 0x559af1540e2b python3(+0x150e2b) [0x559af1540e2b] 15 0x7fa5a79868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7fa5a79868c9] 16 
0x559af1540a7b _PyObject_MakeTpCall + 603 17 0x559af153a150 _PyEval_EvalFrameDefault + 30112 18 0x559af15587f1 python3(+0x1687f1) [0x559af15587f1] 19 0x559af1559492 PyObject_Call + 290 20 0x559af15355d7 _PyEval_EvalFrameDefault + 10791 21 0x559af154a9fc _PyFunction_Vectorcall + 124 22 0x559af153326d _PyEval_EvalFrameDefault + 1725 23 0x559af152f9c6 python3(+0x13f9c6) [0x559af152f9c6] 24 0x559af1625256 PyEval_EvalCode + 134 25 0x559af1650108 python3(+0x260108) [0x559af1650108] 26 0x559af16499cb python3(+0x2599cb) [0x559af16499cb] 27 0x559af164fe55 python3(+0x25fe55) [0x559af164fe55] 28 0x559af164f338 _PyRun_SimpleFileObject + 424 29 0x559af164ef83 _PyRun_AnyFileObject + 67 30 0x559af1641a5e Py_RunMain + 702 31 0x559af161802d Py_BytesMain + 45 32 0x7fa65c851d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fa65c851d90] 33 0x7fa65c851e40 __libc_start_main + 128 34 0x559af1617f25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7f9c820dc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7f9c820dc0f7] 2 0x7f9c83dfacf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7f9c83de50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7f9c83de5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7f9c83de6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7f9dd49c4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7f9dd49c4c94] 7 0x7f9dd499d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7f9dd499d6c9] 8 0x7f9dd4987210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7f9dd4987210] 9 0x563c2236a10e python3(+0x15a10e) [0x563c2236a10e] 10 0x563c22360a7b _PyObject_MakeTpCall + 603 11 0x563c22378acb python3(+0x168acb) [0x563c22378acb] 12 0x563c22379635 _PyObject_Call + 277 13 0x563c22375087 python3(+0x165087) [0x563c22375087] 14 0x563c22360e2b python3(+0x150e2b) [0x563c22360e2b] 15 0x7f9dd49868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7f9dd49868c9] 16 0x563c22360a7b _PyObject_MakeTpCall + 603 17 0x563c2235a150 _PyEval_EvalFrameDefault + 30112 18 0x563c223787f1 python3(+0x1687f1) [0x563c223787f1] 19 0x563c22379492 PyObject_Call + 290 20 0x563c223555d7 _PyEval_EvalFrameDefault + 10791 21 0x563c2236a9fc _PyFunction_Vectorcall + 124 22 0x563c2235326d _PyEval_EvalFrameDefault + 1725 23 0x563c2234f9c6 python3(+0x13f9c6) [0x563c2234f9c6] 24 0x563c22445256 PyEval_EvalCode + 134 25 
0x563c22470108 python3(+0x260108) [0x563c22470108] 26 0x563c224699cb python3(+0x2599cb) [0x563c224699cb] 27 0x563c2246fe55 python3(+0x25fe55) [0x563c2246fe55] 28 0x563c2246f338 _PyRun_SimpleFileObject + 424 29 0x563c2246ef83 _PyRun_AnyFileObject + 67 30 0x563c22461a5e Py_RunMain + 702 31 0x563c2243802d Py_BytesMain + 45 32 0x7f9e897b2d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f9e897b2d90] 33 0x7f9e897b2e40 __libc_start_main + 128 34 0x563c22437f25 _start + 37 [TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600Traceback (most recent call last): File "/TensorRT-LLM/examples/llama/../run.py", line 504, in main(args) File "/TensorRT-LLM/examples/llama/../run.py", line 379, in main runner = runner_cls.from_dir(*runner_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 169, in from_dir session = GptSession(config=session_config, RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/runtime/ipcUtils.cpp:48) 1 0x7fe2bcedc0f7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8070f7) [0x7fe2bcedc0f7] 2 0x7fe2bebfacf8 tensorrt_llm::runtime::setPeerAccess(tensorrt_llm::runtime::WorldConfig, bool) + 216 3 0x7fe2bebe50da tensorrt_llm::runtime::GptSession::createCustomAllReduceWorkspace(int, int, int) + 202 4 0x7fe2bebe5e4d tensorrt_llm::runtime::GptSession::setup(tensorrt_llm::runtime::GptSession::Config const&) + 1117 5 0x7fe2bebe6291 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const, unsigned long, std::shared_ptrnvinfer1::ILogger) + 977 6 0x7fe40f5c4c94 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x6ac94) [0x7fe40f5c4c94] 7 0x7fe40f59d6c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x436c9) [0x7fe40f59d6c9] 8 0x7fe40f587210 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d210) [0x7fe40f587210] 9 0x5617b011b10e python3(+0x15a10e) [0x5617b011b10e] 10 0x5617b0111a7b _PyObject_MakeTpCall + 603 11 0x5617b0129acb python3(+0x168acb) [0x5617b0129acb] 12 0x5617b012a635 _PyObject_Call + 277 13 0x5617b0126087 python3(+0x165087) [0x5617b0126087] 14 0x5617b0111e2b python3(+0x150e2b) [0x5617b0111e2b] 15 0x7fe40f5868c9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2c8c9) [0x7fe40f5868c9] 16 0x5617b0111a7b _PyObject_MakeTpCall + 603 17 0x5617b010b150 _PyEval_EvalFrameDefault + 30112 18 0x5617b01297f1 python3(+0x1687f1) [0x5617b01297f1] 19 0x5617b012a492 PyObject_Call + 290 20 0x5617b01065d7 _PyEval_EvalFrameDefault + 10791 21 0x5617b011b9fc _PyFunction_Vectorcall + 124 22 0x5617b010426d _PyEval_EvalFrameDefault + 1725 23 0x5617b01009c6 python3(+0x13f9c6) [0x5617b01009c6] 24 0x5617b01f6256 PyEval_EvalCode + 134 25 0x5617b0221108 python3(+0x260108) [0x5617b0221108] 26 0x5617b021a9cb python3(+0x2599cb) [0x5617b021a9cb] 27 0x5617b0220e55 python3(+0x25fe55) [0x5617b0220e55] 28 0x5617b0220338 _PyRun_SimpleFileObject + 424 29 0x5617b021ff83 _PyRun_AnyFileObject + 67 30 0x5617b0212a5e Py_RunMain + 702 31 0x5617b01e902d Py_BytesMain + 45 32 0x7fe4c449fd90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe4c449fd90] 33 0x7fe4c449fe40 
__libc_start_main + 128 34 0x5617b01e8f25 _start + 37
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[12811,1],5] Exit code: 1
root@c6fc756c94d5:/TensorRT-LLM/examples/llama#
`
I used these commands before:
```
# Build LLaMA 7B using 8-way tensor parallelism.
python3 convert_checkpoint.py --model_dir ./Llama-2-7b-hf \
    --output_dir ./tllm_checkpoint_2gpu_tp2-v8 \
    --dtype float16 \
    --tp_size 8

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2-v8 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/8-gpu/ \
    --gemm_plugin float16
```
Just want to double-confirm: are you using 2 x NVIDIA Tesla V100 16GB for running Llama with tp_size 8?
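For reference, if only 2 x V100 16GB are available, the tensor-parallel size has to match the GPU count. A minimal 2-GPU sketch under that assumption (the output directories and --tokenizer_dir are illustrative placeholders):

```
# Convert and build for 2-way tensor parallelism, then launch run.py with 2 MPI ranks.
python3 convert_checkpoint.py --model_dir ./Llama-2-7b-hf \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
    --gemm_plugin float16

# --tokenizer_dir is assumed to point at the HF model directory.
mpirun -n 2 --allow-run-as-root \
    python3 ../run.py --max_output_len=50 \
        --engine_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
        --tokenizer_dir ./Llama-2-7b-hf \
        --input_text "To tell a story"
```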