Can Mistral run on different GPUs (A100 & A5500)?
Open
thj08
opened this issue 1 year ago
•
6 comments
I followed the instructions and can run Mistral with the llama example on a single GPU (A100 or A5500),
but I don't know how to set tp_size=2 / workers=2 in the convert_checkpoint config.
I tried 2 GPUs on one device and two devices with 1 GPU each; both failed.
If you have any suggestions, please let me know. Thanks for the reply.
You could try referring to the guide for standard llama or other models. To run TP, you should add --tp_size 2 when converting the checkpoint. If you still encounter errors, please share your scripts and the full log following the issue template.
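For reference, a minimal 2-way tensor-parallel workflow for the llama example might look like the sketch below. The model and output paths are placeholders, and the exact flag names should be checked against the README of your TensorRT-LLM version:

```shell
# Sketch of a TP=2 build for the llama example (paths are placeholders;
# verify flags against your TensorRT-LLM version's examples/llama README).

# 1. Convert the HF checkpoint with 2-way tensor parallelism.
python convert_checkpoint.py \
    --model_dir ./Mistral-7B-v0.1 \
    --output_dir ./tllm_ckpt_tp2 \
    --dtype float16 \
    --tp_size 2

# 2. Build the engines; this produces one rank engine per GPU.
trtllm-build \
    --checkpoint_dir ./tllm_ckpt_tp2 \
    --output_dir ./engine_tp2 \
    --gemm_plugin float16

# 3. Run with one MPI rank per GPU, both GPUs visible to the job.
mpirun -n 2 python3 ../run.py \
    --engine_dir ./engine_tp2 \
    --tokenizer_dir ./Mistral-7B-v0.1 \
    --input_text "Hello"
```

Both ranks need to see their own GPU on the node where they run; launching two ranks where only one GPU is visible will fail at session setup.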
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024041600
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024041600
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024041600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024041600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
Traceback (most recent call last):
File "/tmp/k8s/TensorRTLLM/examples/llama/../run.py", line 564, in
main(args)
File "/tmp/k8s/TensorRTLLM/examples/llama/../run.py", line 413, in main
runner = runner_cls.from_dir(*runner_kwargs)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 184, in from_dir
session = GptSession(config=session_config,
RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in cudaSetDevice(device): invalid device ordinal (/home/jenkins/agent/workspace/LLM/main/L0_MergeRequest/tensorrt_llm/cpp/tensorrt_llm/runtime/utils/sessionUtils.cpp:34)
1 0x7215a5417031 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x68c031) [0x7215a5417031]
2 0x7215a6e972d7 tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const*, unsigned long, std::shared_ptr<nvinfer1::ILogger>) + 487
3 0x721617805a95 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb8a95) [0x721617805a95]
4 0x7216177b5549 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x68549) [0x7216177b5549]
5 0x721617798737 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x4b737) [0x721617798737]
6 0x5644bebab10e python3(+0x15a10e) [0x5644bebab10e]
7 0x5644beba1a7b _PyObject_MakeTpCall + 603
8 0x5644bebb9acb python3(+0x168acb) [0x5644bebb9acb]
9 0x5644bebba635 _PyObject_Call + 277
10 0x5644bebb6087 python3(+0x165087) [0x5644bebb6087]
11 0x5644beba1e2b python3(+0x150e2b) [0x5644beba1e2b]
12 0x721617797da9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x4ada9) [0x721617797da9]
13 0x5644beba1a7b _PyObject_MakeTpCall + 603
14 0x5644beb9b150 _PyEval_EvalFrameDefault + 30112
15 0x5644bebb97f1 python3(+0x1687f1) [0x5644bebb97f1]
16 0x5644bebba492 PyObject_Call + 290
17 0x5644beb965d7 _PyEval_EvalFrameDefault + 10791
18 0x5644bebab9fc _PyFunction_Vectorcall + 124
19 0x5644beb9426d _PyEval_EvalFrameDefault + 1725
20 0x5644beb909c6 python3(+0x13f9c6) [0x5644beb909c6]
21 0x5644bec86256 PyEval_EvalCode + 134
22 0x5644becb1108 python3(+0x260108) [0x5644becb1108]
23 0x5644becaa9cb python3(+0x2599cb) [0x5644becaa9cb]
24 0x5644becb0e55 python3(+0x25fe55) [0x5644becb0e55]
25 0x5644becb0338 _PyRun_SimpleFileObject + 424
26 0x5644becaff83 _PyRun_AnyFileObject + 67
27 0x5644beca2a5e Py_RunMain + 702
28 0x5644bec7902d Py_BytesMain + 45
29 0x72178dce8d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x72178dce8d90]
30 0x72178dce8e40 __libc_start_main + 128
31 0x5644bec78f25 _start + 37
[TensorRT-LLM][INFO] Loaded engine size: 7050 MiB
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 7210, GPU 7504 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 7212, GPU 7514 (MiB)
[TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
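The `invalid device ordinal` in the traceback above is consistent with each MPI rank selecting its CUDA device by rank index: with `MPI size: 2` but only one GPU visible to the process, rank 1 asks for ordinal 1, which does not exist. A purely illustrative sketch of that mapping (`device_for_rank` is a made-up name, not TensorRT-LLM's actual code):

```python
# Illustrative only: a TP launch typically assigns each MPI rank the
# CUDA device ordinal rank % gpus_per_node. If the process only sees
# one GPU, rank 1 still requests ordinal 1, and cudaSetDevice(1)
# fails with "invalid device ordinal".
def device_for_rank(rank: int, gpus_per_node: int) -> int:
    return rank % gpus_per_node

# Two GPUs visible on the node: ranks 0 and 1 get ordinals 0 and 1.
assert device_for_rank(0, 2) == 0
assert device_for_rank(1, 2) == 1
```

This is why splitting the two ranks across two containers that each expose a single GPU (both seen as device 0) breaks a plain TP launch: rank 1 still computes ordinal 1 unless the launcher knows there is one GPU per node.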
Thanks for the reply.
Is there any way to run on different GPU models?
The container runtime exposes different GPUs under the same resource name (e.g. nvidia.com/gpu); is it possible to run on different GPUs inside a container?
Does the same apply to pp_size?
I set pp_size=2; it doesn't show any error message, but it produces no further output.
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024041600
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024041600
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024041600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024041600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
[TensorRT-LLM][INFO] Loaded engine size: 6923 MiB
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 7082, GPU 7502 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 7083, GPU 7512 (MiB)
[TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
trt-deployment-6fc4ffd45-pb29j:1499:1499 [0] NCCL INFO Bootstrap : Using eth0:10.44.0.1<0>
trt-deployment-6fc4ffd45-pb29j:1499:1499 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v7 symbol.
trt-deployment-6fc4ffd45-pb29j:1499:1499 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v6 (v6)
trt-deployment-6fc4ffd45-pb29j:1499:1499 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v7 symbol.
trt-deployment-6fc4ffd45-pb29j:1499:1499 [0] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v6)
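When a pp_size=2 run loads the engines and then goes silent like this, the ranks are usually blocked waiting for NCCL to establish connections between them (here, across pods). One way to see how far connection setup gets is to raise NCCL's logging before relaunching; these are standard NCCL environment variables, not anything TensorRT-LLM-specific:

```shell
# Print NCCL initialization and network-transport details,
# then rerun the same mpirun command that hangs.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET
```

If the log stops after the bootstrap/plugin lines shown above, the ranks most likely cannot reach each other over the network NCCL picked, which is a cluster networking question rather than an engine-build one.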