TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently...
I'm having problems when using MPT. Setting: AWS g5.48xlarge, CUDA 12.1.0, Ubuntu 22.04, Python 3.10, PyTorch 2.1.2. ``` root@7f51eddb66f5:/TensorRT-LLM/examples/mpt# trtllm-build --checkpoint_dir=./ft_ckpts/mpt-7b/fp16 \ --max_batch_size 32 \ --max_input_len 1024 \ --max_output_len 512 \...
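For context, the MPT example's engine build normally follows a convert-then-build flow. A hedged sketch is below: the checkpoint path mirrors the issue, but the `convert_checkpoint.py` flags and the `--output_dir` value are assumptions, not the issue's exact commands.

```shell
# 1) Convert the Hugging Face checkpoint into TensorRT-LLM checkpoint format
#    (script lives in the per-model example directory; flags may vary by version).
python convert_checkpoint.py --model_dir ./mpt-7b --dtype float16 \
    --output_dir ./ft_ckpts/mpt-7b/fp16

# 2) Compile the converted checkpoint into a TensorRT engine.
trtllm-build --checkpoint_dir ./ft_ckpts/mpt-7b/fp16 \
    --max_batch_size 32 --max_input_len 1024 --max_output_len 512 \
    --output_dir ./engines/mpt-7b
```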
Device: Win 11; RTX 4090 When I run: `make -C docker release_build` it fails with the error below. ``` make: Entering directory '/home/mustapham/TensorRT-LLM/docker' Building docker image: tensorrt_llm/release:latest DOCKER_BUILDKIT=1 docker build...
### System Info Any ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks - [ ]...
Hi, while trying to run this ``` python build.py --model_dir $model_dir$ \ --dtype float16 \ --use_gpt_attention_plugin float16 \ --use_gemm_plugin float16 \ --max_batch_size 4 \ --max_input_len 128 \ --max_output_len 128 ```...
A single GPU is OK, but the system hangs when I use multiple GPUs. Can someone help solve this? Thanks. python build.py --model_dir meta-llama/Llama-2-7b-chat-hf \ --dtype float16 \ --remove_input_padding \ --use_gpt_attention_plugin float16 \...
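Not a fix for the hang, but to illustrate what multi-GPU tensor parallelism does conceptually: with `tp_size=2`, a weight matrix is split column-wise across ranks, each rank computes a partial matmul, and the shards are gathered back. This is a pure-Python sketch, not TensorRT-LLM's actual implementation.

```python
def matmul(x, w):
    # x: list of rows, w: list of rows; returns x @ w.
    cols = len(w[0])
    return [[sum(xr[k] * w[k][j] for k in range(len(w))) for j in range(cols)]
            for xr in x]

def split_columns(w, parts):
    # Split the weight matrix into `parts` equal column blocks (one per rank).
    n = len(w[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in w] for i in range(parts)]

x = [[1.0, 2.0]]                       # one input row
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]             # 2x4 weight

full = matmul(x, w)
shards = split_columns(w, 2)           # tp_size = 2
partials = [matmul(x, s) for s in shards]
combined = [partials[0][0] + partials[1][0]]   # gather along columns
assert combined == full                # sharded result matches the full matmul
```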
Trying out T5 with the Python backend. https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/enc_dec/run.py#L484 I see that SamplingConfig has output_log_probs https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L355. But the return dict does not include the log probabilities https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2515. Is there any other way...
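Until the runtime returns log probs for this path, one generic workaround (independent of TensorRT-LLM, assuming you can expose per-step logits from your run script) is to compute token log probabilities yourself with a numerically stable log-softmax:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def token_log_probs(step_logits, token_ids):
    # Log probability of each generated token, given that step's logits.
    return [log_softmax(l)[t] for l, t in zip(step_logits, token_ids)]

steps = [[2.0, 1.0, 0.1], [0.5, 3.0, 0.2]]  # toy logits: 2 steps, vocab of 3
tokens = [0, 1]                              # chosen token ids per step
lps = token_log_probs(steps, tokens)
```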
Here is my build command. ``` python build.py --model_dir Yi-34B-Chat --dtype float16 --remove_input_padding --use_gemm_plugin float16 --use_gpt_attention_plugin float16 --world_size 2 --tp_size 2 --enable_context_fmha --use_inflight_batching --paged_kv_cache --load_by_shard --use_weight_only --weight_only_precision int4 --output_dir /app/triton_model/tensorrt_llm/1...
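For readers unfamiliar with `--use_weight_only --weight_only_precision int4`: in principle, weights are quantized per channel to 4-bit integers with a floating-point scale and dequantized on the fly at matmul time. This pure-Python sketch shows the idea only; it is not TensorRT-LLM's actual kernel.

```python
def quantize_int4(channel):
    # Symmetric per-channel quantization to the int4 range [-8, 7].
    scale = max(abs(v) for v in channel) / 7.0
    q = [max(-8, min(7, round(v / scale))) for v in channel]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int4 values and the scale.
    return [v * scale for v in q]

channel = [0.02, -0.14, 0.07, 0.10]
q, scale = quantize_int4(channel)
recovered = dequantize(q, scale)
# In-range weights are recovered to within half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(channel, recovered))
```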
### System Info CPU x86_64 GPU NVIDIA A10 TensorRT branch: main commit id: cad22332550eef9be579e767beb7d605dd96d6f3 CUDA: NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 ### Who can help? Quantization: @Tracin ### Information...
### System Info H800 80G ### Who can help? _No response_ ### Information - [x] The official example scripts - [ ] My own modified scripts ### Tasks - [x]...