TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficientl...
Hi, I am trying to benchmark Mixtral-7B, but I get this error: ``` BS: 64, ISL/OSL: 128,128 [TensorRT-LLM][ERROR] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at:...
### System Info * GPU V100, A100 * docker image `nvidia/cuda:12.1.0-devel-ubuntu22.04` * tensorrt-llm `0.9.0.dev2024020600` ### Who can help? @byshiue ### Information - [X] The official example scripts - [ ]...
Hello, tensorrt-llm team, I have been testing the performance for the combination of int8_kv_cache + weight_only(int8) on the llama-2-7b model. (testing with TensorRT-LLM release v0.7.1) The node contains 2 t4...
### System Info NVIDIA 4090 TensorRT-0.7.1 In nvidia-ammo, it appears these lines in `ammo/torch/export/layer_utils.py` have an unexpected failure for some Llama variants: In particular, the deepseek models use `LlamaLinearScalingRotaryEmbedding`. This...
Does TensorRT-LLM require CUDA 12.2? Can I build it with CUDA 12.0 without the container? Here is my environment information: - CUDA version: 12.0 - TensorRT version: 9.2.0.5 - cuDNN...
Could you please provide a simple interface similar to OpenAI API?
Hello, thank you for this great project! Currently when converting a T5 model to TRT format, if I then want to use it from another codebase I need to clone...
### System Info - GPU properties: 8 * NVIDIA GeForce RTX 4090 - TensorRT-LLM branch: v0.7.1 - NVIDIA Driver Version: 535.154.05 - CUDA Version: 12.2 - Container used: build from...
The link now leads to cuDNN 9.0.0, which does not work: it throws an error ("FileNotFoundError: Could not find: cudnn64_8.dll. Is it on your PATH?") Works...
## System Info **Instance Type:** `g5.12xlarge` (vCPUs: 48, GPUs: 4, GPU Mem: 96GiB) (AWS Amazon SageMaker Notebook Instance) **GPU Family:** NVIDIA A10G Tensor Core GPUs **OS:** Amazon Linux 2 **TensorRT-LLM:**...