TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently...
### System Info CPU: x86_64, memory: 1024GB, GPU: 8*A6000 48GB each, TensorRT-LLM version 0.9.0.DEV20240226, NVIDIA Driver Version: 535.171.04, CUDA Version: 12.2; OS - Ubuntu 22.04 ### Who can help? _No response_...
### System Info CPU: Intel(R) Xeon(R) Platinum 8369B, GPU: a single NVIDIA A10, Driver Version: 550.54.14, CUDA Version: 12.4, NVCC Version: 12.1.105, TensorRT-LLM Version: 0.9.0.dev2024022700, nvidia-ammo Version: 0.7.4 ### Who...
Is there any feature for GPT-like models that can also be applied to BERT-like models?
### System Info Jetson Orin AGX, using version 0.10 from pip ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My...
When trying to build an engine for Llama 3 70B, I get `KeyError : "Architecture"`.
### System Info NVIDIA A100-SXM4-80GB ### Who can help? _No response_ ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks -...
### System Info A100 GPUs (40GB) ### Who can help? @byshiue ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks -...
### System Info - OS: Ubuntu 22.04.4 LTS - Nvidia driver version: 545.23.08 - CPU architecture: x86 - RAM size: ~500GB - GPUs: 2xL40s 48GB - Docker container image: manually...
### System Info NVIDIA H20 97871MiB * 8 trt-llm 0.9.0 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified...