TensorRT-LLM
TensorRT-LLM Requests
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update: Jan 14th, 2024
🚀 = in development
Models
Decoder Only
- [ ] 🚀 Zephyr-7B - #157
- [ ] DeciLM-7B - #853
- [x] ChatGLM 3 - #180, #270
- [x] Mistral-7B - #49
- [x] Mixtral-8x7B - #616
Encoder / Encoder-Decoder
- [ ] DeBERTa - #174
- [ ] RoBERTa - #124
- [x] 🚀 BART, mBART - #285, #360
- [x] FLAN-T5 - #251, #285, #310
Multi-Modal
- [x] BLIP2 + T5 - #310, #531
- [x] LLaVA - #641
- [x] Qwen-VL - #728
- [x] Generic Vision Encoder + LLM Support - #641, #310
- [x] BLIP2
- [x] Whisper - #323
Other
- [ ] YaRN - #792
- [ ] Expert Caching - #849
- [x] LoRA - #68
- [x] Mixtral - #616
Features & Optimizations
- [x] Context Chunking - #317
- [x] Speculative Decoding - #169, #224, #226 (implementation done; documentation in progress)
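For readers new to the feature: speculative decoding has a small draft model propose several tokens cheaply, which the target model then verifies, keeping the longest agreeing prefix plus one correction token. A minimal greedy sketch, assuming hypothetical `target`/`draft` next-token callables (not TRT-LLM's actual API):

```python
def speculative_step(target, draft, prefix, k=4):
    """One draft-then-verify step of greedy speculative decoding (sketch).

    `draft` and `target` are hypothetical callables mapping a token
    list to the next greedy token id; real systems verify all k
    proposals in a single batched target-model forward pass.
    """
    # Draft phase: propose k tokens autoregressively with the cheap model.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)

    # Verify phase: accept proposals while the target model agrees.
    accepted = []
    ctx = list(prefix)
    for t in proposed:
        expect = target(ctx)
        if expect == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expect)   # replace the first mismatch with target's token
            break
    else:
        accepted.append(target(ctx))  # all k accepted: emit one bonus token
    return accepted
```

The payoff is that every step emits between 1 and k+1 tokens for roughly one target-model pass, which is why draft-model quality matters so much.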
KV Cache
- [x] Reuse KV Cache - #292, #620
- [x] Attention Sinks (StreamingLLM, H2O) - #104
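For context on #104: attention sinks (StreamingLLM) keep the KV entries of the first few tokens alongside a sliding window of the most recent ones, which stabilizes generation on very long streams. A toy eviction sketch, with the cache as a plain Python list (real implementations operate on paged GPU buffers):

```python
def evict_kv(cache, n_sink=4, window=1020):
    """StreamingLLM-style KV eviction (sketch, not TRT-LLM's implementation).

    Keeps the first n_sink "attention sink" entries plus the most
    recent `window` entries; everything in between is dropped.
    """
    if len(cache) <= n_sink + window:
        return cache  # still fits, nothing to evict
    return cache[:n_sink] + cache[-window:]
```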
Quantization
- [ ] StarCoder INT8 SQ - #324
- [x] Qwen INT4 - #345
- [x] INT8 Weight only - #110
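For context on #110: INT8 weight-only quantization stores weights as 8-bit integers with a per-channel scale and dequantizes on the fly inside the matmul, roughly halving weight memory versus FP16. A minimal symmetric per-row sketch (illustrative only, nothing like TRT-LLM's fused kernels):

```python
def quantize_int8(row):
    """Symmetric per-row INT8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in row) / 127.0 or 1.0  # avoid a zero scale
    q = [round(w / scale) for w in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP weights; real kernels do this inside the GEMM."""
    return [scale * v for v in q]
```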
Sampling
- [ ] 🚀 Support `frequency_penalty` - #275
- [ ] Logit Manipulators - #241
- [x] Combine `repetition` & `presence` penalties - #274
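For reference, the three penalties touch the logits of previously generated tokens differently: `frequency_penalty` subtracts proportionally to how often a token appeared, `presence_penalty` is a flat subtraction once it has appeared at all, and `repetition_penalty` is multiplicative. A hypothetical pure-Python sketch (OpenAI/HF-style semantics, not TRT-LLM's implementation):

```python
from collections import Counter

def apply_penalties(logits, generated_ids, presence_penalty=0.0,
                    frequency_penalty=0.0, repetition_penalty=1.0):
    """Penalize logits of already-generated tokens (sketch)."""
    logits = list(logits)
    for tok, n in Counter(generated_ids).items():
        # frequency penalty scales with the token's occurrence count
        logits[tok] -= frequency_penalty * n
        # presence penalty is a flat hit for any token seen at least once
        logits[tok] -= presence_penalty
        # repetition penalty divides positive logits, multiplies negative ones
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    return logits
```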
Workflow
Front-ends
- [ ] OpenAI compatible API - #334
- [ ] Flag for end-of-stream - #240
- [ ] Load from Buffer - #144
- [x] Paged KV Cache Utilization Metric - #512
- [x] Log Probabilities - #238
- [x] Return only new tokens - #227
Integrations
- [ ] 🚀 LlamaIndex
- [ ] 🚀 LangChain
- [ ] Mojo - #556
Usage / Installation
- [x] pip install - #790
Platform Support
- [ ] Jetson - #62, #488, #619
- [ ] V100, T4 MHA - #320
Please add CohereAI!!
CohereForAI/c4ai-command-r-plus
Llama 3 would be great (both 8B and 70B): https://github.com/NVIDIA/TensorRT-LLM/issues/1470
Maybe quantized to 8 or even 4 bit.
Currently Llama 3 throws a bunch of errors when converting to TensorRT-LLM.
Any idea about the support for Llama 3?
Phi-3-mini should be amazing! Such a small 3.8B model could run quantized on many GPUs, with as little as 4GB VRAM.
- Paper: https://arxiv.org/abs/2404.14219
- Model weights: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
+1 for Phi-3
+1 for Command R Plus!
CohereForAI/c4ai-command-r-plus
hello @ncomly-nvidia, I am a student interested in the project! I want to ask if there are any good-first-issue feature requests under Features & Optimizations at the moment? 🤣
+1 for OpenBMB/MiniCPM-V-2
Any news on support for the Jetson platform? Thanks in advance.
Requesting support for Meta's M4T v2 model, similar to how Whisper support is provided.
How is it going for Jetson AGX? It would be nice if everything were compatible before the Jetson Thor launch.
LLaMa 3.2 multimodal vision models anytime soon?