aphrodite-engine
Large-scale LLM inference engine
### Your current environment

```text
./runtime.sh python env.py
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A...
```
### Your current environment

```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4...
```
### Your current environment

```text
PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC...
```
### Your current environment

```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version:...
```
### Your current environment

aphrodite docker container

Setting 1
- GPUs: RTX8000 * 2
- model: alpindale/c4ai-command-r-plus-GPTQ
- Quantization: gptq

Setting 2
- GPUs: A6000 ada * 4
- model: CohereForAI/c4ai-command-r-plus
- Quantization: load-in-smooth

### 🐛...
### Your current environment

conda
nccl v2.21.5.1

### 🐛 Describe the bug

I have 4 GPUs: 3x 3090 and 1x 2080 Ti 22G. I try to load cat llama 70b 5.0bpw exl2 with...
### Your current environment

```text
PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC...
```
If I set `CMD_ADDITIONAL_ARGUMENTS` to `--model turboderp/Mistral-7B-instruct-exl2 --revision 4.0bpw`, then I get this error:

```
2024-03-13T14:03:42.164428603Z + exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 5000 --download-dir /app/tmp/hub --max-model-len 4096 --quantization...
```
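For reference, the log line above implies the container entrypoint appends `CMD_ADDITIONAL_ARGUMENTS` to a fixed `api_server` command line. A minimal sketch of that launch, reconstructed only from the flags visible in the excerpt (the entrypoint's actual script may differ):

```shell
# Sketch reconstructed from the log excerpt above; the base flags
# (--host, --port, --download-dir, --max-model-len) are the ones the
# container is seen passing, and the extra args are the ones reported
# to trigger the error.
export CMD_ADDITIONAL_ARGUMENTS="--model turboderp/Mistral-7B-instruct-exl2 --revision 4.0bpw"

python3 -m aphrodite.endpoints.openai.api_server \
  --host 0.0.0.0 --port 5000 \
  --download-dir /app/tmp/hub \
  --max-model-len 4096 \
  $CMD_ADDITIONAL_ARGUMENTS
```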
This PR adds support for the [T5](https://huggingface.co/google/flan-t5-large) family of models, a series of encoder-decoder models. Currently a work in progress.

TODO:
- [x] Add the modeling code
- [x] Add...
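For context, the inference pattern that distinguishes the T5 family from decoder-only models is that the encoder runs exactly once over the input, after which the decoder generates autoregressively while attending to that fixed encoder output. A toy sketch of the control flow (every function name and the scoring rule are illustrative stand-ins, not this PR's API):

```python
# Toy sketch of encoder-decoder inference: one encoder pass, then an
# autoregressive decoder loop that reuses the fixed encoder states.

def encode(input_ids):
    # Stand-in for the encoder: a single pass over the whole prompt.
    return [t * 2 for t in input_ids]  # pretend "hidden states"

def decode_step(encoder_states, generated):
    # Stand-in for one decoder step: the next token depends on both
    # the (unchanging) encoder states and the tokens generated so far.
    return (sum(encoder_states) + len(generated)) % 100

def generate(input_ids, max_new_tokens=4, eos=0):
    encoder_states = encode(input_ids)  # computed once, then reused
    generated = []
    for _ in range(max_new_tokens):
        tok = decode_step(encoder_states, generated)
        generated.append(tok)
        if tok == eos:
            break
    return generated

print(generate([1, 2, 3]))  # → [12, 13, 14, 15]
```

The point of the split is that the prompt's KV state is computed once by the encoder, unlike decoder-only models where prompt and generated tokens share one causal stream.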