aphrodite-engine
Large-scale LLM inference engine
### 🚀 The feature, motivation and pitch I only found a discussion asking about it, but from evaluations it seems that Q4 is now better than FP8 and closer/almost equal to...
### Your current environment ```text root@6eb559dd0e72:/app/aphrodite-engine# python3 env.py Collecting environment information... PyTorch version: 2.2.1+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A...
### The model to consider. https://huggingface.co/serpdotai/sparsetral-16x7B-v2-SPIN_iter1 https://huggingface.co/LoneStriker/sparsetral-16x7B-v2-8.0bpw-h8-exl2/tree/main https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0 ### The closest model Aphrodite already supports. mixtral moe but not quite the same ### What's your difficulty of supporting the model...
### 🚀 The feature, motivation and pitch The setup.py checks for CUDA compute capability 6.1 as a minimum, and that requirement is also stated in the README. Is there a...
### Your current environment I executed the command below: python -m aphrodite.endpoints.openai.api_server --model /root/.cache/huggingface/hub/gguf/ --quantization gguf --gpu-memory-utilization 0.35 --max-model-len 4096 --port 8000 It fails with the error OSError: /root/.cache/huggingface/hub/gguf/ does...
### Your current environment ```text Please provide the environment variable that closes the KoboldAI Lite page. ``` ### How would you like to use Aphrodite? Deploying aphrodite-engine with docker compose
### Your current environment ```Collecting environment information... PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS...
### 🚀 The feature, motivation and pitch I realized I made a simple mistake after spending 30 minutes trying to figure out why the `DTYPE` environment variable wasn't working in...
### 🚀 The feature, motivation and pitch As you know, the Tesla P100 is still a competitive card for inference on most formats, but it doesn't seem to work with Aphrodite...
### 🚀 The feature, motivation and pitch https://mobiusml.github.io/hqq_blog/ HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a...