aphrodite-engine issues

[Feature]: Exllamav2 Q4, Q6, and Q8 cache

3

### 🚀 The feature, motivation and pitch Only found a discussion asking about it but from evaluation it seems that Q4 is now better than FP8 and closer/almost equal to...

Anthonyg5005

[Bug]: LoRA broken when TP>1

### Your current environment ```text root@6eb559dd0e72:/app/aphrodite-engine# python3 env.py Collecting environment information... PyTorch version: 2.2.1+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A...

kubernetes-bad

bug

[sparsetral and Qwen2idae]: support for mixtral of lora

27

### The model to consider. https://huggingface.co/serpdotai/sparsetral-16x7B-v2-SPIN_iter1 https://huggingface.co/LoneStriker/sparsetral-16x7B-v2-8.0bpw-h8-exl2/tree/main https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0 ### The closest model Aphrodite already supports. mixtral moe but not quite the same ### What's your difficulty of supporting the model...

sorasoras

[Feature]: Is there a reason CUDA 6.1 is the minimum? Would CUDA 6.0 on the P100 not work?

8

### 🚀 The feature, motivation and pitch In the setup.py it checks for CUDA 6.1 as a minimum and that requirement is also stated in the readme. Is there a...

Nero10578

[Bug]: gguf loading failed. config.json?

4

### Your current environment I executed the command below: `python -m aphrodite.endpoints.openai.api_server --model /root/.cache/huggingface/hub/gguf/ --quantization gguf --gpu-memory-utilization 0.35 --max-model-len 4096 --port 8000` It fails with the error OSError: /root/.cache/huggingface/hub/gguf/ does...

juud79

bug

[Usage]: Please provide the environment variable that closes the KoboldAI Lite page.

1

### Your current environment ```text Please provide the environment variable that closes the KoboldAI Lite page. ``` ### How would you like to use Aphrodite? Deploying aphrodite-engine with docker compose

online2311

[Bug]: Issue when trying to load an AWQ model with --load-in-4bits for mixtral flavors

10

### Your current environment ```Collecting environment information... PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS...

puppetm4st3r

bug

[Feature]: Provide configuration via env vars or a configuration file

### 🚀 The feature, motivation and pitch I realized I made a simple mistake after spending 30 minutes trying to figure out why the `DTYPE` environment variable wasn't working in...

alexandreteles

[Feature]: any workarounds for cc 6.0?

2

### 🚀 The feature, motivation and pitch As you know, the Tesla P100 is still a competitive card for inference on most formats; it doesn't seem to work with Aphrodite...

gilbrotheraway

[Feature]: Support hqq quantize method.

### 🚀 The feature, motivation and pitch https://mobiusml.github.io/hqq_blog/ HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a...

Minami-su
