aphrodite-engine
Large-scale LLM inference engine
### Your current environment

```text
./runtime.sh python env.py
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A...
```
### Your current environment

```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4...
```
### Your current environment

```text
PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC...
```
### Your current environment

```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version:...
```
### Your current environment

aphrodite docker container

Setting 1
- GPUs: RTX8000 * 2
- model: alpindale/c4ai-command-r-plus-GPTQ
- Quantization: gptq

Setting 2
- GPUs: A6000 ada * 4
- model: CohereForAI/c4ai-command-r-plus
- Quantization: load-in-smooth

### 🐛...
### Your current environment

conda
nccl v2.21.5.1

### 🐛 Describe the bug

I have 4 GPUs: 3x 3090 and 1x 2080 Ti 22G. I try to load cat llama 70b 5.0bpw exl2 with...
### Your current environment

```text
PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC...
```
If I set `CMD_ADDITIONAL_ARGUMENTS` to `--model turboderp/Mistral-7B-instruct-exl2 --revision 4.0bpw`, then I get this error:

```
2024-03-13T14:03:42.164428603Z + exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 5000 --download-dir /app/tmp/hub --max-model-len 4096 --quantization...
```
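For reference, the log line above implies the container entrypoint appends `CMD_ADDITIONAL_ARGUMENTS` to a fixed `api_server` command line. A minimal sketch of that launch, reconstructed only from the flags visible in the excerpt (the entrypoint's actual script may differ):

```shell
# Sketch reconstructed from the log excerpt above; the base flags
# (--host, --port, --download-dir, --max-model-len) are the ones the
# container is seen passing, and the extra args are the ones reported
# to trigger the error.
export CMD_ADDITIONAL_ARGUMENTS="--model turboderp/Mistral-7B-instruct-exl2 --revision 4.0bpw"

python3 -m aphrodite.endpoints.openai.api_server \
  --host 0.0.0.0 --port 5000 \
  --download-dir /app/tmp/hub \
  --max-model-len 4096 \
  $CMD_ADDITIONAL_ARGUMENTS
```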
This PR adds support for the [T5](https://huggingface.co/google/flan-t5-large) family of models, a series of encoder-decoder models. Currently a work in progress.

TODO:
- [x] Add the modeling code
- [x] Add...
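For context, the inference pattern that distinguishes the T5 family from decoder-only models is that the encoder runs exactly once over the input, after which the decoder generates autoregressively while attending to that fixed encoder output. A toy sketch of the control flow (every function name and the scoring rule are illustrative stand-ins, not this PR's API):

```python
# Toy sketch of encoder-decoder inference: one encoder pass, then an
# autoregressive decoder loop that reuses the fixed encoder states.

def encode(input_ids):
    # Stand-in for the encoder: a single pass over the whole prompt.
    return [t * 2 for t in input_ids]  # pretend "hidden states"

def decode_step(encoder_states, generated):
    # Stand-in for one decoder step: the next token depends on both
    # the (unchanging) encoder states and the tokens generated so far.
    return (sum(encoder_states) + len(generated)) % 100

def generate(input_ids, max_new_tokens=4, eos=0):
    encoder_states = encode(input_ids)  # computed once, then reused
    generated = []
    for _ in range(max_new_tokens):
        tok = decode_step(encoder_states, generated)
        generated.append(tok)
        if tok == eos:
            break
    return generated

print(generate([1, 2, 3]))  # → [12, 13, 14, 15]
```

The point of the split is that the prompt's KV state is computed once by the encoder, unlike decoder-only models where prompt and generated tokens share one causal stream.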