aphrodite-engine
Large-scale LLM inference engine
### Your current environment ```text PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Manjaro Linux (x86_64) GCC version:...
This PR adds a custom floating-point quantization method powered by [TorchAO](https://github.com/pytorch/ao), which achieves high throughput thanks to the optimized [fp6_llm](https://github.com/usyd-fsalab/fp6_llm) kernel. Use `-q torchao --torchao-fp-bits 6` to load...
[Bug]: LORA not working after commit e3f2ea4 "make punica kernels work with rocm" on rc_054 branch
### Your current environment ```python env.py Collecting environment information... PyTorch version: 2.3.0 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu...
### Your current environment I have a server with 4x 3090 Ti GPUs. I can run Llama 3 70B with vLLM in Docker using the command: `sudo docker run --shm-size=32g --log-opt max-size=10m --log-opt max-file=1 --rm...
### Your current environment Collecting environment information... PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS...
### Your current environment ```text PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS (x86_64) GCC...
Seems like our current implementation has an issue: ``` dynatemp_logits = logits[dynatemp_mask] ERROR: | ~~~~~~^^^^^^^^^^^^^^^ ERROR: | IndexError: The shape of the mask [1] at index 0 does not match...
Syncs the Kobold Lite embed and disables certain features that Aphrodite cannot currently use. The KoboldCPP impersonation version has not been incremented, as no new features need to be enabled.
### Your current environment ```text The output of `python env.py` ``` ### How would you like to use Aphrodite? I want to get some AMD Navi-based GPUs, but I...
### Your current environment ```text PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS (x86_64) GCC...