Kevin Tan

Results: 2 comments by Kevin Tan

> I made a few updates and moved it to the [default branch](https://github.com/chu-tianxiang/vllm-gptq). Quantized embedding layers and output layers have been added, as well as the QxW8 kernels. However, the performance...

> Build and install `rotary` and `layer_norm` from the [flash-attn repository](https://github.com/Dao-AILab/flash-attention/tree/23e8fa5a263d1c7122bc46a86ef32030ee7130f9/csrc).

Hi @Semihal, can you give the command to build those?
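For reference, a minimal sketch of one way to build those extensions, assuming the `csrc/rotary` and `csrc/layer_norm` directories at the linked commit each ship their own `setup.py` (check the repo layout before running; a CUDA toolchain matching your PyTorch build is required):

```shell
# Clone the repo and pin the commit referenced above.
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout 23e8fa5a263d1c7122bc46a86ef32030ee7130f9

# Build and install each extension from its own subdirectory.
# This compiles CUDA kernels, so it can take several minutes.
cd csrc/rotary && pip install . && cd ../..
cd csrc/layer_norm && pip install . && cd ../..
```

The two installs are independent, so if only one of the extensions is needed you can build just that subdirectory.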