AlpinDale
Oh sorry, didn't mean to do that. :P
Hi @HaiShaw, Triton doesn't seem to support mixed-precision dot products, so this kernel fails if `k` is uint8 while `q` is another precision. I've been trying to...
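(Not necessarily the fix this kernel needs, but the usual workaround for a mixed-precision `tl.dot` is to explicitly upcast the lower-precision operand first, roughly `tl.dot(q, k.to(q.dtype))` inside the kernel. The idea, sketched in plain NumPy so it runs anywhere:)

```python
import numpy as np

# q is float16; k arrives as uint8 (e.g. a quantized cache).
# A direct dot of mixed dtypes is what Triton's tl.dot rejects.
q = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float16)
k = np.array([[5, 6], [7, 8]], dtype=np.uint8)

# Workaround: upcast k to q's dtype before the dot.
# In a Triton kernel this would be k = k.to(q.dtype) before tl.dot.
out = q @ k.astype(q.dtype)

print(out.dtype)   # float16
print(out.tolist())
```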
@krrishdholakia hi sorry for the late reply. I'd assume the LiteLLM OpenAI endpoint doesn't support any samplers beyond what OpenAI itself provides. Is that true? If not, I suppose we...
Now that Triton has upstream wheels for musl, I believe this PR can be closed.
Support being added in #8751
It would be great if Megatron-LM could support PEFT methods, e.g. QLoRA. We're sorely lacking a PEFT trainer with Tensor Parallelism.
Hi! What's the status on this PR? I'd like to train a few speculator models, but I'm not sure how to get started, due to a lack of documentation...
Thanks for the reply, @JRosenkranz. I'd love to wait, but I have access to a large cluster of H100s for a limited time, so I wanted to make the most...
Looks like installing flash-attn with our torch version doesn't work:

```
ImportError: /home/anon/miniconda3/envs/aphrodite/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
```

I'll look into it. Thanks for reporting.
I'll get to investigating this soon; I've been busy with other projects, so I haven't had much time to work on Aphrodite lately. I have an inkling that this is...