AlpinDale issues

Results: 75 issues of AlpinDale

[Build] fix: nvidia dockerfile issues

[Kernel] feat: add NVFP4 blockwise MoE kernels for sm_120

Not fully optimized, as a lot of the sm_100 codepath is still used for this. Tested with [alpindale/Ling-mini-2.0-NVFP4](https://huggingface.co/alpindale/Ling-mini-2.0-NVFP4), it gets about 91 tok/s decode (slower than the 140 tok/s with...

[Attention] feat: support PrefixLM

For moondream3 support, in a later PR.
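As a rough illustration of what a PrefixLM attention pattern usually means: tokens inside the prefix attend to each other bidirectionally, and tokens after the prefix attend causally. The sketch below builds such a mask in plain Python. The function name `prefix_lm_mask` and its parameters are hypothetical and are not taken from the PR, which may implement this differently (for example, inside an attention kernel).

```python
def prefix_lm_mask(seq_len: int, prefix_len: int) -> list[list[bool]]:
    """Build a PrefixLM attention mask (hypothetical helper, not the PR's API).

    mask[q][k] is True when query position q may attend to key position k.
    Positions 0..prefix_len-1 form the prefix and see each other fully;
    every later position only sees keys at or before itself.
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        # Causal rule (k <= q) OR key lies inside the bidirectional prefix.
        [k <= q or k < prefix_len for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    for row in prefix_lm_mask(seq_len=4, prefix_len=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With `prefix_len=0` this reduces to an ordinary causal mask, and with `prefix_len=seq_len` it becomes fully bidirectional.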

[Kernel][Comms] feat: add custom all-gather kernels

We rarely use all-gather elsewhere, but context parallelism relies on it heavily. This adds a fair bit of overhead when doing context parallelism, sometimes...

feat(compilation): Extend sequence parallelism to activations

Adds a new pattern to the sequence parallelism pass to support activations like SiLU and GELU. This transforms "AllReduce -> Activation" into "ReduceScatter -> Activation -> AllGather", enabling further fusion...
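The rewrite is valid because an elementwise activation commutes with splitting the tensor across ranks. The following is a minimal single-process sketch that simulates the collectives with Python lists, assuming each rank holds a partial sum of the same length and that the length divides evenly by the number of ranks. All function names here are illustrative and are not the compilation pass's actual API.

```python
import math


def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


def all_reduce(partials: list[list[float]]) -> list[list[float]]:
    """Every rank ends up with the full elementwise sum."""
    total = [sum(vals) for vals in zip(*partials)]
    return [list(total) for _ in partials]


def reduce_scatter(partials: list[list[float]]) -> list[list[float]]:
    """Rank r ends up with only the r-th contiguous chunk of the sum."""
    world = len(partials)
    total = [sum(vals) for vals in zip(*partials)]
    if len(total) % world != 0:
        raise ValueError("length must be divisible by the number of ranks")
    chunk = len(total) // world
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


def all_gather(chunks: list[list[float]]) -> list[list[float]]:
    """Every rank ends up with the concatenation of all chunks."""
    full = [x for chunk in chunks for x in chunk]
    return [list(full) for _ in chunks]


def baseline(partials: list[list[float]]) -> list[list[float]]:
    # AllReduce -> Activation: each rank activates the full tensor.
    return [[silu(x) for x in rank] for rank in all_reduce(partials)]


def sequence_parallel(partials: list[list[float]]) -> list[list[float]]:
    # ReduceScatter -> Activation -> AllGather: each rank activates 1/world.
    activated = [[silu(x) for x in chunk] for chunk in reduce_scatter(partials)]
    return all_gather(activated)


if __name__ == "__main__":
    partials = [[0.5, -1.0, 2.0, 3.0], [1.5, 0.25, -2.0, 1.0]]  # 2 ranks
    assert baseline(partials) == sequence_parallel(partials)
    print("both paths agree")
```

In the rewritten form each rank applies the activation to only its own chunk, which is where the saving and the opportunity for further fusion come from.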

feat: overlap shared experts with send/recv

wip

[Kernel][Quantization] feat: add Gluon kernels for AWQ quantization

Still a WIP. Need to build triton from source.

```sh
$ apt install zlib1g-dev
$ git clone https://github.com/triton-lang/triton.git && cd triton
$ uv pip install -r python/requirements.txt
$ uv pip...
```

[Kernel] feat: sage attention support

Really slow at the moment, will investigate. V1 only. Launch with `APHRODITE_ATTENTION_BACKEND=SAGE_ATTN`
