Jackmin801


I've checked the other model checkpoints. All of them have bias tensors that are all zero (except 30b, which has no bias tensors). This is the info I have about...
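
For reference, a minimal sketch of the kind of check described, assuming the checkpoints are safetensors shards; the file name is a placeholder, not the actual checkpoint from the thread:

```python
from safetensors.torch import load_file

# Placeholder path; substitute the actual checkpoint shard being inspected.
state_dict = load_file("model-00001-of-00002.safetensors")

for name, tensor in state_dict.items():
    if name.endswith(".bias"):
        # Report whether the bias tensor is identically zero.
        print(name, "all zero:", bool((tensor == 0).all()))
```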

@tridao https://github.com/Dao-AILab/flash-attention/pull/2007

When can we expect the next version to be released?

This should remove the need to have it in the server args and instead allow it as a kwarg passed to `generate`: https://github.com/Jackmin801/sglang/pull/2. Something like this would then work: ```python...
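
For context, a hypothetical sketch of the pattern described, using sglang's offline `Engine` API; the kwarg name `return_hidden_states` is an assumption for illustration, and the actual argument is defined in the linked PR:

```python
import sglang as sgl

# Previously the option had to be set as a server/engine arg at startup.
llm = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

# With the change, it is passed per request as a kwarg to generate instead.
# `return_hidden_states` is an assumed name for illustration only.
out = llm.generate(
    "The capital of France is",
    sampling_params={"max_new_tokens": 8},
    return_hidden_states=True,
)
```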

@justheuristic @borzunov Does this implementation look roughly correct to you? It doesn't seem to be working and hangs while trying to process outputs in `def process_output(output, output_actions: Dict[Arg, Callable[[torch.Tensor, int],...
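
A self-contained sketch of the dispatch shape the truncated signature suggests, where each argument id maps to a callback applied to the matching output tensor; `Arg` and the loop body here are assumptions, not the actual implementation:

```python
from typing import Callable, Dict, Hashable

import torch

Arg = Hashable  # Assumed stand-in for whatever identifies an output slot.

def process_output(
    output: Dict[Arg, torch.Tensor],
    output_actions: Dict[Arg, Callable[[torch.Tensor, int], None]],
    step: int,
) -> None:
    # Dispatch each produced tensor to its registered callback. If any
    # callback blocks (e.g. waiting on a result that never arrives), the
    # whole loop hangs, matching the symptom described above.
    for arg, action in output_actions.items():
        action(output[arg], step)
```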

Testing that the build and deploy action will work in a fork here: https://github.com/Jackmin801/flash-attention/actions/runs/19301136888 It seems some of the matrix elements don't build. Will look into it further.

It seems to occur every batch. Hrmm, I don't think it's about pos_embeds; otherwise it would happen for flash attn too?

I'm wondering if it's a regression from cuDNN, so I'm building PyTorch with older cuDNN versions to see if anything changes. Changing the cuDNN version to 9.2.0 doesn't seem to help...

Seems to be from the CUDNN_ATTENTION implementation of SDPA. With the `viable/strict` build of PyTorch, you can toggle the bug by forcing the FLASH or CUDNN implementation. ```python from torch.nn.attention import...
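
For completeness, a minimal runnable sketch of toggling the backend via `torch.nn.attention.sdpa_kernel` (available in PyTorch 2.3+); the shapes and dtype are arbitrary, and a CUDA device is assumed:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Force the flash-attention backend (behaves correctly).
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out_flash = F.scaled_dot_product_attention(q, k, v)

# Force the cuDNN backend (the implementation the bug is traced to).
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out_cudnn = F.scaled_dot_product_attention(q, k, v)

# Compare; a large difference here indicates the cuDNN path misbehaving.
print((out_flash - out_cudnn).abs().max().item())
```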