accelerated-scan issues

Torch 2.2 breakage on bfloat16 and float16

Running the Triton implementation with torch 2.2 on float16 and bfloat16 inputs results in the following error: ``` File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None,...

proger

Is there any solution to support variable sequence length during training?

2

ruoyxue

warpscan: try float accumulation for bf16 and float16

@sustcsonglin has suggested that float accumulation might improve the stability of the implementation. The test I'm currently using to check this is: ``` python -m pytest tests -s -v -k...

proger

Integrate with Mamba

12

@proger Awesome work! Always appreciate the wonderful contributions of OSS advancing the frontiers of research. I know you've done a number of experiments comparing various scan implementations in your other...

jeromeku

Log-space version

1

[Feng et al.](https://arxiv.org/abs/2410.01201) proposed a log-space implementation of parallel scan for improved numerical stability. It should be fairly easy to implement, but I'm a bit out of practice with my...

kklemon
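
For context on the log-space request, here is a minimal sketch of the idea for the first-order recurrence h_t = a_t * h_{t-1} + b_t. It assumes strictly positive gates and inputs and a zero initial state. The function names `scan_sequential` and `scan_log_space` are hypothetical and are not part of accelerated-scan. The loop is written sequentially for clarity; both running quantities (a cumulative sum and a cumulative log-sum-exp) are associative, so in principle they could each be computed with a parallel scan.

```python
import math

def scan_sequential(a, b):
    """Reference recurrence h_t = a_t * h_{t-1} + b_t with h_0 = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def logaddexp(x, y):
    """Stable log(exp(x) + exp(y)); handles -inf as the additive identity."""
    m = max(x, y)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def scan_log_space(log_a, log_b):
    """Return log(h_t), assuming a_t > 0, b_t > 0 and h_0 = 0.

    Uses log h_t = A_t + log(sum_{k<=t} exp(log b_k - A_k)),
    where A_t is the running sum of log a_1..log a_t.
    """
    out = []
    a_star = 0.0      # running sum of log gates, A_t
    acc = -math.inf   # running log-sum-exp of (log b_k - A_k)
    for la, lb in zip(log_a, log_b):
        a_star += la
        acc = logaddexp(acc, lb - a_star)
        out.append(a_star + acc)
    return out

a = [0.5, 0.9, 0.8]
b = [1.0, 2.0, 3.0]
log_h = scan_log_space([math.log(x) for x in a], [math.log(x) for x in b])
h = [math.exp(x) for x in log_h]  # matches scan_sequential(a, b) up to rounding
```

Working with log(h_t) avoids forming long products of gates directly, which is where low-precision formats tend to underflow.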

Question: integration with complex numbers

1

Hi, I was wondering if there are any plans to make the cuda code be compatible with complex numbers? This would be particularly helpful given that triton does not currently...

ekellbuch

Training Fast Weight Programmers by backpropagating through the Delta Rule

Introducing a kernel for training a [fast weight programmer](http://proceedings.mlr.press/v139/schlag21a/schlag21a.pdf) by backpropagating through the [delta rule](https://www-isl.stanford.edu/~widrow/papers/c1960adaptiveswitching.pdf) (online linear regression) with @ischlag. Improving on top of first order recurrence with scalar hidden...

proger

gates (A matrix) with a shape of batch * dim * dim * seqlen

4

Thank you for your excellent work! I was wondering if it’s possible to modify your code to handle a state-space model case where the gates (A matrix) have a more...

WeihanLikk

Crash with fp16 and bf16

Warp kernel crashes for some input data in fp16 and bf16. E.g. ``` [B C T ] [2, 2, 32768] -- works [4, 2, 32768] -- doesn't [2, 4, 32768]...

JohnAlphaIII
