add `mamba_chunk_scan_combined` and `mamba_split_conv1d_scan_combined` tests
This PR adds correctness tests for `mamba_chunk_scan_combined` and `mamba_split_conv1d_scan_combined`, which seemed to be missing. The forward and backward passes are tested against their reference implementations. Correctness when providing `seq_idx` is also tested.
@tridao I know the kernels inside `mamba_chunk_scan_combined` and `mamba_split_conv1d_scan_combined` are individually tested, but I thought it would be worth adding these more end-to-end tests. Thoughts?
Any idea why the tolerances need to be that high? Those tolerances seem very high for float32. It is probably related to #683 #571
Yes, concerningly high, at least for the backwards where some tests need tol = 1e-1 and/or are sensitive to seeds.
My first suspicion was that it is an issue with the tests, rather than the kernels, but I haven't found any problems yet. And since the forwards tests pass at reasonable-ish 1e-2/1e-3 levels, any error would need to be a bit subtle.
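To make the tolerance discussion concrete, here is a minimal pure-Python sketch of the elementwise check that `torch.allclose` applies, `|actual - expected| <= atol + rtol * |expected|`. The helper names and the numbers are hypothetical stand-ins for a kernel gradient and its reference, not values from these tests. It shows how a loose tolerance such as 1e-1 can hide a single badly-off element that a 1e-3 tolerance would catch, which is why reporting the maximum error alongside the pass/fail result can help.

```python
def max_abs_and_rel_error(actual, expected):
    """Return the largest absolute and relative elementwise errors."""
    max_abs = 0.0
    max_rel = 0.0
    for a, e in zip(actual, expected):
        abs_err = abs(a - e)
        max_abs = max(max_abs, abs_err)
        if e != 0.0:
            max_rel = max(max_rel, abs_err / abs(e))
    return max_abs, max_rel


def allclose(actual, expected, rtol, atol):
    # Same elementwise criterion as torch.allclose:
    # |actual - expected| <= atol + rtol * |expected|
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


# Hypothetical values: three elements agree closely, the last one is off by 0.05.
reference = [1.0, -2.0, 0.5, 4.0]
kernel = [1.0005, -2.001, 0.5002, 3.95]

max_abs, max_rel = max_abs_and_rel_error(kernel, reference)
print("max abs error:", max_abs, "max rel error:", max_rel)
print("passes at 1e-3:", allclose(kernel, reference, rtol=1e-3, atol=1e-3))
print("passes at 1e-1:", allclose(kernel, reference, rtol=1e-1, atol=1e-1))
```

In this toy case the 1e-3 check fails only because of the last element, while the 1e-1 check passes everything.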
I have also found some non-determinism with the backwards passes for the D grads. Haven't posted about it yet; will try to today.
Also, this is relevant: non-determinism is apparently expected in the backward pass due to atomic adds.
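The reason atomic adds can cause run-to-run differences is that floating-point addition is not associative, so the order in which contributions land in the accumulator changes the rounded result. A small pure-Python illustration, assuming nothing about the actual kernels (the `accumulate` helper and the values are made up for the example, and Python floats are float64 rather than float32):

```python
import itertools


def accumulate(values):
    """Sum values strictly left to right, like one accumulator receiving adds."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical contributions with very different magnitudes.
contributions = [1e16, 1.0, -1e16, 1.0]

# The first 1.0 is absorbed by 1e16 (float64 spacing there is 2.0) and lost.
print(accumulate([1e16, 1.0, -1e16, 1.0]))

# The large terms cancel first, so both small terms survive.
print(accumulate([1e16, -1e16, 1.0, 1.0]))

# Every arrival order of the same four numbers; more than one distinct sum appears.
results = {accumulate(p) for p in itertools.permutations(contributions)}
print(sorted(results))
```

With atomic adds the arrival order depends on thread scheduling, so the same effect shows up as small differences between otherwise identical runs.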
> Any idea why the tolerances need to be that high? Those tolerances seem very high for float32. It is probably related to #683 #571
Hi, thanks for mentioning this. I posted a solution for my case in #571; you might want to check that. I was able to get tolerances down to 1e-8 for all gradients and outputs.