Tri Dao


No, we have no plan for Mac.

We've trained from scratch without the causal conv and it's still fine, just with worse quality.

Yes, worse validation loss without the conv1d.

I'm not familiar with the implementation in HF. In any case, it's just model code, so I think training from scratch should work.
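
For concreteness, a minimal, untested sketch of from-scratch training with the HF model code, assuming a transformers version that ships the Mamba classes; the config values here are illustrative, not a recommendation:

```python
import torch
from transformers import MambaConfig, MambaForCausalLM

# Building the model from a config (instead of from_pretrained) gives
# randomly initialized weights, i.e. a from-scratch model.
config = MambaConfig(
    vocab_size=50280,        # illustrative values; match your tokenizer/data
    hidden_size=768,
    num_hidden_layers=24,
)
model = MambaForCausalLM(config)

# Standard causal-LM step: passing labels=input_ids yields the LM loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
input_ids = torch.randint(0, config.vocab_size, (2, 128))
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
```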

Counting params should just be `sum(p.numel() for p in model.parameters())`. I'm not familiar with thop. I assume there's some way to specify how many flops a custom operation takes.
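
As a sketch of how that could look with thop: the parameter count is exactly the one-liner above, and thop's `custom_ops` lets you attach a counting hook to a module type it doesn't know about. The `SelectiveScanStub` class and the FLOP formula below are placeholders (not the real Mamba module or an official count):

```python
import torch
import torch.nn as nn
from thop import profile

# Exact parameter count, as in the one-liner above.
def count_params(model):
    return sum(p.numel() for p in model.parameters())

# Stand-in for a custom module thop can't count on its own
# (e.g. a selective-scan block); not the real Mamba module.
class SelectiveScanStub(nn.Module):
    def __init__(self, d_model, d_state=16):
        super().__init__()
        self.d_state = d_state
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return self.proj(x)

# thop calls this hook on the module's forward pass and accumulates
# whatever you add to module.total_ops; the formula is a placeholder.
def count_selective_scan(module, inputs, output):
    b, l, d = inputs[0].shape            # (batch, seqlen, d_model)
    module.total_ops += torch.DoubleTensor([9.0 * b * l * d * module.d_state])

model = nn.Sequential(SelectiveScanStub(d_model=256))
x = torch.randn(1, 128, 256)
print("params:", count_params(model))
ops, params = profile(model, inputs=(x,), custom_ops={SelectiveScanStub: count_selective_scan})
print("thop estimate:", ops)
```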

The initial value is set in the prefix_op. You probably want to change this line: https://github.com/state-spaces/mamba/blob/12d855003ba92c8a15d1739ce65a14c6fb16e254/csrc/selective_scan/selective_scan_fwd_kernel.cuh#L239 to something like (I haven't tested this):

```
if (chunk == 0) { running_prefix...
```

You can print stuff out with printf to see if you're accessing the right indices.