faresobeid
### Description
I'm trying to write a simple RNN-like for loop in Pallas but get a `Pallas NotImplementedError: unsupported layout change` for some reason. If anyone can help fixing this...
### Description
I'm trying to implement RWKV in JAX using Pallas but run into problems with the for loop. In my code, I'm using `jax.lax.fori_loop` to iterate over the sequence dimension....
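For readers unfamiliar with the pattern, here is a minimal sketch of iterating over a sequence dimension with `jax.lax.fori_loop`. It is plain JAX rather than a Pallas kernel, and it is not the code from the issue: the function name `linear_recurrence`, the scalar `decay`, and the update rule `h_t = decay * h_{t-1} + x_t` are all assumptions chosen for illustration, much simpler than an actual RWKV recurrence.

```python
import jax
import jax.numpy as jnp


def linear_recurrence(decay, x):
    """Hypothetical helper: h_t = decay * h_{t-1} + x_t along axis 0 of x."""
    seq_len = x.shape[0]

    def body(t, carry):
        h, out = carry
        # t is a traced index inside fori_loop, so x[t] is a dynamic slice.
        h = decay * h + x[t]
        # Functional update: write the new state into row t of the output.
        out = out.at[t].set(h)
        return h, out

    h0 = jnp.zeros_like(x[0])   # initial hidden state
    out0 = jnp.zeros_like(x)    # buffer holding every hidden state
    _, out = jax.lax.fori_loop(0, seq_len, body, (h0, out0))
    return out


x = jnp.ones((4, 2))            # (sequence length, feature size)
states = linear_recurrence(0.5, x)
```

The carry holds both the running state and the output buffer, because `fori_loop` requires the loop body to return a value with the same structure, shapes, and dtypes as its input carry.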
I tried out MLA and found it noticeably worse than MHA, so I wanted to find out why. Firstly, I am using a hybrid model, therefore I...
Main changes:
- DDLerp
- Data-dependent scalar decay
- Vector-valued time boost "u"
- Removal of the SiLU gate, replace GroupNorm with LayerNorm, and make FFN 3.5x instead of...
It would be very useful to have the average number of tokens each model uses, so that efficiency can be compared.
With `vf_eval.make_dataset`, support for an extra `average_accuracy` column per prompt (averaged over `rollouts_per_example`) would make difficulty filtering very easy.
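As a rough illustration of the request, the sketch below computes a per-prompt average accuracy over several rollouts and filters prompts by that value. It does not use or reflect the real `vf_eval.make_dataset` interface: the `(prompt, correct)` record layout and the helper names `average_accuracy_per_prompt` and `filter_by_difficulty` are hypothetical.

```python
from collections import defaultdict


def average_accuracy_per_prompt(rollouts):
    """Hypothetical helper: mean correctness per prompt over its rollouts."""
    totals = defaultdict(lambda: [0.0, 0])  # prompt -> [sum of scores, count]
    for prompt, correct in rollouts:
        totals[prompt][0] += float(correct)
        totals[prompt][1] += 1
    return {prompt: s / n for prompt, (s, n) in totals.items()}


def filter_by_difficulty(avg_acc, low, high):
    """Keep prompts whose average accuracy lies in [low, high]."""
    return [prompt for prompt, acc in avg_acc.items() if low <= acc <= high]


# Two rollouts per prompt; 1 marks a correct rollout, 0 an incorrect one.
rollouts = [("p1", 1), ("p1", 0), ("p2", 1), ("p2", 1), ("p3", 0), ("p3", 0)]
avg = average_accuracy_per_prompt(rollouts)
kept = filter_by_difficulty(avg, 0.25, 0.75)
```

With such a column available, difficulty filtering reduces to a single threshold check per prompt, which drops prompts that are always solved or never solved.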
When evaluating on a large dataset (especially for difficulty filtering), the evals hang indefinitely, for example with 32768.