Neel Gupta
Out of curiosity, how much work would it take to do this? Surely (in theory) checkpointing is unrollable 🤔 I'm getting a lot of performance hits that I...
To avoid any X-Y, my use case is effectively the same as [Universal Transformers](https://arxiv.org/abs/1807.03819) where I want to recursively apply a block of layers `n` times. `n` can be treated...
I'm not sure - using a vanilla `lax.scan` takes too much memory so I have to slice my batch_size by `4`. I can try using a `remat` policy on the...
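For concreteness, here is a minimal sketch of the setup described above: one block applied `n` times via `lax.scan`, with the block wrapped in `jax.checkpoint` so its internal activations are recomputed in the backward pass rather than stored. The `block` function, the shapes, and the helper name `apply_n_times` are hypothetical stand-ins, not the actual model.

```python
import jax
import jax.numpy as jnp
from jax import lax


def block(params, x):
    # Hypothetical stand-in for the real block of layers: one dense layer + tanh.
    w, b = params
    return jnp.tanh(x @ w + b)


def apply_n_times(params, x, n):
    # Wrap the block in jax.checkpoint (a.k.a. jax.remat) so that activations
    # inside the block are rematerialized during the backward pass.
    remat_block = jax.checkpoint(block)

    def body(carry, _unused):
        # The same parameters are reused at every step (weight sharing).
        return remat_block(params, carry), None

    # n must be a static Python int here, since it sets the scan length.
    out, _ = lax.scan(body, x, xs=None, length=n)
    return out


key = jax.random.PRNGKey(0)
k_w, k_x = jax.random.split(key)
w = jax.random.normal(k_w, (8, 8)) * 0.1
b = jnp.zeros((8,))
x = jax.random.normal(k_x, (4, 8))


def loss(params, x):
    return jnp.sum(apply_n_times(params, x, n=3) ** 2)


grads = jax.grad(loss)((w, b), x)
```

`jax.checkpoint` also accepts a `policy` argument (see `jax.checkpoint_policies`) to control which intermediates are saved; which policy gives the best memory/throughput trade-off here is something that would have to be measured.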
Throughput measured after `40` minutes, in *tokens/s*:

| Method | Throughput | `n` |
|--------|--------|--------|
| `lax.scan` | 120k | 3 |
| `lax.scan`, unroll=2 | 110k | 3 |
...
Thanks! But what do you mean by "`for`-loop in the body function"? The body function doesn't have an iterative element to it - it's just passing the `carry` through a...
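One possible reading of that suggestion (an assumption, since the original comment isn't quoted here) is manual unrolling: the scan runs for `n // inner` steps, and the body applies the block `inner` times in a plain Python `for` loop. A rough sketch, with a hypothetical `block` and helper name:

```python
import jax.numpy as jnp
from jax import lax


def block(x):
    # Hypothetical stand-in for the real block of layers.
    return jnp.tanh(x * 0.5 + 0.1)


def scan_with_inner_loop(x, n, inner):
    # Assumes n is a static Python int divisible by inner.
    assert n % inner == 0

    def body(carry, _unused):
        # A Python loop is unrolled at trace time, so each scan step
        # contains `inner` copies of the block.
        for _ in range(inner):
            carry = block(carry)
        return carry, None

    out, _ = lax.scan(body, x, xs=None, length=n // inner)
    return out


x = jnp.linspace(-1.0, 1.0, 6)
y = scan_with_inner_loop(x, n=4, inner=2)
```

This should compute the same result as a scan of length `n` with a single block application per step; whether it is faster is something only a benchmark can tell.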
What version of `bun` should I use then?
Is that fine? @nzw0301
> Thanks. In addition, could you share the minimal reproducible script to raise the error you report? These info would be helpful for reviewers.

Here's a repro. Make sure to...
> I think this happens when a trial has no intermediate value (or not calling `trial.report`), which is unusual. So I'm not sure the shared script is the right usage...
Alright @nzw0301, how does it look now?