Tom
@ananthsub commented 6 hours ago:

> However, the manner in which join tracks collectives can quickly run into issues with other collectives that run in the forward pass / training_step....
OK, thanks. Turns out, I needed to use CUDA.@time; with that change, the data makes more sense and both pullback and gradient are about equally slow, about 20x - 50x...
Tullio returns symbolic derivatives that look like they are almost directly usable as Tullio expressions; why aren't those used directly? I seem to get decent performance now with a simple...
Wouldn't that be fixable by either disallowing index computations on the LHS, or at least special-casing the situation where there are no index computations on the LHS and...
@ToucheSir Oops, sorry, I thought I had updated the bug report with CUDA.@sync etc. I have updated the GIST link now. In any case, I still get a 40x ratio...
Thanks for tracking this down. I wasn't aware they had changed the default, that is good to know. I will look into fixing this. A couple of reasons why we...
Some of the stages inside the `WebDataset` class don't use handlers; that's probably a bug, I'll look into it. You get complete control over error handling at every stage with...
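To illustrate the handler pattern described above, here is a minimal sketch of a pipeline stage that delegates per-sample errors to a caller-supplied handler. The stage and handler names here are illustrative stand-ins, not the actual `WebDataset` internals; the pattern is simply "handler returns True to skip the bad sample, otherwise the stage stops".

```python
import warnings

# Illustrative handlers: a handler receives the exception and returns
# True to skip the offending sample, or re-raises/returns False to stop.
def reraise_exception(exn):
    raise exn

def warn_and_continue(exn):
    warnings.warn(repr(exn))
    return True

def map_stage(samples, f, handler=reraise_exception):
    """Apply f to each sample, delegating errors to the handler."""
    for sample in samples:
        try:
            yield f(sample)
        except Exception as exn:
            if handler(exn):
                continue  # handler chose to skip this sample
            break  # handler chose to stop the stage

# One malformed sample is skipped under warn_and_continue.
data = ["1", "2", "oops", "3"]
result = list(map_stage(data, int, handler=warn_and_continue))
# result == [1, 2, 3]
```

Passing a different handler at each stage is what gives complete control over error handling per stage.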
Error handling gets quite complicated when writing remotely. That's why `ShardWriter` just writes to local disk, but gives you a hook for uploading the data.

```Python
def upload_shard(fname):
    os.system(f"gsutil cp...
```
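The hook mechanism can be sketched with a toy stand-in (this is not the real `ShardWriter` implementation, and `MiniShardWriter`/`post` are hypothetical names): shards are written as local tar files, and a user callback runs on each finished shard, which is where an upload command like the `gsutil cp` above would go.

```python
import io
import os
import tarfile
import tempfile

class MiniShardWriter:
    """Toy stand-in for a shard writer: writes local tar shards and
    calls a user hook on each finished shard (e.g. to upload it)."""

    def __init__(self, pattern, maxcount=100, post=None):
        self.pattern, self.maxcount, self.post = pattern, maxcount, post
        self.shard, self.count, self.tar, self.fname = 0, 0, None, None

    def _finish(self):
        if self.tar is not None:
            self.tar.close()
            if self.post:
                self.post(self.fname)  # upload hook runs on the closed shard
            self.tar = None

    def _next_shard(self):
        self._finish()
        self.fname = self.pattern % self.shard
        self.tar = tarfile.open(self.fname, "w")
        self.shard += 1
        self.count = 0

    def write(self, name, data):
        if self.tar is None or self.count >= self.maxcount:
            self._next_shard()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        self.tar.addfile(info, io.BytesIO(data))
        self.count += 1

    def close(self):
        self._finish()

uploaded = []
with tempfile.TemporaryDirectory() as d:
    w = MiniShardWriter(os.path.join(d, "shard-%04d.tar"), maxcount=2,
                        post=uploaded.append)  # stand-in for a gsutil upload
    for i in range(5):
        w.write(f"sample{i}.txt", b"hello")
    w.close()
# five samples at maxcount=2 -> three shards, hook called three times
```

Because the hook only ever sees a complete, closed local file, all the remote-write failure modes are confined to the hook itself.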
Yes, sorry, the current collation function doesn't handle dictionaries. I'll add that as an enhancement.
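A dict-aware collation function might look like the following sketch (`collate_dicts` is a hypothetical name, not the planned API): values are grouped by key across the batch, where a real version would likely stack tensors rather than build plain lists.

```python
def collate_dicts(batch):
    """Collate a batch of dict samples into a dict of per-key lists."""
    keys = batch[0].keys()
    assert all(sample.keys() == keys for sample in batch), "inconsistent keys"
    return {k: [sample[k] for sample in batch] for k in keys}

batch = [{"img": 1, "label": 0}, {"img": 2, "label": 1}]
collated = collate_dicts(batch)
# collated == {"img": [1, 2], "label": [0, 1]}
```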
Yes, I'm planning on working on the handful of PRs we have been discussing and addressing the issues raised.