Mitchell Wortsman

88 comments by Mitchell Wortsman

That is a good idea; no, we have not thought about this! It is difficult, as supermasks that do the same thing could look very different.

Seems like progress is being made with FSDP. Also, we think the OOM was caused by model size + activations.

Hey Aditya, thanks for the PR with MRL. However, if you want to make MRL an option, it would be good to have a flag so that this PR...

Sure, can you convert it to a draft in the meantime?

Yeah, totally agree. While I'll likely keep using this for my existing run, I like your implementation better for the repo going forward, so I'll close this. Thanks!

Closing, because the hypothesis is that it relates to a filesystem issue, which should not affect most

Here's one hypothesis for what's going on. Look at the graphs for `logit_scale` and `samples/s` toward the end of training: the dips in `logit_scale` occur toward the end of...

Hi Adam, this looks great! I no longer have access to this repo because I'm not on the internship anymore, but let's keep this issue open so that other people can...

And very nice paper -- thanks for sharing!