Dirk Groeneveld
This is the `kebab` config, a smaller version of the `dirk` config. Differences from `dirk`:
* untied weights
* weight decay on everything
* adjusted `mlp_hidden_size` so we come out...
See https://github.com/boto/boto3/issues/801. We can probably just wrap our client creation in a mutex? The problem is in util.py, line 520.
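A minimal sketch of the mutex idea, assuming the issue is that concurrent client creation is what breaks. The names `create_client` and `_client_creation_lock` are hypothetical and are not taken from util.py; the optional `factory` argument exists only so the sketch can be exercised without AWS credentials.

```python
import threading

# Hypothetical module-level lock; every thread that creates a client goes through it.
_client_creation_lock = threading.Lock()


def create_client(service_name, factory=None, **kwargs):
    """Create a client while holding a lock, so only one thread creates at a time.

    `factory` defaults to `boto3.client`. It is a parameter only so that the
    sketch can be tested with a stand-in (an assumption of this example).
    """
    with _client_creation_lock:
        if factory is None:
            # Imported lazily so this module loads even where boto3 is absent.
            import boto3

            factory = boto3.client
        return factory(service_name, **kwargs)


# Usage (needs boto3 and credentials):
# s3 = create_client("s3")
```

Only creation is serialized here. Calls on the returned client run without the lock, so the cost is limited to startup.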
Our checkpointing mechanism works by running `torch.save()` on each rank. This creates, for each rank, a surprisingly large file that is a pickled state dict. A lot of the tensors...
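For reference, the per-rank saving described above looks roughly like the sketch below. The helper names and the `rank{N}.pt` file naming are hypothetical, and the rank is passed in explicitly; in a real distributed run it would presumably come from `torch.distributed.get_rank()`.

```python
import os

import torch


def save_rank_checkpoint(state_dict, checkpoint_dir, rank):
    """Pickle this rank's state dict into its own file with torch.save()."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Hypothetical naming scheme: one file per rank.
    path = os.path.join(checkpoint_dir, f"rank{rank}.pt")
    torch.save(state_dict, path)
    return path


def load_rank_checkpoint(checkpoint_dir, rank):
    """Load the file written by save_rank_checkpoint() for the same rank."""
    path = os.path.join(checkpoint_dir, f"rank{rank}.pt")
    return torch.load(path, map_location="cpu")
```

Each rank writes and reads only its own file, so the number of files on disk equals the number of ranks.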
This is running with Slurm run ID 4581738.
This should be a very easy thing to try. Just one setting in the config.