justheuristic
Roadmap
This is a global project roadmap that states our priorities for the near future. These priorities can and should be disputed here or elsewhere, after which we will update the...
This is a collection of miscellaneous small updates that would make examples/albert more efficient or easier to understand. __Note 1:__ if you're looking for a more advanced example where many...
__problem:__ if many peers join at once, they will all pick one averager (the latest at the time) as the target for loading initial state. This causes choke points as...
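To illustrate the choke point, here is a minimal sketch of one possible mitigation: choosing at random among the few most recently updated averagers instead of always taking the single latest one. This is not necessarily the fix proposed in the issue. `pick_state_donor`, `last_updated` and `top_k` are hypothetical names used only for illustration and are not part of the hivemind API.

```python
import random
from typing import Dict, Optional


def pick_state_donor(
    last_updated: Dict[str, float],
    top_k: int = 3,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a peer to download state from (hypothetical helper).

    last_updated maps a peer id to the time of its latest state update.
    Instead of always returning the single freshest peer, choose uniformly
    among the top_k freshest ones so that simultaneous joiners spread out.
    """
    if not last_updated:
        raise ValueError("no averagers available")
    rng = rng or random.Random()
    # Freshest peers first, then keep only the top_k candidates.
    candidates = sorted(last_updated, key=last_updated.get, reverse=True)[:top_k]
    return rng.choice(candidates)
```

With `top_k=1` this reduces to the behavior described in the issue, where every newcomer targets the same averager.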
We're using this dependency in one spot, where it can be replaced with ~5 lines of native code. It would be great to remove it.
It's something we played with a few times but did not end up merging to master. I'm creating this issue so we don't forget it. It would be great if...
Let's add a tutorial for training ViT/ResNet50 with Decentralized SGD. The intent is to use the DecentralizedSGD optimizer with the [vissl](https://github.com/facebookresearch/vissl) library for SwAV. Here's a basic tutorial for training SimCLR in...
This is a test case that may eventually become a solution
**Describe the bug** In examples/albert, if a training monitor fails to load state from peers, it does not retry, but instead fails for good, which can happen midway through training....
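The retry behavior this report asks for could look roughly like the sketch below. It is a generic retry loop with exponential backoff and jitter, not the actual examples/albert code. `load_with_retries` and `load_fn` are hypothetical names, and the sketch assumes the loader returns `None` or raises when no peer could provide state.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_with_retries(
    load_fn: Callable[[], Optional[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call load_fn until it returns a non-None state or attempts run out."""
    for attempt in range(max_attempts):
        try:
            state = load_fn()
            if state is not None:
                return state
        except Exception as exc:  # assumption: every failure is retryable
            print(f"attempt {attempt + 1} failed: {exc!r}")
        if attempt + 1 < max_attempts:
            # Exponential backoff, capped at max_delay, with random jitter
            # so that many monitors do not retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.5))
    return None
```

Returning `None` after the last attempt leaves it to the caller to decide whether to keep waiting or to exit.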
(reported by CALM volunteers) **Describe the bug** This happens to a new peer that joins training while others are averaging __parameters__. Since all peers are averaging parameters, the newbie peer...
Proposed by @mryab from #126. Virtual batching and LR scheduling are popular techniques with many applications. It would be nice to have an example of how to implement them with...
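As a library-agnostic illustration of the two techniques named above, the sketch below accumulates gradients over several micro-batches before each update (virtual batching) and scales the step size with a linear warmup followed by linear decay. It is plain Python on a toy one-parameter regression and does not use hivemind. `lr_at`, `train` and all parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay towards zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def train(
    data: Sequence[Tuple[float, float]],
    micro_batch_size: int,
    virtual_batch_size: int,
    base_lr: float,
    warmup_steps: int,
    total_steps: int,
) -> float:
    """Fit y = w * x with squared error, one update per virtual batch."""
    w, grad_sum, seen, step = 0.0, 0.0, 0, 0
    while step < total_steps:
        for start in range(0, len(data), micro_batch_size):
            micro = data[start:start + micro_batch_size]
            # Accumulate the gradient of (w * x - y) ** 2 without updating w.
            grad_sum += sum(2.0 * (w * x - y) * x for x, y in micro)
            seen += len(micro)
            if seen >= virtual_batch_size:
                # Enough samples collected: apply one averaged update.
                lr = lr_at(step, base_lr, warmup_steps, total_steps)
                w -= lr * grad_sum / seen
                grad_sum, seen = 0.0, 0
                step += 1
                if step >= total_steps:
                    break
    return w
```

The learning rate is indexed by the number of completed updates, so the schedule stays the same if the micro-batch size changes.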