torchdistx
Torch Distributed Experimental
**What does this PR do? Please describe:** Following the changes in https://github.com/pytorch/pytorch/pull/87855, we want to update references to `torch::TypeError` to `TORCH_CHECK_TYPE`. The changes have already been made to fbcode in [internal...
I am not familiar with builds, but it seems that I cannot install `torchdistx` for any PyTorch version past 1.13 (e.g., if I am developing on top of the current `master`)....
Hi, I just took a quick look at the fake tensor/module APIs. The deferred initialization feature looks really cool to me. I am wondering, is there a way to de-materialize the...
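For reference, a minimal sketch of the deferred-init round trip as described in the torchdistx docs; the question above asks about the inverse step (turning a materialized module back into a fake one), which the public API sketched here does not cover:

```python
import torch
from torchdistx.deferred_init import deferred_init, materialize_module

# Construct the module without allocating real storage: parameters are
# backed by fake tensors that record only metadata and the init ops.
model = deferred_init(torch.nn.Linear, 10, 10)

# materialize_module() allocates real storage and replays the recorded
# initialization, turning the fake parameters into regular tensors.
materialize_module(model)
```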
**Describe the bug:** In `exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)`, the dtypes of `exp_avg_sq` and `grad` differ, which fails for in-place operations. **Describe how to reproduce:** ``` # uses default hyperparameters such as momentum=float32...
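For context, a standalone sketch of the mismatch the report describes; the bfloat16/float32 pairing below is an assumption (a reduced-precision variance state against a full-precision gradient), not the reporter's exact configuration:

```python
import torch

beta2 = 0.999

# Assumed dtypes: reduced-precision optimizer state, full-precision grad.
exp_avg_sq = torch.zeros(4, dtype=torch.bfloat16)
grad = torch.randn(4, dtype=torch.float32)

# In-place ops cannot downcast the promoted result (float32) back into
# the bfloat16 state, so this line raises:
#   RuntimeError: result type Float can't be cast to the desired output
#   type BFloat16
exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
```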
Import sub-packages in `__init__.py` so that they become attributes of the package object. Fixes #66 **Check list:** - [ ] Was this **discussed and approved** via a GitHub issue? (not...
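A minimal sketch of the pattern this PR describes; the exact sub-package list is an assumption based on the modules mentioned elsewhere on this page:

```python
# torchdistx/__init__.py (sketch)
# Importing the sub-packages here binds them as attributes of the
# package object, so `torchdistx.optimizers` resolves after a bare
# `import torchdistx`.
from torchdistx import deferred_init, fake, optimizers  # noqa: F401
```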
**What does this PR do? Please describe:** Adds an automatic check for BFloat16 support to the AnyPrecision optimizer (`self.verify_bfloat_support()`). This happens at optimizer init if any of the relevant states are...
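A sketch of what such an init-time guard might look like; the body below is an assumption rather than the PR's actual code, though `torch.cuda.is_bf16_supported()` is the standard capability query:

```python
import torch

def verify_bfloat_support() -> None:
    # Fail fast at optimizer construction rather than deep inside step()
    # if the device cannot execute BFloat16 kernels.
    if not torch.cuda.is_available():
        raise RuntimeError("BFloat16 optimizer states requested, but CUDA is unavailable")
    if not torch.cuda.is_bf16_supported():
        raise RuntimeError("BFloat16 is not supported on this GPU")
```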
torchdistx sub-packages are not visible while trying to access them:
```
>>> import torchdistx
>>> torchdistx.optimizers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torchdistx' has...
```
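Until a fix like the `__init__.py` change above lands, importing the sub-package explicitly binds it onto the package object, which works on any version:

```python
import torchdistx.optimizers  # explicit import makes the attribute visible

# or, equivalently:
from torchdistx import optimizers
```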
Enhancement (credit to @rohan-varma): "this can be done in a follow-up PR, but let's maybe consider not defaulting things to torch.bfloat16 eventually. this is because it might be good...
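One way to avoid the hard-coded default the comment objects to is to accept `None` and fall back to each parameter's own dtype; the constructor below is a hypothetical sketch, not the optimizer's actual signature:

```python
import torch

class AnyPrecisionSketch:
    def __init__(self, params, lr=1e-3, momentum_dtype=None):
        # momentum_dtype=None (hypothetical) means "match the parameter's
        # dtype" instead of silently defaulting to torch.bfloat16.
        self.params = list(params)
        self.lr = lr
        self.momentum_dtype = momentum_dtype

    def _state_dtype(self, p: torch.Tensor) -> torch.dtype:
        return self.momentum_dtype if self.momentum_dtype is not None else p.dtype
```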
Problem: if the user runs the AnyPrecision optimizer with Kahan summation and checkpoints the model/optimizer, restarting training may begin with an empty compensation buffer. This is not a blocking problem, but...
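For background, a textbook sketch of a Kahan-compensated parameter update; the compensation buffer accumulates the low-order bits lost to rounding at each step, which is why resuming from a checkpoint with that buffer zeroed silently drops the correction (names here are illustrative, not the optimizer's actual state keys):

```python
import torch

@torch.no_grad()
def kahan_update(param: torch.Tensor, update: torch.Tensor,
                 compensation: torch.Tensor) -> None:
    # Classic Kahan summation: fold the previously lost low-order bits
    # back in, apply the update, then record what was lost this step.
    corrected = update - compensation
    new_param = param + corrected
    compensation.copy_((new_param - param) - corrected)
    param.copy_(new_param)

# For a correct restart, `compensation` must be saved and restored with
# the optimizer state_dict alongside the other per-parameter states.
```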