hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
After several epochs of training, the following error is raised:
```
/hfta/optim/lr_scheduler.py", line 247, in _update_lr
    res = this_lr * multiplier
TypeError: unsupported operand type(s) for *: 'Coefficient' and 'float'...
```
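For illustration only, here is a minimal sketch of how this kind of `TypeError` arises and one hedged workaround; the `Coefficient` class below is a stand-in for the reported type, not HFTA's actual implementation:

```python
# Hypothetical stand-in for the Coefficient type in the report; not HFTA's code.
class Coefficient:

  def __init__(self, value):
    self.value = value  # e.g. a per-model scaling factor for the learning rate


this_lr = Coefficient(0.1)
multiplier = 0.5

# Reproduces the reported failure: Python finds no Coefficient.__mul__ accepting
# a float, so `this_lr * multiplier` raises the TypeError shown above.
# res = this_lr * multiplier

# One possible workaround is to unwrap to a plain float before multiplying,
# or to define Coefficient.__mul__/__rmul__ so scalar math is supported.
res = this_lr.value * multiplier
print(res)  # 0.05
```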
* [x] Tested
* [x] Formatted with YAPF
Ops:
- [x] batchnorm.py
- [x] conv.py
- [x] dropout2d.py
- [x] embedding.py
- [x] layernorm.py
- [x] linear.py
- [ ] [no change needed?] multiheadattention.py
- [x] pool.py
- ...
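For context, a generic sketch of what inter-model horizontal fusion of a `Linear` op can look like, using a plain batched matmul; this illustrates the idea behind the fused ops above, not HFTA's actual operator implementation:

```python
import torch

B = 3   # number of models fused together (the "array size")
N = 8   # per-model batch size
in_features, out_features = 16, 4

# One weight/bias slice per fused model, stacked along a leading model dimension.
weight = torch.randn(B, in_features, out_features)
bias = torch.randn(B, 1, out_features)

# The input carries the same leading model dimension: [B, N, in_features].
x = torch.randn(B, N, in_features)

# A single batched matmul computes all B models' Linear layers at once,
# instead of launching B small matmuls serially.
y = torch.bmm(x, weight) + bias
print(y.shape)  # torch.Size([3, 8, 4])
```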
* [x] Tested
* [x] Formatted with YAPF

PyTorch 1.9 uses a functional API to perform the Adam computation. However, it seems like we don't have a functional API for those...
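For reference, a self-contained functional-style Adam step over explicitly passed state tensors, in the spirit of the functional API PyTorch 1.9 uses internally (`torch.optim._functional.adam` is a private API and may change); this is a sketch of the pattern, not that API:

```python
import math

import torch


def adam_step(param, grad, exp_avg, exp_avg_sq, step, *, lr=1e-3, beta1=0.9,
              beta2=0.999, eps=1e-8):
  """One functional Adam update; all optimizer state is passed in explicitly."""
  # Update biased first and second moment estimates in place.
  exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
  exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

  # Bias corrections depend on the current step count.
  bias_correction1 = 1 - beta1**step
  bias_correction2 = 1 - beta2**step
  denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(eps)

  # Apply the update in place on the parameter tensor.
  param.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)
  return param


p = torch.randn(4)
g = torch.randn(4)
m, v = torch.zeros(4), torch.zeros(4)
adam_step(p, g, m, v, step=1)
```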
Triggering the test runs on each PR.
- [ ] Convergence Testing: Using the existing examples, compare HFTA vs. serial. Each with 5 runs, one seed per run; then compare the mean and std of the...
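One way to structure such a comparison, as a hedged sketch; `run_training` is a hypothetical hook that would wrap one of the existing examples and return its final metric:

```python
import statistics


def run_training(mode, seed):
  """Hypothetical hook: trains one existing example in 'hfta' or 'serial'
  mode with the given seed and returns the final validation metric."""
  raise NotImplementedError


def convergence_summary(mode, seeds=(0, 1, 2, 3, 4)):
  # One seed per run, 5 runs total, as described in the checklist item above.
  metrics = [run_training(mode, seed) for seed in seeds]
  return statistics.mean(metrics), statistics.stdev(metrics)


if __name__ == '__main__':
  hfta_mean, hfta_std = convergence_summary('hfta')
  serial_mean, serial_std = convergence_summary('serial')
  print(f'HFTA:   {hfta_mean:.4f} +/- {hfta_std:.4f}')
  print(f'serial: {serial_mean:.4f} +/- {serial_std:.4f}')
```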
PyTorch native AMP (i.e., `torch.cuda.amp`) interacts with the optimizers and expects certain behaviors from them. We should add unit tests to make sure that those interactions work correctly.
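As a starting point, a minimal unit-test sketch of the standard `torch.cuda.amp` training-step interaction (`GradScaler.scale`/`step`/`update`) against a plain optimizer; the HFTA-specific optimizer and assertions would still need to be filled in:

```python
import unittest

import torch


class TestAmpOptimizerInteraction(unittest.TestCase):

  @unittest.skipUnless(torch.cuda.is_available(), 'requires CUDA')
  def test_scaler_step_and_update(self):
    model = torch.nn.Linear(8, 4).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()

    x = torch.randn(16, 8, device='cuda')
    with torch.cuda.amp.autocast():
      loss = model(x).sum()

    # The scaler scales the loss, unscales grads inside step(), may skip the
    # step when inf/nan grads are found, and update() adjusts the scale factor.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    self.assertTrue(all(p.grad is not None for p in model.parameters()))


if __name__ == '__main__':
  unittest.main()
```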