SelectiveBackprop: acceleration is achieved by prioritizing examples with high loss at each iteration. This means using the output of a training example’s forward pass to decide whether to use...
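A minimal sketch of the idea, assuming a standard PyTorch training loop. Note that the original SelectiveBackprop paper selects examples probabilistically from the loss distribution; the sketch below uses a simpler top-k rule, and the function name and `keep_fraction` parameter are purely illustrative, not the nebulgym implementation.

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, batch, targets, keep_fraction=0.5):
    """Forward the whole batch cheaply, then backprop only on the
    highest-loss examples (illustrative sketch of SelectiveBackprop)."""
    model.train()
    with torch.no_grad():
        # Cheap forward pass to score every example by its loss.
        losses = F.cross_entropy(model(batch), targets, reduction="none")

    # Keep the top-k highest-loss examples for the expensive backward pass.
    k = max(1, int(keep_fraction * batch.shape[0]))
    idx = torch.topk(losses, k).indices

    optimizer.zero_grad()
    loss = F.cross_entropy(model(batch[idx]), targets[idx])
    loss.backward()
    optimizer.step()
    return loss.item()
```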
We can replace target layers with similar but "cheaper" implementations in terms of FLOPs (a minimal module-swapping sketch follows the list). Examples are:
- Transformer layers with [ALiBi](https://openreview.net/pdf?id=R8sQPpGCv0)
- ResNet implementations with [ResNet-RS](https://arxiv.org/pdf/2103.07579.pdf)
- Linear layers with...
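The sketch below shows the general pattern: walk the module tree and swap a target layer type for a cheaper drop-in. The `LowRankLinear` replacement (a low-rank factorization of `nn.Linear`) is only a hypothetical example of a "cheaper" layer, not one of the substitutions listed above.

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Hypothetical cheaper drop-in for nn.Linear based on a low-rank
    factorization (fewer FLOPs when rank << min(in, out) features)."""
    def __init__(self, in_features, out_features, rank=32, bias=True):
        super().__init__()
        rank = min(rank, in_features, out_features)
        self.down = nn.Linear(in_features, rank, bias=False)
        self.up = nn.Linear(rank, out_features, bias=bias)

    def forward(self, x):
        return self.up(self.down(x))

def replace_linears(module: nn.Module, rank: int = 32) -> nn.Module:
    """Recursively swap nn.Linear layers for the cheaper variant."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LowRankLinear(
                child.in_features, child.out_features, rank,
                bias=child.bias is not None))
        else:
            replace_linears(child, rank)
    return module
```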
Implement the ModelReshaper class, which should apply to the TrainingLearner a set of transformations for reducing its running time. Some basic transformations can be related to "tensor reshaping" for improving...
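A minimal sketch of what such a class could look like, assuming (purely for illustration) that the TrainingLearner exposes its underlying `nn.Module` as a `.model` attribute. The channels-last conversion is just one example of a tensor-layout transformation that can reduce running time on recent GPUs.

```python
import torch

class ModelReshaper:
    """Hypothetical sketch: applies a list of speed-oriented
    transformations to a TrainingLearner-like object that exposes
    its underlying nn.Module as `.model` (an assumption made here)."""

    def __init__(self, transformations=None):
        self.transformations = transformations or [self._to_channels_last]

    def reshape(self, learner):
        for transform in self.transformations:
            learner = transform(learner)
        return learner

    @staticmethod
    def _to_channels_last(learner):
        # Example "tensor reshaping": the channels-last memory format can
        # speed up convolutions, especially with mixed-precision training.
        learner.model = learner.model.to(memory_format=torch.channels_last)
        return learner
```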
Add an auto-installer, similar to what has been done in nebullvm, for automatically installing all the dependencies and libraries needed to run the backends supported by nebulgym. In...
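A minimal sketch of the idea: try to import each backend and pip-install whatever is missing. The mapping of import names to pip package names below is purely illustrative, not the actual list of nebulgym backends.

```python
import importlib
import subprocess
import sys

# Illustrative mapping of backend import names to pip package names.
BACKEND_PACKAGES = {
    "onnxruntime": "onnxruntime",
    "deepspeed": "deepspeed",
}

def auto_install_backends(packages=BACKEND_PACKAGES):
    """Install any backend whose import fails (minimal sketch)."""
    for module_name, pip_name in packages.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            subprocess.run(
                [sys.executable, "-m", "pip", "install", pip_name],
                check=True,
            )
```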
Add the API documentation to nebulgym. We could generate it from the docstrings in the code using the [Sphinx library](https://www.sphinx-doc.org/en/master/).
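A minimal `docs/conf.py` sketch, assuming a standard Sphinx layout with a `docs/` folder next to the `nebulgym` package; the extensions and theme are suggestions, not decisions.

```python
# docs/conf.py -- minimal sketch for generating API docs from docstrings.
import os
import sys

sys.path.insert(0, os.path.abspath(".."))  # make the nebulgym package importable

project = "nebulgym"
extensions = [
    "sphinx.ext.autodoc",    # pull documentation from docstrings
    "sphinx.ext.napoleon",   # support Google/NumPy docstring styles
    "sphinx.ext.viewcode",   # link to highlighted source code
]
html_theme = "sphinx_rtd_theme"  # assumed theme choice
```

Running `sphinx-apidoc -o docs/ nebulgym/` and then `sphinx-build docs/ docs/_build/html` would generate the API pages from the docstrings.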
Provide and attach to the README a table containing benchmarks against vanilla PyTorch implementations. It would be lovely to have the performance of nebulgym on the most used hardware...
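A simple timing harness that could back such a table; it is only a sketch and deliberately leaves out how nebulgym is applied to the model, so no nebulgym-specific API is assumed. Running it once on the vanilla model and once on the nebulgym-accelerated model gives the two columns to compare.

```python
import time
import torch

def time_training(model, data_loader, optimizer, loss_fn, steps=50, device="cuda"):
    """Average seconds per training step (simple benchmarking sketch)."""
    model.to(device).train()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for step, (x, y) in enumerate(data_loader):
        if step >= steps:
            break
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps
```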
TensorFlow may offer faster training than the PyTorch implementation for some model classes or on some specific hardware devices (such as TPUs). It would be a nice idea to add...
Add explicit support for distributed training on multiple machines whenever possible. We could leverage other outstanding open-source projects such as DeepSpeed and Ray.
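As a neutral illustration of what the multi-machine path could look like, the sketch below uses plain PyTorch `DistributedDataParallel`; DeepSpeed or Ray would replace or orchestrate this setup step, and the function name is hypothetical.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed_model(model: torch.nn.Module) -> torch.nn.Module:
    """Wrap a model for multi-process/multi-machine training (plain DDP sketch).

    Expects the environment variables set by a launcher such as torchrun
    (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, LOCAL_RANK).
    """
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])
```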
The current implementation lacks unit-test coverage. We should create a `tests` folder in each nebulgym package containing tests (to be run with `pytest`) for the code contained in the...
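A minimal sketch of what a module inside one of those `tests` folders could look like; the file name and the objects under test are placeholders.

```python
# nebulgym/<package>/tests/test_example.py -- illustrative placeholder only
import pytest
import torch

def _tiny_model():
    return torch.nn.Linear(4, 2)

def test_forward_shape():
    model = _tiny_model()
    out = model(torch.randn(8, 4))
    assert out.shape == (8, 2)

@pytest.mark.parametrize("batch_size", [1, 16])
def test_backward_produces_gradients(batch_size):
    model = _tiny_model()
    model(torch.randn(batch_size, 4)).sum().backward()
    assert all(p.grad is not None for p in model.parameters())
```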
The current backpropagation patch is based on the [MeProp paper](https://arxiv.org/abs/1706.06197). Considering the impressive performance obtained with MeProp on FCNNs, we should also consider adding a CNN version of MeProp. More info...
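A simplified sketch of the core MeProp idea applied to a convolutional layer: keep only the k largest components (per example, by magnitude) of the gradient flowing back through the layer output, and zero out the rest. The class names and the exact placement of the top-k step are illustrative, not the actual patch.

```python
import torch

class TopKGrad(torch.autograd.Function):
    """Sparsify the incoming gradient, keeping only the k largest
    components per example -- the core MeProp idea."""

    @staticmethod
    def forward(ctx, x, k):
        ctx.k = k
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        flat = grad_output.flatten(start_dim=1)
        k = min(ctx.k, flat.shape[1])
        idx = flat.abs().topk(k, dim=1).indices
        mask = torch.zeros_like(flat).scatter_(1, idx, 1.0)
        # No gradient for the integer argument k, hence the trailing None.
        return (mask * flat).view_as(grad_output), None

class MePropConv2d(torch.nn.Conv2d):
    """Conv layer whose output gradient is sparsified a la MeProp (sketch)."""

    def __init__(self, *args, k=64, **kwargs):
        super().__init__(*args, **kwargs)
        self.k = k

    def forward(self, x):
        return TopKGrad.apply(super().forward(x), self.k)
```

Dropping such a layer in place of `nn.Conv2d` would make it possible to measure whether the speed-ups reported for fully connected networks carry over to convolutions.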