# algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
## Original issue

Self-tuning submissions use default values for `dropout_rate` and `aux_dropout_rate` (see #753). We want to allow them to specify custom values for these hyperparameters.

## Solution

Allow self-tuning...
## Description

Our current setup uses pre-commit hooks to enforce code-quality checks before each commit. While this helps maintain consistency, it can slow down development. To balance consistency and...
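One common way to keep fast hooks on every commit while deferring slow ones is pre-commit's `stages: [manual]` mechanism. The sketch below is hypothetical: the hook ids, entries, and revs are illustrative, not this repository's actual configuration.

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: end-of-file-fixer   # fast check, runs on every commit
  - repo: local
    hooks:
      - id: slow-lint           # hypothetical slow hook
        name: full lint
        entry: pylint src       # illustrative command
        language: system
        stages: [manual]        # skipped on ordinary commits
```

Hooks marked `stages: [manual]` are skipped by `git commit` and can instead be run explicitly (for example in CI) with `pre-commit run --hook-stage manual --all-files`.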
It took me a long time to figure out the basics of how this benchmark works, so I think a short description at the beginning of the README would be...
### Workload

#### Task

Text generation.

#### Dataset

TBD

#### Model

TBD. Possible candidates include:

- [preferred starting point] [Nanodo](https://github.com/google-deepmind/nanodo)
- NanoGPT
- Meta’s [lingua](https://github.com/facebookresearch/lingua)
- Keller Jordan’s [modded nanoGPT](https://github.com/KellerJordan/modded-nanogpt)
- ...
It is useful to shard optimizer state across devices (to save significant memory). This reflects current practice, and we want to support it.

* We want to switch from no sharding...
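The memory saving can be illustrated with a toy plain-Python sketch of ZeRO-style state partitioning. This is not the repo's actual JAX/PyTorch sharding code; the function name and layout are made up for illustration:

```python
def shard_optimizer_state(num_params, num_devices):
    """Toy ZeRO-style sharding: each device owns the Adam moments (m, v)
    for only a contiguous ~1/num_devices slice of the parameters,
    instead of every device replicating the full state."""
    size = -(-num_params // num_devices)  # ceil division
    shards = []
    for d in range(num_devices):
        lo = d * size
        hi = min(lo + size, num_params)
        shards.append({
            "m": [0.0] * (hi - lo),  # first-moment slice owned by device d
            "v": [0.0] * (hi - lo),  # second-moment slice owned by device d
        })
    return shards

shards = shard_optimizer_state(num_params=10, num_devices=4)
replicated = 4 * 2 * 10  # no sharding: every device holds full m and v
sharded = sum(len(s["m"]) + len(s["v"]) for s in shards)  # 2 * 10 in total
```

With sharding, the total moment storage across all devices equals one copy of the state (here 20 entries) instead of one copy per device (here 80), at the cost of a gather step when applying updates.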
## Description

So far we have been unable to reproduce the schedule-free AdamW results with JAX. There seem to be differences between the optax implementation of schedule-free AdamW and the...
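For reference, the core schedule-free update (evaluate gradients at an interpolation between the raw iterate and a running average) can be sketched in plain Python. This is a simplified SGD variant for illustration only, not the optax or reference AdamW implementation being compared in this issue:

```python
def schedule_free_sgd(grad_fn, z0, lr, steps, beta=0.9):
    """Toy schedule-free SGD sketch (Defazio et al.-style averaging).

    Maintains two sequences: z (raw iterate) and x (running average).
    Gradients are evaluated at the interpolation y between x and z.
    """
    z = z0
    x = z0
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # evaluation point
        z = z - lr * grad_fn(y)         # raw SGD step on z
        c = 1.0 / t                     # equal-weight running average
        x = (1 - c) * x + c * z
    return x

# Minimize f(w) = w**2 (gradient 2w) starting from w = 5.0.
w = schedule_free_sgd(lambda w: 2 * w, z0=5.0, lr=0.1, steps=200)
```

Subtle differences in how implementations define the interpolation weight, the averaging coefficient `c`, or the warmup handling are exactly the kind of discrepancy that can make two "schedule-free AdamW" variants diverge.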
The train diff tests are difficult to run at the moment.

- Add documentation on how to run them.
- Eliminate I/O errors related to writing temporary results to files.
Refactor the modeldiff tests so that:

- variable names are clear (e.g. `pytorch` instead of `pyt`)
- script names are clear and descriptive
- duplicated logic is removed