Davis Wertheimer

5 issues by Davis Wertheimer

The current dataloader still causes gradual, asymptotic slowdowns - likely because n_workers is fixed to 0 in the dataloader. This forces the main process to also handle dataloading in a...

bug
enhancement
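A toy analogue of the issue (not the repo's dataloader): with no workers, the training process blocks on every batch it loads itself, while a worker pool prefetches batches concurrently. The `load_batch` helper below is hypothetical, standing in for I/O-bound batch preparation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    # Stand-in for I/O-bound batch preparation (decode, tokenize, ...).
    time.sleep(0.01)
    return list(range(i, i + 4))

def serial_loader(n):
    # num_workers = 0 analogue: the main process loads every batch itself,
    # stalling the training loop on each one.
    return [load_batch(i) for i in range(n)]

def parallel_loader(n, workers=4):
    # num_workers > 0 analogue: a worker pool prepares batches concurrently
    # while the main process only consumes results, in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_batch, range(n)))
```

Both loaders yield identical batches in the same order; only the blocking behavior of the main process differs.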

Add support for speculator training, piggybacking off the existing training utilities. Training script and speculator-specific utilities are inside the new `speculator` subfolder. Uses distributed setup, checkpointing, and dataloaders from this...

speculator training

This PR introduces an experimental PyTorch-native dataloader from IBM that is distributed, stateful, checkpointable, composable and rescalable. It is intended for use in large-scale model pretraining, particularly in research settings...

CLA Signed
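A minimal sketch of what "stateful, checkpointable" means for a dataloader (a hypothetical `StatefulLoader`, not IBM's implementation): the loader exposes `state_dict()`/`load_state_dict()` so a resumed run continues from the same position in the data instead of restarting the epoch.

```python
class StatefulLoader:
    """Toy iterator whose progress can be checkpointed and restored."""

    def __init__(self, data):
        self.data = list(data)
        self.pos = 0  # index of the next item to yield

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        item = self.data[self.pos]
        self.pos += 1
        return item

    def state_dict(self):
        # Everything needed to resume exactly where we left off.
        return {"pos": self.pos}

    def load_state_dict(self, state):
        self.pos = state["pos"]
```

A resumed loader picks up mid-stream: consume two items, save `state_dict()`, construct a fresh loader over the same data, restore the state, and the next item is the third one.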

Implement [muP scaling](https://arxiv.org/abs/2203.03466) for Llama models. The model follows muP scaling laws while introducing only the minimal set of extra tunable hyperparameters needed to recover prior behavior - thus may...
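A hedged sketch of two core muP transfer rules (Yang et al., arXiv:2203.03466), not this PR's actual implementation: relative to a tuned base width, hidden-layer learning rates (under Adam) shrink like 1/width-multiplier and the output logits are scaled down by the same factor, so hyperparameters tuned at the base width transfer to wider models. The function name and base values below are illustrative.

```python
def mup_scalars(width, base_width, base_lr=1e-3):
    """Width-dependent multipliers under muP-style scaling (sketch).

    `width` is the model dimension of the target model, `base_width`
    the dimension at which hyperparameters were tuned.
    """
    m = width / base_width          # width multiplier
    return {
        "hidden_lr": base_lr / m,   # Adam LR for hidden weights scales ~1/m
        "output_mult": 1.0 / m,     # output logits multiplied by ~1/m
    }
```

At base width the multipliers reduce to the ordinary parametrization (`m = 1`), which is how "recovering prior behavior" falls out of the scaling rules.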

The current code prints multiple warnings from each GPU at the start of training, which clutters the log. This updates the dataloader and process-group constructors, respectively, to eliminate these warnings: ```...
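A sketch of the general deduplication technique (assumed here, not necessarily the PR's exact fix): emit setup-time warnings only on rank 0, so a job on N GPUs logs one copy instead of N. The helper name is hypothetical; `RANK` is the environment variable conventionally set by distributed launchers such as torchrun.

```python
import os
import warnings

def warn_once_per_job(msg):
    # Only rank 0 emits the warning; all other ranks stay silent,
    # keeping the startup log to a single copy per job.
    rank = int(os.environ.get("RANK", "0"))
    if rank == 0:
        warnings.warn(msg)
```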