fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
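For readers new to SDPA, here is a minimal sketch of a causal attention call through `torch.nn.functional.scaled_dot_product_attention`, checked against an explicit softmax computation. The helper names are hypothetical and not taken from fms-fsdp. PyTorch chooses the kernel (Flash, memory-efficient, or plain math) from the device, dtype, and input shapes, so this snippet does not force the Flash backend.

```python
import math

import torch
import torch.nn.functional as F


def causal_attention_sdpa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Let PyTorch dispatch to whichever fused attention kernel is available.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


def causal_attention_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Explicit softmax(Q K^T / sqrt(d)) V with a lower-triangular causal mask.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    seq_len = q.size(-2)
    mask = torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device).tril()
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Inputs use the `(batch, heads, seq_len, head_dim)` layout, for example `torch.randn(2, 4, 16, 32)` for each of `q`, `k`, and `v`.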

40 fms-fsdp issues

There seems to be a typo in the `Checkpointer` class. The `_cleanup` method calls `os.path.is_file` instead of `os.path.isfile`.
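As a sketch of what a corrected cleanup step could look like, assuming it keeps the newest `keep` regular files and deletes the rest: the function names and the mtime-based ordering are hypothetical and not the actual `Checkpointer._cleanup` logic. The point is only that `os.path.isfile` is the real function; `os.path.is_file` does not exist and raises `AttributeError`.

```python
import os


def list_checkpoint_files(ckp_dir: str) -> list[str]:
    paths = [os.path.join(ckp_dir, name) for name in os.listdir(ckp_dir)]
    # os.path.isfile is the correct spelling; pathlib.Path.is_file is the
    # method-style equivalent. Directories are skipped by this filter.
    return [p for p in paths if os.path.isfile(p)]


def cleanup_old_checkpoints(ckp_dir: str, keep: int) -> list[str]:
    """Hypothetical helper: delete all but the `keep` most recently modified files."""
    files = sorted(list_checkpoint_files(ckp_dir), key=os.path.getmtime)
    to_delete = files[:-keep] if keep > 0 else files
    for path in to_delete:
        os.remove(path)
    return to_delete
```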

Edit: The bug fix was applied in #119, so this PR now only adds two unit tests to prevent a regression. --- The code was calling `os.path.join()` too many times, causing the...
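The PR's actual tests are not shown in this listing, so the following is only a sketch of a regression test for over-joined paths. The helper `checkpoint_paths` and the test names are hypothetical. Joining the directory a second time is harmless for absolute paths but yields `ckp/ckp/step_1` for a relative directory, which is what the second test guards against.

```python
import os
import tempfile
import unittest


def checkpoint_paths(ckp_dir: str) -> list[str]:
    # Join the directory onto each entry exactly once.
    return sorted(os.path.join(ckp_dir, name) for name in os.listdir(ckp_dir))


class TestCheckpointPaths(unittest.TestCase):
    def test_paths_point_at_existing_files(self):
        with tempfile.TemporaryDirectory() as tmp:
            for name in ("step_1", "step_2"):
                open(os.path.join(tmp, name), "w").close()
            paths = checkpoint_paths(tmp)
            self.assertEqual(len(paths), 2)
            self.assertTrue(all(os.path.isfile(p) for p in paths))

    def test_relative_dir_is_not_joined_twice(self):
        with tempfile.TemporaryDirectory() as tmp:
            cwd = os.getcwd()
            os.chdir(tmp)
            try:
                os.mkdir("ckp")
                open(os.path.join("ckp", "step_1"), "w").close()
                self.assertEqual(checkpoint_paths("ckp"), [os.path.join("ckp", "step_1")])
            finally:
                os.chdir(cwd)  # restore before the temp dir is removed
```

These can be run with `python -m unittest` from the directory containing the file.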

Since we may have to handle very long documents of up to millions of characters/tokens, the `dataloader` may need to be tested and revised where [it](https://github.com/foundation-model-stack/fms-fsdp/blob/2767c796422eade29a72d36fdf5d4d3a8af0672b/fms_fsdp/utils/dataloader_utils.py#L134) aims at...
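One way to keep memory bounded for very long documents is to consume the token stream lazily and emit fixed-length chunks. This is a minimal sketch, not the fms-fsdp dataloader; the name `chunk_tokens` and the choice to emit a shorter final chunk are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_tokens(tokens: Iterable[int], seq_len: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of up to `seq_len` tokens from a token stream.

    Only one chunk is materialized at a time, so a document with millions of
    tokens is never held as a single list. The last chunk may be shorter.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    it = iter(tokens)
    while True:
        chunk = list(islice(it, seq_len))
        if not chunk:
            return
        yield chunk
```

For example, `list(chunk_tokens(range(10), 4))` gives `[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]`.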

Implement [muP scaling](https://arxiv.org/abs/2203.03466) for Llama models. The model follows muP scaling laws but introduces the minimal set of extra tunable hyperparameters needed to recover prior behavior, and thus may...
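To make the scaling rules concrete, here is a hedged sketch of the main muP adjustments for Adam, following the convention used by Microsoft's `mup` package as we understand it: hidden-matrix learning rates shrink as 1/width_mult, attention logits use 1/d instead of 1/sqrt(d), and readout logits are multiplied by 1/width_mult. This is not the PR's implementation, and the function names and parameter-kind labels are hypothetical.

```python
def mup_adam_lr(param_kind: str, base_lr: float, width_mult: float) -> float:
    """Per-group Adam learning rate under muP.

    width_mult = model width / base (proxy-model) width.
    """
    if param_kind == "hidden":
        # Matrices with both fan-in and fan-out proportional to width.
        return base_lr / width_mult
    if param_kind in ("embedding", "vector", "readout"):
        # Vector-like params keep the base LR; the readout is instead
        # handled by scaling its output (see mup_readout_multiplier).
        return base_lr
    raise ValueError(f"unknown param_kind: {param_kind!r}")


def mup_attention_scale(head_dim: int) -> float:
    # muP scales attention logits by 1/d rather than 1/sqrt(d).
    return 1.0 / head_dim


def mup_readout_multiplier(width_mult: float) -> float:
    # Multiply the output logits by this factor so they stay O(1) as width grows.
    return 1.0 / width_mult
```

At `width_mult == 1` all three rules reduce to the standard parametrization, which is what lets a tuned small proxy model transfer its hyperparameters to wider ones.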

Signed-off-by: Akash Nayak

Environment: A100 8-GPU machine with NVLink connections; Docker image `nvcr.io/nvidia/pytorch:23.12-py3`. Setup: `git clone https://github.com/foundation-model-stack/fms-fsdp.git`; `git clone https://github.com/foundation-model-stack/foundation-model-stack.git`; `git clone https://github.com/huggingface/optimum-nvidia.git`; `cd foundation-model-stack`; `pip install -e .`; `cd ../fms-fsdp/`; `pip install -r requirements.txt`...

The default model variant is "7b" (https://github.com/foundation-model-stack/fms-fsdp/blob/65b0ea670fa375bb0f7f6a285e7229bb96ebdd0f/fms_fsdp/config/training.py#L8), but it is not in the whitelist of supported variants (https://github.com/foundation-model-stack/fms-fsdp/blob/65b0ea670fa375bb0f7f6a285e7229bb96ebdd0f/fms_fsdp/utils/config_utils.py#L25).
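A cheap guard against this kind of drift is to validate the variant against the allowlist and test that the default passes. In this sketch, `TrainConfig`, `validate_variant`, and the contents of `SUPPORTED_VARIANTS` are hypothetical placeholders; the real names and list live in the linked files. Only the `"7b"` default comes from the issue.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical allowlist; the actual one is in fms_fsdp/utils/config_utils.py.
SUPPORTED_VARIANTS: FrozenSet[str] = frozenset({"llama2_7b", "llama2_13b"})


@dataclass
class TrainConfig:
    model_variant: str = "7b"  # default reported in the issue


def validate_variant(cfg: TrainConfig, supported: FrozenSet[str] = SUPPORTED_VARIANTS) -> None:
    """Fail fast with a readable message instead of a lookup error later."""
    if cfg.model_variant not in supported:
        raise ValueError(
            f"model_variant {cfg.model_variant!r} is not supported; "
            f"choose one of {sorted(supported)}"
        )
```

Running `validate_variant(TrainConfig())` in a unit test would have flagged the mismatch, since the default `"7b"` is not in the allowlist.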

We observed noticeable variability when re-running the FSDP training script for a small 1.xB Llama 2 model with fixed seeds and the same tokens. Below is a snapshot of the evaluation...
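Fixing seeds alone does not make PyTorch runs reproducible, because some CUDA kernels are nondeterministic by default. This sketch collects the standard PyTorch reproducibility switches in one hypothetical helper, `make_deterministic`. It does not claim to explain the variability reported above, and it may not cover every source of nondeterminism in a multi-GPU FSDP run.

```python
import os
import random

import torch


def make_deterministic(seed: int, warn_only: bool = True) -> None:
    """Seed the common RNGs and opt into PyTorch's deterministic kernels.

    Call this early, before any CUDA work: the cuBLAS workspace setting
    only takes effect if it is set before cuBLAS is first used.
    """
    # Needed for reproducible cuBLAS GEMMs (CUDA 10.2+) in deterministic mode.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA device generators
    # cuDNN autotuning can select different kernels from run to run.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Prefer deterministic implementations; with warn_only=True, ops that
    # lack one emit a warning instead of raising an error.
    torch.use_deterministic_algorithms(True, warn_only=warn_only)
```

Setting `warn_only=False` turns every nondeterministic op into an error, which can help pinpoint the layer responsible for run-to-run drift.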

The current code only looks for files at the top level of the dataset folder. When the dataset contains nested subfolders, the arrow files inside them are not found.
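A recursive search fixes the nested-folder case. This is a minimal sketch using `pathlib.Path.rglob`; the helper name `find_arrow_files` is hypothetical and not the repo's actual function.

```python
from pathlib import Path
from typing import List


def find_arrow_files(dataset_dir: str) -> List[str]:
    # rglob descends into every subfolder, unlike a flat os.listdir or glob("*.arrow").
    return sorted(str(p) for p in Path(dataset_dir).rglob("*.arrow") if p.is_file())
```

Sorting the result keeps the file order stable across runs, which matters if files are later assigned to ranks or workers by position.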

The current code prints multiple warnings from each GPU at the start of training, which clutters the log. This PR updates the dataloader and process group constructors, respectively, to eliminate these warnings: ```...