Modernize MosaicBERT
This PR modernizes the MosaicBERT codebase with Flash Attention 2, PyTorch 2 (torch==2.1.1), and an updated version of composer (mosaicml>=0.17).
In particular, this updates MosaicBERT to be compatible with Flash Attention 2 (flash-attn>=2.4.2), which now supports ALiBi slopes (PR#540).
Context:
- The code for MosaicBERT was mostly written October - December 2022. At that time, Flash Attention was quite new and did not support linear biases (i.e. ALiBi). We therefore wrote a custom Flash Attention kernel that supported ALiBi using triton in https://github.com/mosaicml/examples/blob/v0.0.4/examples/bert/src/flash_attn_triton.py. This version of triton also required PyTorch 1.13. This is also the kernel used for the MosaicBERT NeurIPS submission.
- Now that Flash Attention natively supports ALiBi, we no longer need to maintain a custom triton implementation.
See w&b runs here
Note that changes to files outside of examples/benchmarks/bert are simply formatting changes due to linting.
Should be close to done @dakinggg. The two failed pytests were:
FAILED tests/test_classification.py::test_classification_script - RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
FAILED tests/test_glue.py::test_glue_script[mosaic_bert] - RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
============= 2 failed, 3 passed, 3 warnings in 147.01s (0:02:27) ==============
UPDATE on 1/8/24: This was not an issue for me on a clean machine, so this is unlikely to be a real issue, and VERY unlikely to be an issue with this PR.
==============
ORIGINAL:
I don't think this error needs to hold up this PR, but FA2 was giving me some headaches as part of a clean requirements.txt installation. I fixed it by ensuring that packaging and torch were both installed BEFORE running the pip install for FA2.
Details:
Env: (This is in WSL for Windows, but most of the time that's equivalent to an Ubuntu environment, and I don't think it's the source of this error.)
I just checked out the branch and created a clean conda env. Then I ran pip install -r requirements.txt and got an error:
❯ pip install -r requirements.txt
Collecting packaging (from -r requirements.txt (line 1))
Using cached packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting einops==0.5.0 (from -r requirements.txt (line 2))
Using cached einops-0.5.0-py3-none-any.whl (36 kB)
Collecting torch==2.1.1 (from -r requirements.txt (line 3))
Using cached torch-2.1.1-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting composer<0.18,>=0.17.0 (from composer[nlp,wandb]<0.18,>=0.17.0->-r requirements.txt (line 4))
Using cached composer-0.17.2-py3-none-any.whl.metadata (27 kB)
Collecting mosaicml-streaming<=0.7 (from -r requirements.txt (line 5))
Using cached mosaicml_streaming-0.7.0-py3-none-any.whl.metadata (20 kB)
Collecting omegaconf==2.3.0 (from -r requirements.txt (line 6))
Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting transformers==4.35.2 (from -r requirements.txt (line 7))
Using cached transformers-4.35.2-py3-none-any.whl.metadata (123 kB)
Collecting flash_attn>=2.4.2 (from -r requirements.txt (line 9))
Using cached flash_attn-2.4.2.tar.gz (2.4 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-snje5q4q/flash-attn_a0ad7b7eaf5e4b1bb1d9c8af1808da4b/setup.py", line 9, in <module>
from packaging.version import parse, Version
ModuleNotFoundError: No module named 'packaging'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I tried adding packaging to the top of the requirements.txt, but got the same error. I believe this happens because FA2's setup.py imports packaging while generating metadata, before pip has installed anything else from the requirements file. So I ran pip install packaging on the command line:
Collecting packaging
Using cached packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Using cached packaging-23.2-py3-none-any.whl (53 kB)
Installing collected packages: packaging
Successfully installed packaging-23.2
Then I re-ran pip install -r requirements.txt:
❯ pip install -r requirements.txt
Requirement already satisfied: packaging in /home/taytay/miniconda3/envs/mosaic_bert_fa2/lib/python3.10/site-packages (from -r requirements.txt (line 1)) (23.2)
Collecting einops==0.5.0 (from -r requirements.txt (line 2))
Using cached einops-0.5.0-py3-none-any.whl (36 kB)
Collecting torch==2.1.1 (from -r requirements.txt (line 3))
Using cached torch-2.1.1-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting composer<0.18,>=0.17.0 (from composer[nlp,wandb]<0.18,>=0.17.0->-r requirements.txt (line 4))
Using cached composer-0.17.2-py3-none-any.whl.metadata (27 kB)
Collecting mosaicml-streaming<=0.7 (from -r requirements.txt (line 5))
Using cached mosaicml_streaming-0.7.0-py3-none-any.whl.metadata (20 kB)
Collecting omegaconf==2.3.0 (from -r requirements.txt (line 6))
Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting transformers==4.35.2 (from -r requirements.txt (line 7))
Using cached transformers-4.35.2-py3-none-any.whl.metadata (123 kB)
Collecting flash_attn>=2.4.2 (from -r requirements.txt (line 9))
Using cached flash_attn-2.4.2.tar.gz (2.4 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-gv890oec/flash-attn_bd567b3ed4774a49a637dedaf268441f/setup.py", line 19, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
FA2's setup.py assumes that torch is already installed, but torch is being installed in the same pip run, so it isn't importable yet!
I moved the FA2 requirement into its own requirements_fa2.txt file, which let requirements.txt succeed. Then I installed FA2 with pip install -r requirements_fa2.txt, and it worked like a champ.
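Concretely, the split boils down to something like this (just a sketch of the two-step install I ended up with; the pins mirror the ones in requirements.txt):

```bash
# 1) Install everything except flash-attn, so packaging and torch are importable.
#    (requirements.txt with the flash_attn line moved out)
pip install -r requirements.txt
# 2) Now build flash-attn, whose setup.py imports packaging and torch.
#    (requirements_fa2.txt contains just the flash_attn>=2.4.2 line)
pip install -r requirements_fa2.txt
```

If I remember right, the flash-attn README gives essentially the same guidance: install torch and packaging first, then install flash-attn (optionally with pip's --no-build-isolation) so its build can import the already-installed torch.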
This "No module named 'torch'" error is not unheard of with FA2: https://github.com/Dao-AILab/flash-attention/issues/246
One more bug that I'll report here just in case it is not just a "my machine" thing. I didn't see NVIDIA Apex mentioned in the requirements, but when I get to the point of running this:
# This will pre-train a MosaicBERT that reaches the same downstream accuracy in roughly 1/3 the time.
composer main.py yamls/main/mosaic-bert-base-uncased.yaml
It looks like I need to have NVIDIA Apex installed:
/home/taytay/miniconda3/envs/mosaic_bert_fa2/lib/python3.10/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 6, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Building eval loader...
Traceback (most recent call last):
File "/home/taytay/YNAB/ML/mosaicml_examples_skylion/examples/benchmarks/bert/main.py", line 271, in <module>
main(cfg)
File "/home/taytay/YNAB/ML/mosaicml_examples_skylion/examples/benchmarks/bert/main.py", line 210, in main
algorithms = [
File "/home/taytay/YNAB/ML/mosaicml_examples_skylion/examples/benchmarks/bert/main.py", line 211, in <listcomp>
build_algorithm(name, algorithm_cfg)
File "/home/taytay/YNAB/ML/mosaicml_examples_skylion/examples/benchmarks/bert/main.py", line 72, in build_algorithm
return algorithms.FusedLayerNorm(**kwargs)
File "/home/taytay/miniconda3/envs/mosaic_bert_fa2/lib/python3.10/site-packages/composer/algorithms/fused_layernorm/fused_layernorm.py", line 110, in __init__
check_if_apex_installed()
File "/home/taytay/miniconda3/envs/mosaic_bert_fa2/lib/python3.10/site-packages/composer/algorithms/fused_layernorm/fused_layernorm.py", line 30, in check_if_apex_installed
raise ImportError(
ImportError: https://github.com/NVIDIA/apex is not installed. The Fused LayerNorm algorithm cannot be applied. The MosaicML Docker Images (https://hub.docker.com/r/mosaicml/pytorch) contain a copy of APEX for easy use.
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 13697) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 13697) exited with code 1
An update on the above: Once I installed Apex from source, the command worked.
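In case it saves someone else a search, this is roughly the sequence I followed (a sketch only; the CUDA-extension build flags differ across pip and Apex versions, so the Apex README is the authority here):

```bash
# Rough sketch of building Apex from source; exact flags vary by pip/Apex
# version, so defer to the instructions at https://github.com/NVIDIA/apex.
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" ./
```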
You have already recommended the MosaicML PyTorch base image, which presumably comes with Apex pre-installed. I decided to ignore that handy tip and run from my existing WSL environment.
Something that would have helped me is a note clarifying that if the user does not use the recommended PyTorch base image, they will need to install Apex after pip-installing the requirements.txt. If I'm not the target audience, or this is opening you up to way too much config specification, I get it.
With regard to my comment:
I don't think this error needs to hold up this PR, but FA2 was giving me some headaches as part of a clean requirements.txt installation. I fixed it by ensuring that packaging and torch were both installed BEFORE running the pip install for FA2
This was not an issue for me on a clean machine, so this is unlikely to be a real issue, and VERY unlikely to be an issue with this PR.
I believe that one of the test yamls is missing:
algorithms:
fused_layernorm: {}
I say that because the README explains that you can do a test run of training a MosaicBERT model by running:
# Run the pre-training script with the test config and MosaicBERT
composer main.py yamls/test/main.yaml model.name=mosaic_bert
However, yamls/test/main.yaml doesn't have these lines:
algorithms:
fused_layernorm: {}
But yamls/main/mosaic-bert-base-uncased.yaml DOES specify fused_layernorm.
That means that the first time it tries to load Apex's fused_layernorm is when you get to this section:
# This will pre-train a MosaicBERT that reaches the same downstream accuracy in roughly 1/3 the time.
composer main.py yamls/main/mosaic-bert-base-uncased.yaml
I noticed this because I got an error when it tried to load Apex and my environment didn't have it installed. I was surprised because all of my "tests" from the README worked.
Hi @Taytay,
Thanks for pointing this out. The MosaicML Composer library for a while used Fused Layernorm as a Composer "algorithm" to speed up pretraining. It relies on NVIDIA Apex and enables a faster kernel for LayerNorm.
More recently, we've been using Low Precision LayerNorm which does not rely on APEX and works just as well as Fused LayerNorm. From the Composer docs:
Low Precision LayerNorm is meant to replace our Fused LayerNorm algorithm. The two algorithms achieve very similar throughput. Fused LayerNorm also runs in low precision, but it is a more complex algorithm, since it uses a custom kernel. Since the custom kernel provides no additional speedup, we have replaced it with this simpler algorithm.
In the yaml, you can replace fused_layernorm with:
algorithms:
low_precision_layernorm: {}
I've updated the mosaicbert pretraining and finetuning yamls to use low_precision_layernorm.
Thanks @jacobfulano. That's good news. It's worth mentioning that I ran into a bug in this branch that is fixed by https://github.com/mosaicml/examples/pull/443