
MPS support

Open maximegmd opened this issue 1 year ago • 21 comments

Context

  • For testing purposes it can be useful to run directly on a local Mac computer.

Changelog

  • Checks support for BF16 on MPS device.
  • Added a configuration targeting MPS; changes to paths were required because macOS resolves symlinks in /tmp as /private/ActualPath.
  • Set optimizer to Adam instead of AdamW to fit in memory on 64GB devices.
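The BF16 check mentioned in the changelog could look roughly like this (a hypothetical sketch; `verify_bf16_support` and the MPS probe are assumptions for illustration, not torchtune's actual utility):

```python
import torch

def verify_bf16_support(device: torch.device) -> bool:
    """Return True if bf16 training is viable on the given device.

    Hypothetical helper sketching the kind of check this PR describes.
    """
    if device.type == "cuda":
        return torch.cuda.is_bf16_supported()
    if device.type == "mps":
        # MPS bf16 support depends on the PyTorch build; probe it by
        # attempting a tiny bf16 op instead of hardcoding a version.
        try:
            torch.zeros(1, dtype=torch.bfloat16, device="mps") + 1
            return True
        except (RuntimeError, TypeError):
            return False
    # CPU supports bf16 (emulated where needed).
    return device.type == "cpu"
```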

Test plan

  • Ran a training job on a Mistral 7B full finetune.
  • The current test jobs are very CUDA-specific; maybe this could be changed as well?
  • We may need to integrate Mac runners in the pipeline.
  • Some dependencies such as bitsandbytes are not yet Mac compatible.

maximegmd avatar Apr 18 '24 12:04 maximegmd

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/790

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Apr 18 '24 12:04 pytorch-bot[bot]

@maximegmd This is awesome! Can you post some loss curves for the finetune you ran?

joecummings avatar Apr 18 '24 14:04 joecummings

@maximegmd This is awesome! Can you post some loss curves for the finetune you ran?

I will complete a run during the weekend; losses looked fine, but the Llama3 release changed my priorities ^^

maximegmd avatar Apr 18 '24 19:04 maximegmd

device is not None when this function is called, so it just passes 'mps' to torch.device(), which is the expected PyTorch name.

But you are correct that there is room for improvement to automatically return mps when device is not manually specified in the config.
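The automatic fallback discussed here could be sketched as follows (a minimal illustration; the function name is an assumption and torchtune's own device utilities may differ):

```python
import torch

def get_default_device() -> torch.device:
    """Pick the best available device when none is specified in the config.

    Preference order: CUDA, then MPS (Apple silicon), then CPU.
    """
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```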

maximegmd avatar Apr 19 '24 20:04 maximegmd

Oh good point, totally glossed over the fact that device = torch.device(device) is outside the if block. Yup, sounds good. What iter/sec are you getting with this on a Mac?

kartikayk avatar Apr 19 '24 21:04 kartikayk

If I recall correctly it was around 20 s/it, but I suspect I was swapping a bit, so I can probably improve the speed. The main issue is that bitsandbytes does not support MPS, so the optimizer state uses quite a bit of memory.

I will try to push a llama3 config tomorrow with some numbers now that my llama3 finetune is running :)

maximegmd avatar Apr 19 '24 21:04 maximegmd

Here is a training run on Gemma 2B. Sadly the laptop went to sleep right before the end, but this is a 14-hour run, so it should be representative enough. log_1715964526.txt

maximegmd avatar May 18 '24 09:05 maximegmd

Just wanted to chime in to raise a minor point. There's a snag with generation on Apple silicon, as torch.isin is only available on MPS in the torch nightlies. We call this function during generation to check for stop tokens.
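For context, the stop-token check in question can be written with a portable fallback when torch.isin is unavailable on a backend (a sketch; the function name is hypothetical, not torchtune's generation code):

```python
import torch

def hit_stop_token(tokens: torch.Tensor, stop_tokens: torch.Tensor) -> torch.Tensor:
    """Boolean mask of which generated tokens are stop tokens.

    torch.isin is the natural call, but older MPS builds lack it; the
    broadcast comparison below produces the same result everywhere.
    """
    try:
        return torch.isin(tokens, stop_tokens)
    except (RuntimeError, NotImplementedError):
        # (tokens[..., None] == stop_tokens).any(-1) matches isin semantics
        return (tokens.unsqueeze(-1) == stop_tokens).any(dim=-1)
```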

SalmanMohammadi avatar Jun 03 '24 13:06 SalmanMohammadi

Hi @SalmanMohammadi thanks for pointing this out. Separately, what is the current plan with this PR? @maximegmd are you intending to merge this change? Please let me know if you need extra support on this.

ebsmothers avatar Jun 03 '24 15:06 ebsmothers

The goal is indeed to merge, but considering the issue raised by @SalmanMohammadi, maybe we should hold off and lock to pytorch 2.4+?

maximegmd avatar Jun 04 '24 15:06 maximegmd

@maximegmd yeah fair point. Personally I would be OK with just doing a check on nightlies. I think a reasonable place to do this without it being too intrusive to the recipes is in _validate_device_from_env, since it gets called in all the recipes. Then you can add a check like if device.type == "mps" and not torch_version_ge("2.4.0") and raise an error (we recently added the _torch_version_ge utility here). The flipside is that any MPS user will need the nightlies for any recipe in the short-term, but this way we can also keep the recipe code clean. Let me know what you think of this kind of approach.
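The guard described above might look something like this (an illustrative sketch only; the helper below is a rough stand-in for torchtune's _torch_version_ge utility, whose real implementation and signature may differ):

```python
import re
import torch

def torch_version_ge(version: str) -> bool:
    """True if the installed torch is at least `version` (numeric parts only)."""
    def parse(v: str) -> list[int]:
        # "2.4.0.dev20240601" -> [2, 4, 0]; ignores local/dev suffixes.
        return [int(x) for x in re.findall(r"\d+", v)[:3]]
    return parse(torch.__version__) >= parse(version)

def validate_mps_device(device: torch.device) -> None:
    """Raise early when MPS is requested on a torch build that is too old,
    rather than failing later inside a recipe."""
    if device.type == "mps" and not torch_version_ge("2.4.0"):
        raise RuntimeError(
            "MPS support requires torch >= 2.4.0 (nightly at the time of "
            f"this PR); found {torch.__version__}."
        )
```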

Btw I think there is a lot more we will have to do for proper, high-quality MPS support. But I'd love to at least take the first step here and not have to wait until the next release to gather feedback on the experience.

ebsmothers avatar Jun 04 '24 22:06 ebsmothers

bump bump.

Is this still planned @maximegmd? If not, happy to pick it up. It's been invaluable to me so far, so would be happy to see other users be able to get up and running without needing big (medium?) GPUs.

SalmanMohammadi avatar Jul 04 '24 11:07 SalmanMohammadi

This has been open without changes for 6 weeks. Can we merge this in @joecummings?

byjlw avatar Jul 12 '24 16:07 byjlw

This just needs to be updated to use the latest util functions. I will do it tomorrow, and then it should be ready to merge unless I am missing something.

maximegmd avatar Jul 12 '24 19:07 maximegmd

This just needs to be updated to use the latest util functions. I will do it tomorrow, and then it should be ready to merge unless I am missing something.

Nice! Thanks so much. One small thing I've experienced with the recent torchao dependency version upgrade is that pip install torchao==0.3.1 doesn't find any suitable builds. I've gotten around this by installing from source (pip install --no-cache-dir git+https://github.com/pytorch/ao.git@v0.3.1); if you error out here, make sure you've installed setuptools and wheel.

If this sounds like a sensible way around this, perhaps it's worth adding to the README?

SalmanMohammadi avatar Jul 13 '24 12:07 SalmanMohammadi

Nice! Thanks so much. One small thing I've experienced with the recent torchao dependency version upgrade is that pip install torchao==0.3.1 doesn't find any suitable builds. I've gotten around this by installing from source (pip install --no-cache-dir git+https://github.com/pytorch/ao.git@v0.3.1); if you error out here, make sure you've installed setuptools and wheel.

I assume this is specifically on MPS? cc @msaroufim for thoughts here

ebsmothers avatar Jul 13 '24 15:07 ebsmothers

I assume this is specifically on MPS? cc @msaroufim for thoughts here

Yeah just on my machine, it's been working fine on Linux.

SalmanMohammadi avatar Jul 13 '24 16:07 SalmanMohammadi

Can you update the README with an example for MPS?

byjlw avatar Jul 14 '24 15:07 byjlw

Was on PTO, but torchao as of 0.3.1 is still a Linux-only library. Mac binaries are possible, just not a high priority right now.

msaroufim avatar Jul 15 '24 20:07 msaroufim

Codecov Report

Attention: Patch coverage is 70.08821% with 746 lines in your changes missing coverage. Please review.

Project coverage is 68.63%. Comparing base (cb8e65a) to head (eb36f7b). Report is 83 commits behind head on main.

Files Patch % Lines
recipes/qat_distributed.py 0.00% 212 Missing :warning:
tests/torchtune/utils/test_distributed.py 12.16% 130 Missing :warning:
tests/recipes/test_lora_finetune_fsdp2.py 31.86% 62 Missing :warning:
recipes/lora_finetune_distributed.py 0.00% 36 Missing :warning:
recipes/lora_finetune_single_device.py 0.00% 35 Missing :warning:
recipes/eleuther_eval.py 0.00% 27 Missing :warning:
recipes/generate.py 0.00% 20 Missing :warning:
tests/recipes/test_lora_finetune_distributed.py 20.83% 19 Missing :warning:
tests/recipes/test_qat_distributed.py 48.57% 18 Missing :warning:
tests/recipes/test_lora_finetune_single_device.py 22.72% 17 Missing :warning:
... and 35 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #790      +/-   ##
==========================================
+ Coverage   67.10%   68.63%   +1.52%     
==========================================
  Files         174      214      +40     
  Lines        7423     9691    +2268     
==========================================
+ Hits         4981     6651    +1670     
- Misses       2442     3040     +598     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jul 15 '24 20:07 codecov-commenter

Checking in again to see what the plan is on merging. Still haven't seen any movement. Would love to see MPS make it to main.

byjlw avatar Jul 26 '24 05:07 byjlw