torchtune
MPS support
Context
- For testing purposes it can be useful to run torchtune directly on a local Mac.
Changelog
- Added a check for BF16 support on the MPS device (see the sketch after this list).
- Added a configuration targeting MPS; changes to paths were required due to the way macOS resolves symlinks in /tmp as /private/ActualPath.
- Set optimizer to Adam instead of AdamW to fit in memory on 64GB devices.
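For illustration, here is a minimal sketch of the BF16 check and the path normalization mentioned above; the helper names are hypothetical, not torchtune APIs:

```python
import os
import torch

def mps_supports_bf16() -> bool:
    # Hypothetical probe: try allocating a bf16 tensor on the MPS
    # backend and see whether it is accepted.
    if not torch.backends.mps.is_available():
        return False
    try:
        torch.zeros(1, dtype=torch.bfloat16, device="mps")
        return True
    except (TypeError, RuntimeError):
        return False

def normalize_path(path: str) -> str:
    # macOS resolves /tmp through a symlink to /private/tmp, so paths
    # must be canonicalized before comparing them, e.g.
    # os.path.realpath("/tmp/ckpt") -> "/private/tmp/ckpt".
    return os.path.realpath(path)
```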
Test plan
- Ran a full finetune training job on Mistral 7B.
- The current test jobs are very CUDA-specific; maybe this could be changed as well (see the sketch after this list)?
- We may need to integrate Mac runners in the pipeline.
- Some dependencies such as bitsandbytes are not yet Mac compatible.
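As one way the CUDA-specific jobs could be generalized, a sketch (the marker name is made up, not an existing torchtune fixture):

```python
import pytest
import torch

# Skip on hosts without an accelerator instead of assuming CUDA, so the
# same test can also run on Mac (MPS) runners.
requires_accelerator = pytest.mark.skipif(
    not (torch.cuda.is_available() or torch.backends.mps.is_available()),
    reason="requires a CUDA or MPS device",
)

@requires_accelerator
def test_tensor_on_accelerator():
    device = "cuda" if torch.cuda.is_available() else "mps"
    x = torch.randn(2, 2, device=device)
    assert x.device.type == device
```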
@maximegmd This is awesome! Can you post some loss curves for the finetune you ran?
I will complete a run during the weekend; losses looked fine, but the Llama3 release changed my priorities ^^
device is not None when this function is called, so it just passes 'mps' to torch.device(), which is the expected PyTorch name.
But you are correct that there is room for improvement: we could automatically return mps when a device is not manually specified in the config.
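Something like this minimal sketch, assuming a hypothetical helper rather than the actual torchtune function:

```python
from typing import Optional

import torch

def resolve_device(device: Optional[str] = None) -> torch.device:
    # If the config names a device explicitly (including "mps"),
    # pass it straight through, as the code does today.
    if device is not None:
        return torch.device(device)
    # Otherwise fall back automatically: CUDA, then MPS, then CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```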
Oh good point, I totally glossed over the fact that device = torch.device(device) sits outside the if block. Yup, sounds good. What's the iter/sec you're getting with this on a Mac?
If I recall correctly it was around 20 s/it, but I suspect I was swapping a bit, so I can probably improve the speed. The main issue is that bitsandbytes doesn't support MPS, so the optimizer state uses quite a bit of memory.
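For context, a rough sketch of that constraint (build_optimizer is a made-up helper, not torchtune API): bitsandbytes' 8-bit optimizers only ship CUDA kernels, so on MPS the optimizer state stays in full precision.

```python
import torch

def build_optimizer(params, device_type: str, lr: float = 2e-5):
    if device_type == "cuda":
        try:
            # bitsandbytes' 8-bit optimizers shrink the optimizer state,
            # but the library currently only ships CUDA kernels.
            import bitsandbytes as bnb
            return bnb.optim.AdamW8bit(params, lr=lr)
        except ImportError:
            pass
    # On MPS, fall back to a plain torch optimizer; its full-precision
    # state (two moment tensors per parameter) is what eats memory.
    return torch.optim.Adam(params, lr=lr)
```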
I will try to push a Llama3 config tomorrow with some numbers, now that my Llama3 finetune is running :)
Here is a training run on Gemma 2B. Sadly the laptop went to sleep right before the end, but this is a 14-hour run, so it should be representative enough. log_1715964526.txt
Just wanted to chime in to raise a minor point. There's a snag with generation on Apple silicon, as torch.isin is only available on MPS in torch nightly. We make a call to this function during generation to check for stop tokens.
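A minimal sketch of a possible fallback (hit_stop_tokens is a hypothetical helper, not the torchtune function):

```python
import torch

def hit_stop_tokens(tokens: torch.Tensor, stop_ids: torch.Tensor) -> torch.Tensor:
    # Returns a bool mask marking positions whose token is a stop token.
    try:
        return torch.isin(tokens, stop_ids)
    except NotImplementedError:
        # MPS builds before the nightlies lack an isin kernel; emulate it
        # by broadcasting an equality check against each stop id.
        return (tokens.unsqueeze(-1) == stop_ids).any(dim=-1)
```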
Hi @SalmanMohammadi thanks for pointing this out. Separately, what is the current plan with this PR? @maximegmd are you intending to merge this change? Please let me know if you need extra support on this.
The goal is indeed to merge, but considering the issue raised by @SalmanMohammadi, maybe we should hold off and lock to PyTorch 2.4+?
@maximegmd yeah, fair point. Personally I would be OK with just doing a check on nightlies. I think a reasonable place to do this without being too intrusive to the recipes is in _validate_device_from_env, since it gets called in all the recipes. Then you can add a check like if device.type == "mps" and not torch_version_ge("2.4.0"), and raise an error (we recently added the _torch_version_ge utility here). The flip side is that any MPS user will need the nightlies for any recipe in the short term, but this way we can also keep the recipe code clean. Let me know what you think of this kind of approach.
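Sketched out, the proposed check could look something like this (inlining a crude version parse rather than using torchtune's private utility):

```python
import torch

def _validate_device_from_env(device: torch.device) -> None:
    # Sketch only; the real torchtune function does more validation.
    if device.type == "mps":
        major, minor = (int(x) for x in torch.__version__.split(".")[:2])
        if (major, minor) < (2, 4):
            raise RuntimeError(
                "MPS support requires PyTorch >= 2.4 (earlier builds lack "
                "an MPS kernel for torch.isin); please install the nightlies."
            )
```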
Btw I think there is a lot more we will have to do for proper, high-quality MPS support. But I'd love to at least take the first step here and not have to wait until the next release to gather feedback on the experience.
bump bump.
Is this still planned @maximegmd? If not, happy to pick it up. It's been invaluable to me so far, so would be happy to see other users be able to get up and running without needing big (medium?) GPUs.
This has been open without changes for 6 weeks. Can we merge this in @joecummings?
This just needs to be updated to use the latest util functions, I will do it tomorrow, then it should be ready to merge unless I am not seeing something
Nice! Thanks so much. One small thing I've experienced with the recent torchao dependency version upgrade is that pip install torchao==0.3.1 doesn't find any suitable builds. I've gotten around this by installing from source (or pip install --no-cache-dir git+https://github.com/pytorch/[email protected]); if you error out here, make sure you've installed setuptools and wheel.
If this sounds like a sensible way around this, perhaps it's worth adding to the README?
I assume this is specifically on MPS? cc @msaroufim for thoughts here
Yeah, just on my machine; it's been working fine on Linux.
Can you update the readme with an example for MPS?
Was on PTO, but torchao as of 0.3.1 is still a Linux-only library; Mac binaries are possible, just not a high priority right now.
Codecov Report
Attention: Patch coverage is 70.08821% with 746 lines in your changes missing coverage. Please review.
Project coverage is 68.63%. Comparing base (cb8e65a) to head (eb36f7b). Report is 83 commits behind head on main.

@@           Coverage Diff            @@
##            main     #790      +/-  ##
========================================
+ Coverage   67.10%   68.63%   +1.52%
========================================
  Files         174      214      +40
  Lines        7423     9691    +2268
========================================
+ Hits         4981     6651    +1670
- Misses       2442     3040     +598
Checking in again to see what the plan is on merging. Still haven't seen any movement. Would love to see MPS make it to main.