Does torchtune require a specific torch version?

Open · janeyx99 opened this issue 8 months ago · 9 comments

Reported by Stas Bekman: attempting to use OffloadActivations seems to require a newer torch version?

$ python -c "from torchtune.training._activation_offloading import OffloadActivations"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/__init__.py", line 38, in <module>
    from torchtune import datasets, generation, models, modules, utils
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/datasets/__init__.py", line 7, in <module>
    from torchtune.datasets import multimodal
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/datasets/multimodal/__init__.py", line 7, in <module>
    from ._llava_instruct import llava_instruct_dataset
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/datasets/multimodal/_llava_instruct.py", line 10, in <module>
    from torchtune.data._messages import ShareGPTToMessages
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/data/__init__.py", line 7, in <module>
    from torchtune.data._collate import (
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/data/_collate.py", line 12, in <module>
    from torchtune.modules.attention_utils import packed_block_causal_mask
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/modules/__init__.py", line 7, in <module>
    from .attention import MultiHeadAttention  # noqa
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/modules/attention.py", line 12, in <module>
    from torchtune.modules.attention_utils import _MaskType, _sdpa_or_flex_attention
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/modules/attention_utils.py", line 13, in <module>
    from torchtune.utils._import_guard import _SUPPORTS_FLEX_ATTENTION
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/utils/__init__.py", line 7, in <module>
    from ._device import (
  File "/home/yak/.local/lib/python3.10/site-packages/torchtune/utils/_device.py", line 19, in <module>
    from torch.nn.attention.flex_attention import BlockMask
ModuleNotFoundError: No module named 'torch.nn.attention.flex_attention'

janeyx99 · Apr 22 '25 16:04

or even:

python -c "from torchtune.modules import TiedLinear"

which is what's really needed to use torchtune.training._activation_offloading.OffloadActivations

I'm on pt-2.4 at the moment.

At the very least, torchtune could state the minimum PyTorch version it requires.

stas00 · Apr 22 '25 16:04

Ah, apologies @stas00! We have a note here that says we only test against the latest stable PyTorch and PyTorch nightlies, but we should definitely make that clearer.

We make no guarantees that it works for older versions of PyTorch, so you'll likely need PyTorch 2.6+.
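
For reference, you can check which version you have installed with:

python -c "import torch; print(torch.__version__)"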

joecummings · Apr 22 '25 17:04

@stas00 @janeyx99 Can this be closed with #2627 and #2626?

joecummings · Apr 22 '25 18:04

Understood!

Would it be too difficult to do a runtime check of the PyTorch version in __init__.py and tell the user if there is a mismatch and what exact version is required?

I don't think the website doc is a good place to list required versions; the doc isn't there when the code fails.

For context: I wasn't trying to use torchtune, but just torchtune.training._activation_offloading.OffloadActivations - Jane and I were discussing her work on the pytorch slack and I wanted to try it out.

stas00 avatar Apr 22 '25 18:04 stas00

Makes sense, we can definitely throw a warning if we detect the PyTorch version is older than the latest stable.
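
Something like this at import time, say (just a sketch; the version floor and the message wording are illustrative):

```python
# sketch: warn, rather than fail, when torch is older than what we test against
import warnings

import torch
from packaging.version import parse  # third-party "packaging" library

_LATEST_TESTED = "2.6.0"  # illustrative; would track the latest stable release

if parse(torch.__version__) < parse(_LATEST_TESTED):
    warnings.warn(
        f"torchtune is only tested against torch>={_LATEST_TESTED}; "
        f"found torch=={torch.__version__}. Some modules may fail to import."
    )
```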

joecummings · Apr 22 '25 19:04

The code that fails because the version is wrong will not have a warning next to it explaining to the user what went wrong.

Therefore an assert would be far superior to a warning.

Warnings can't be seen when there are thousands of them; an assert is what's needed when a user's environment isn't set up right.
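
In other words, something like this instead (same sketch as above, but failing hard; the version floor is a placeholder):

```python
# sketch: fail hard at import time so the user can't miss the version mismatch
import torch
from packaging.version import parse

_MIN_TORCH = "2.6.0"  # placeholder for the actual required minimum

assert parse(torch.__version__) >= parse(_MIN_TORCH), (
    f"torchtune requires torch>={_MIN_TORCH}, found torch=={torch.__version__}"
)
```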

stas00 · Apr 22 '25 19:04

I can definitely understand where you're coming from, and for the most part, when we detect that a certain feature is only available after a specific version, we do attempt to gate it - see here.
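
The pattern is roughly this (a simplified sketch loosely modeled on the flex attention gate in torchtune/utils/_import_guard.py from the traceback above; the exact version floor is an assumption):

```python
# simplified sketch of a per-feature version gate
import torch
from packaging.version import parse

# flex attention only exists in newer PyTorch, so check before importing it
_SUPPORTS_FLEX_ATTENTION = parse(torch.__version__).release >= (2, 5, 0)

if _SUPPORTS_FLEX_ATTENTION:
    from torch.nn.attention.flex_attention import BlockMask
else:
    BlockMask = None  # callers check _SUPPORTS_FLEX_ATTENTION before using this
```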

It's very helpful when people like yourself test out a new feature and bring it to our attention that the PyTorch version is not compatible, because then we can either fix it or throw an error. However, it's hard to catch this without CI runners running all the old versions of PyTorch, which is prohibitively expensive.

Therefore, our options are:

  1. Force all users to always update their PyTorch versions to the latest in order to run anything through torchtune or import any modules. This would certainly make our lives as maintainers easier and we've thought about it, but ultimately we see that the majority of features still do work on older versions of PyTorch and it can be quite a hassle for users to update their PyTorch version. Moving forward, I know the whole PyTorch ecosystem is working to make updating a smoother experience. But for now that leaves us with option 2...
  2. Allow users to run torchtune recipes or import torchtune modules even if they have other versions, while making it clear that we only test on the latest and greatest PyTorch versions. There's more risk, and in some cases, like this one, there's definitely some friction, but in the end we believe it allows more people to access the code they need with minimal roadblocks.

We opted for the second option and currently believe the benefits outweigh the risks, but we're always eager to incorporate feedback and modify our process. Hope this helps contextualize the thinking though.

joecummings · Apr 22 '25 19:04

Oh, thank you very much for explaining, Joe. I had initially understood that the latest version of PyTorch was required.

Normally, in all the other frameworks I've worked on, we tested at the very least the lowest and the latest supported versions. Usually the lowest supported version should be the baseline for minimal testing.

If older versions are in a "may or may not be supported" state, it's ambiguous, which will result either in users leaving because they are confused, or in lots of tickets being created asking for older versions to be supported. But, of course, it's your call.

I'm good to stop here.

stas00 · Apr 22 '25 19:04

No, thank you for your feedback!

> If older versions are in a "may or may not be supported" state, it's ambiguous, which will result either in users leaving because they are confused, or in lots of tickets being created asking for older versions to be supported. But, of course, it's your call.

This is a good call out. I'll take some immediate actions to make the guarantees clearer, but the team and I should also revisit the idea of a minimum PyTorch version for testing in our CI. On the whole, I agree it would make things much clearer for torchtune's users.

joecummings · Apr 22 '25 20:04