Metaflow + PyTorch: '_ClassNamespace' object is not iterable

Open timoffex opened this issue 5 months ago • 5 comments

An error is thrown when running a flow with PyTorch imported:

Traceback (most recent call last):
  File ".../site-packages/metaflow/cli.py", line 649, in main
    start(auto_envvar_prefix="METAFLOW", obj=state)
  ...
  File ".../site-packages/metaflow/package/__init__.py", line 87, in __init__
    self._mfcontent = MetaflowCodeContentV1(criteria=_module_selector)
  File ".../site-packages/metaflow/packaging_sys/v1.py", line 64, in __init__
    self._modules = {
  File ".../site-packages/metaflow/packaging_sys/v1.py", line 67, in <dictcomp>
    set(
TypeError: '_ClassNamespace' object is not iterable

To reproduce, run the helloworld.py example with import torch added at the top (using the latest versions, torch==2.7.1 and metaflow==2.16.5).

PyTorch adds a custom ModuleType-derived class to sys.modules here, which defines __getattr__ to always return a _ClassNamespace object. Unfortunately, that means that getattr(<that module>, "__path__") returns something unexpected.

IMO PyTorch is at fault here, but PyTorch has had this code for a while (maybe 11 months) so it's likely that many Metaflow users cannot use the most recent versions with PyTorch.

timoffex avatar Jul 21 '25 22:07 timoffex

The Python docs suggest using __spec__ instead, and for the offending module (torch.classes) that is None. Maybe using __spec__ is the proper way?
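As a rough sketch of what a __spec__-based lookup might look like (package_locations is a hypothetical helper, not Metaflow code), the function below reads submodule_search_locations from the module spec and treats a missing or None spec as "not a package":

```python
import email  # a regular stdlib package, used only for illustration
import types  # a plain stdlib module (not a package)


def package_locations(module):
    """Return the package search paths recorded in module.__spec__, or []."""
    spec = getattr(module, "__spec__", None)
    # submodule_search_locations is None for non-packages; a None spec
    # has no such attribute, so getattr() falls back to None as well.
    locations = getattr(spec, "submodule_search_locations", None)
    if locations is None:
        return []
    return [str(p) for p in locations]


# types.ModuleType(...) creates a module whose __spec__ is None, which is
# the situation reported above for the offending module.
no_spec = types.ModuleType("no_spec")

print(package_locations(email))
print(package_locations(types))
print(package_locations(no_spec))
```

With this approach a module whose __spec__ is None is simply skipped, rather than being asked for a __path__ it may fabricate.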

timoffex avatar Jul 21 '25 22:07 timoffex

Gah -- another one. I'll look at using __spec__. The doc also says:

"The [__path__](https://docs.python.org/3/reference/datamodel.html#module.__path__) attribute should be a (possibly empty) [sequence](https://docs.python.org/3/glossary.html#term-sequence) of strings enumerating the locations where the package’s submodules will be found. By definition, if a module has a __path__ attribute, it is a [package](https://docs.python.org/3/glossary.html#term-package)."

So ya, there is definitely some weirdness in PyTorch's implementation. I can try using __spec__ and hopefully that will solve things, but I am sure there is some package somewhere that doesn't have a __spec__. I might also wrap this whole thing in a try/except and ignore any module that causes issues (any issue). The intent of this code is to automatically find modules that define a certain attribute -- it's not too harsh a requirement to say that those modules should behave more or less normally (and to assume that those that don't do not have this METAFLOW_PACKAGE attribute).
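A minimal sketch of the try/except idea, assuming the scan only needs a name-to-paths mapping for packages. modules_with_marker and _CatchAll are hypothetical names used for illustration, and this is not the actual Metaflow code or fix:

```python
import sys
import types


def modules_with_marker(marker="METAFLOW_PACKAGE", modules=None):
    """Map module name -> set of package paths for modules carrying `marker`.

    Hypothetical helper: any module that raises while being inspected is
    treated as if it did not define the marker.
    """
    candidates = dict(sys.modules if modules is None else modules)
    found = {}
    for name, module in candidates.items():
        try:
            if getattr(module, marker, None) is None:
                continue
            # Packages only in this sketch; iterating a bogus __path__ raises.
            paths = {str(p) for p in module.__path__}
        except Exception:
            continue
        found[name] = paths
    return found


class _CatchAll(types.ModuleType):
    """Misbehaving module: returns an object for every missing attribute."""

    def __getattr__(self, name):
        return object()


good = types.ModuleType("good_pkg")
good.METAFLOW_PACKAGE = 1
good.__path__ = ["/tmp/good_pkg"]

fake_modules = {
    "good_pkg": good,
    "plain": types.ModuleType("plain"),
    "weird": _CatchAll("weird"),
}
result = modules_with_marker(modules=fake_modules)
print(result)
```

The broad except is deliberate here: the goal is to skip anything unusual rather than to diagnose it.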

romain-intel avatar Jul 22 '25 21:07 romain-intel

I created an issue in the PyTorch repo as well, though this should still be fixed in Metaflow in case other packages out there do things like this.

timoffex avatar Jul 22 '25 21:07 timoffex

Started looking at it -- this is quite devious:

  • the class actually returns something for METAFLOW_PACKAGE (i.e., the module appears to have the attribute, since looking it up returns a non-None object)
  • so metaflow happily assumes: great, this is a nice package I need to include
  • it then looks for its paths and that's when things go bad.

But ya, so this module returns something for both METAFLOW_PACKAGE AND __path__ and both are wrong :(. I'm going to harden this up. Second bug in this tiny area of code (where I really wasn't expecting things to go bad compared to all the rest :)).

romain-intel avatar Jul 22 '25 22:07 romain-intel

Oof, yeah that's tricky. I don't know if it's possible in your case, but relying on module names (just the key in sys.modules) rather than custom module properties could be more robust. Though checking m.METAFLOW_PACKAGE == 1 is clever.
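Both ideas can be sketched briefly. The helpers below are hypothetical and not part of Metaflow; the expected value 1 is taken from the comparison mentioned above, and the allowed-roots set is an assumption made for illustration.

```python
import types


def has_marker(module, marker="METAFLOW_PACKAGE", expected=1):
    """True only if the marker is an int equal to `expected`."""
    try:
        value = getattr(module, marker, None)
    except Exception:
        return False
    # A catch-all __getattr__ returns some object, but not this exact int.
    return type(value) is int and value == expected


def in_allowed_packages(module_name, allowed_roots):
    """Name-based alternative: look only at the sys.modules key."""
    return module_name.split(".")[0] in allowed_roots


class _CatchAll(types.ModuleType):
    """Misbehaving module: returns an object for every missing attribute."""

    def __getattr__(self, name):
        return object()


marked = types.ModuleType("my_ext")
marked.METAFLOW_PACKAGE = 1
weird = _CatchAll("weird")

print(has_marker(marked), has_marker(weird))
print(in_allowed_packages("my_ext.sub", {"my_ext"}))
```

The name-based check never touches module attributes at all, which is why it cannot be fooled by a custom __getattr__.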

timoffex avatar Jul 23 '25 00:07 timoffex