
inspect.getsource() on sourceless dataclass raises undocumented exception

Open mthuurne opened this issue 3 years ago • 3 comments

Bug report

If I run the following program in Python 3.10:

from dataclasses import dataclass
from inspect import getsource

defs = {}
exec(
    """
@dataclass
class C:
    "The source for this class cannot be located."
""",
    {"dataclass": dataclass},
    defs,
)

try:
    getsource(defs["C"])
except OSError:
    print("Got the documented exception.")

The output is:

$ python sourceless_dataclass.py 
Traceback (most recent call last):
  File "<path>/sourceless_dataclass.py", line 16, in <module>
    getsource(defs["C"])
  File "/usr/lib/python3.10/inspect.py", line 1147, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python3.10/inspect.py", line 1129, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python3.10/inspect.py", line 940, in findsource
    file = getsourcefile(object)
  File "/usr/lib/python3.10/inspect.py", line 817, in getsourcefile
    filename = getfile(object)
  File "/usr/lib/python3.10/inspect.py", line 786, in getfile
    raise TypeError('{!r} is a built-in class'.format(object))
TypeError: <class 'C'> is a built-in class

The documentation states that OSError can be raised but does not mention TypeError.

The implementation of inspect.getsource() assumes that if a class has no __module__ attribute, it must be a built-in class, but a sourceless dataclass doesn't have a __module__ attribute either. I don't know whether this is a bug in getsource() or whether the generation of the dataclass should set __module__ to '__main__', but in any case the behavior is not as documented.
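To make the failing branch concrete, here is a simplified model of the class branch of inspect.getfile(). It is based on the tracebacks in this thread and is not the actual library code; the helper name `explain_class_getfile` is hypothetical. It also shows that the exec'd class does have a `__module__` attribute. Its value is 'builtins', and the builtins module has no `__file__`.

```python
import sys


def explain_class_getfile(cls: type) -> str:
    """Hypothetical helper: a simplified model of the class branch of
    inspect.getfile(), returning the outcome as a string instead of raising."""
    module_name = getattr(cls, "__module__", None)
    if module_name is not None:
        module = sys.modules.get(module_name)
        # A module that has a __file__ lets getfile() return a path.
        if getattr(module, "__file__", None):
            return "file: " + module.__file__
        # The 3.12 traceback later in this thread shows a __main__ special case.
        if module_name == "__main__":
            return "OSError: source code not available"
    # Anything else is reported as a built-in class.
    return "TypeError: %r is a built-in class" % cls


defs = {}
exec('class C:\n    "No source for this class."\n', defs)
print(defs["C"].__module__)              # builtins
print(explain_class_getfile(defs["C"]))  # TypeError: <class 'C'> is a built-in class
```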

Your environment

  • CPython versions tested on: Python 3.10.6
  • Operating system and architecture: Ubuntu Linux 18.04

mthuurne, Oct 13 '22 11:10

Note that inspect.getmodule() returns the builtins module when passed a sourceless dataclass instead of returning None.
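A quick way to confirm this, using only the standard library: for a class, getmodule() looks up the class's `__module__` in sys.modules. A class exec'd without `__name__` in its globals therefore maps to the builtins module. The `defs` dict name is just for illustration.

```python
import builtins
import inspect

defs = {}
# Globals without __name__, as in the report above.
exec('class C:\n    pass\n', defs)

mod = inspect.getmodule(defs["C"])
print(mod is builtins)  # True
print(mod.__name__)     # builtins
```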

mthuurne, Oct 13 '22 12:10

Note there's a slightly different error with namedtuples (this is with 3.12.0a0):

>>> from collections import namedtuple
>>> from inspect import getsource
>>>
>>> defs = {}
>>> exec("""
... T = namedtuple("T", [])
... """,
... {'namedtuple': namedtuple},
... defs,
... )
>>> getsource(defs["T"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\cpython\Lib\inspect.py", line 1255, in getsource
    lines, lnum = getsourcelines(object)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 1237, in getsourcelines
    lines, lnum = findsource(object)
                  ^^^^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 1048, in findsource
    file = getsourcefile(object)
           ^^^^^^^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 925, in getsourcefile
    filename = getfile(object)
               ^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 893, in getfile
    raise OSError('source code not available')
OSError: source code not available
>>> from inspect import getmodule
>>> getmodule(defs["T"])
<module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
>>>

Maybe that will help point someone in the right direction.

ericvsmith, Oct 15 '22 15:10

OSError is the documented exception if the source cannot be found, so I think that for namedtuple it is working as intended. For the namedtuple class, __module__ is set to '__main__', so for consistency it might be good for the dataclass creation to do the same.
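The difference can be reproduced side by side. This is a minimal sketch with the standard library only: namedtuple assigns `__module__` itself and falls back to '__main__' when the calling code has no `__name__`, while a plain class statement in the same kind of namespace ends up with 'builtins'.

```python
from collections import namedtuple

defs = {}
# No __name__ in the globals, mirroring the namedtuple example above.
exec('T = namedtuple("T", [])\n', {"namedtuple": namedtuple}, defs)
print(defs["T"].__module__)  # __main__

# For comparison, a plain class statement with equally bare globals
# resolves __module__ through the builtins fallback instead.
exec('class C:\n    pass\n', {}, defs)
print(defs["C"].__module__)  # builtins
```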

mthuurne, Oct 16 '22 00:10

I doubt that these two cases are the same:

  • namedtuple is a function that creates a new class
  • dataclass is a decorator that works on an existing class

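Classes are created according to the regular Python rules. When you exec source code without providing __name__ in its globals, a class's __module__ defaults to 'builtins': the class body looks up the name __name__, and that lookup falls back to the builtins module, whose __name__ is 'builtins'. And inspect does not know how to get the source of builtins.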
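The fallback can be observed directly with the standard library. Module-level code that reads `__name__` from a globals dict without one gets the builtins module's name, and supplying `__name__` changes both that lookup and the class's `__module__`. The dict names `g` and `g2` are illustrative.

```python
# Module-level code that reads __name__ when the globals dict has none.
g = {}
exec("seen = __name__", g)
print(g["seen"])  # builtins

# Supplying __name__ in the globals changes both results.
g2 = {"__name__": "__main__"}
exec("seen = __name__\nclass C:\n    pass\n", g2)
print(g2["seen"], g2["C"].__module__)  # __main__ __main__
```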

This happens without @dataclass as well:

from inspect import getsource

defs = {}
exec(
    """
class C:
    "The source for this class cannot be located."
""",
    defs,
)

print(defs["C"].__module__)  # 'builtins'

try:
    getsource(defs["C"])
except OSError:
    print("Got the documented exception.")
# TypeError: <class 'C'> is a built-in class

This is why you get a TypeError instead of the documented OSError.

Fix:

from dataclasses import dataclass

defs = {}
exec(
    """
@dataclass
class C:
    "The source for this class cannot be located."
""",
    {"dataclass": dataclass, "__name__": "__main__"},  # pass `__name__`!
    defs,
)

print(defs["C"].__module__)  # '__main__'

So:

  1. I propose to leave dataclasses alone
  2. I think we must document TypeError in inspect calls. It is raised explicitly and has been there for quite a long time: there is no reason to keep it a secret :)
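Callers that just want "source or nothing" can treat both exceptions the same way. This is a minimal sketch; the helper name `safe_getsource` is hypothetical. It catches OSError (source not found) and TypeError (objects inspect considers built-in, such as classes whose `__module__` is 'builtins', or built-in functions).

```python
import inspect
from typing import Optional


def safe_getsource(obj: object) -> Optional[str]:
    """Hypothetical helper: return the source of obj, or None when inspect
    cannot provide it (OSError: not found; TypeError: treated as built-in)."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        return None


defs = {}
exec('class C:\n    "No source for this class."\n', defs)
print(safe_getsource(defs["C"]))  # None
print(safe_getsource(len))        # None (built-in function)
```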

I will send a PR for this.

sobolevn, Feb 08 '23 11:02

It appears to me that this is a kind of introspection limitation for sourceless dataclasses.

There are other ways to generate a sourceless dataclass, such as the dataclasses.make_dataclass() API. However, in that case __module__ is 'types', as can be seen using the documented example:

>>> from dataclasses import *
>>> C = make_dataclass('C',
...                    [('x', int),
...                      'y',
...                     ('z', int, field(default=5))],
...                    namespace={'add_one': lambda self: self.x + 1})
>>> 
>>> C
<class 'types.C'>
>>> C.__module__
'types'

Then in this case getsource() would raise the right error type (OSError):

>>> import inspect
>>> inspect.getsource(C)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leof/miniforge3/envs/opt_einsum_dev/lib/python3.8/inspect.py", line 997, in getsource
    lines, lnum = getsourcelines(object)
  File "/home/leof/miniforge3/envs/opt_einsum_dev/lib/python3.8/inspect.py", line 979, in getsourcelines
    lines, lnum = findsource(object)
  File "/home/leof/miniforge3/envs/opt_einsum_dev/lib/python3.8/inspect.py", line 824, in findsource
    raise OSError('could not find class definition')
OSError: could not find class definition

and this time the error makes more sense, because the actual codegen is buried somewhere else, and it's not possible (to my knowledge, at least) to infer the lines and line number.

Ideally, in the case of sourceless dataclasses, one would want to set __module__ to the module of the call site (of make_dataclass(), for example). Then inspect should somehow be able to make use of this knowledge and determine the appropriate source file and line number. But I am not sure there's a robust solution that handles all corner cases, hence I think of it as a limitation.

leofang, Mar 20 '23 03:03

make_dataclass now supports a module argument; please see https://github.com/python/cpython/pull/102104
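A version-tolerant sketch of using it: the `module` keyword is available from Python 3.12. On older versions, the sketch assigns `__module__` after creation instead. The module name "my_package.models" is hypothetical. Setting `__module__` only changes which module inspect consults. It does not by itself give inspect a source file for the class.

```python
import sys
from dataclasses import make_dataclass

if sys.version_info >= (3, 12):
    # The module keyword was added to make_dataclass() in Python 3.12.
    C = make_dataclass("C", [("x", int)], module="my_package.models")
else:
    # Older versions: assign the attribute after creation instead.
    C = make_dataclass("C", [("x", int)])
    C.__module__ = "my_package.models"

print(C.__module__)  # my_package.models
print(C)             # <class 'my_package.models.C'>
```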

sobolevn, Mar 20 '23 09:03

Thank you, @sobolevn, it's nice to see this new module argument added.

But as I said above, even after __module__ is added or overwritten, inspect.getsource() would still raise (in the OP's case __module__ is builtins; in my case it's a user-supplied module location, so the error type differs). It'd be cool for getsource() to point to the line where the dataclass is created:

C = make_dataclass(...)

but I don't have a good suggestion for how to make it 100% robust.

leofang, Mar 20 '23 14:03

I see we made the documentation change and sobolevn's #102104 helps here too. I'm not seeing a concrete additional suggestion (and maybe any such suggestion should be its own issue), so closing. Thanks all! :-)

hauntsaninja, Mar 25 '23 21:03