Fix: --include-private issue (#16808)
- (fix): when --include-private is not passed stubgen omits packages and modules which names start with "_" and do not end with "__"
This commit aims to close #16808.
Let me introduce you to my struggling.
At first I decided that this happens because of is_private_name(). Indeed, when stubgen iterates over nodes via visitor and calls is_private_name() it checks only node's (function's in our case) name:
stubutil.py
def is_private_name(self, name: str, fullname: str | None = None) -> bool:
...
if name == "_":
return False
if not name.startswith("_"):
return False
...
return True
As you can see from the above, is_private_name() checks only name without considering the fullname.
However, if we modify it and make it check fqn, the problem would still persist — stubfiles would be created and there would be nothing (empty files).
Then, to make stubgen omit those packages/modules entirely, I decided to incorporate this logic into generate_stub_for_py_module().
And it worked, checked manually on structure like the below.
(mypy) PS .../foobar$ tree /f /a
.
| __init__.py
|
+---subpackage
| open_module.py
| _internal_use_module.py
| __init__.py
|
\---_internal_use_subpackage
whatever_mod.py
__init__.py
The output:
(mypy) PS ...\out> tree /f /a
C:.
\---foobar
| __init__.pyi
|
\---subpackage
open_module.pyi
__init__.pyi
Seems to be OK.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
Thank you for working on this. Wouldn't this change make stubgen always skip private modules? Sometimes functions and classes defined in private modules are either re-exported or used in public modules of the code base. In these cases they should not be excluded from the stubs. I don't think stubgen is currently powerful enough to do this cross-file analysis of where symbols are used throughout the code base to determine which of them are really private. I think unfortunately the best we can do now is include everything and manually delete unused files later. (disclaimer: I am not a maintainer of mypy, just a regular contributor to stubgen)
@hamdanal
Thank you for the feedback.
True, private module should be included in some cases. This is precisely what the --include-private flag is for. There definitely must be if not include_private condition check before the early return.
Working on it...
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
I think --include-private is meant for private symbols like classes and functions, not modules (at least this is how it works currently).
Suppose you have a public module in a package, let’s call it public.py and a sibling private module _private.py. It is very common to have something in public.py that does something like:
from ._private import Base
class Derived(Base): …
If you don’t generate _private.pyi, the stubs will be invalid. This is a very common pattern in my experience that I think should not be ignored by stubgen, especially not by default. One solution here is to have a separate command line flag for private modules instead (I don’t personally think it will be useful but if you have a convincing argument for it maybe a maintainer will accept it).
@hamdanal, thank you for the detailed explanation. Indeed, avoiding modules with leading underscores can result in the incompleteness of a stub file. Moreover, there are many such modules in the typeshed repository, such as _policybase.pyi in the email package from the standard library, _base.pyi in concurrent (btw the exact to example you've mentioned), and so on.
Tbh, so far, I don't have strong arguments for implementing a flag, like you said, --ignore-private-modules or something that would make stubgen omit those modules. Perhaps only if a user has numerous such _module_name.py modules, which are not preferred to be present in stubfiles yet, but it doesn't sound like a real use-case for me.
Since issue #16808 has been opened, shouldn't we mention this in the documentation somewhere? I mean, that stubgen generates .pyi for all modules including those with leading underscore.
OP of https://github.com/python/mypy/issues/16808 here.
@hamdanal points out correctly that any object in a Python module may be available via multiple import paths, some of which may contain leading underscored and some of which may not. Some may consider it bad style to expose objects in two such paths, but it does indeed happen.
I think we can agree that those methods are to be considered public, just like objects which are solely available via leading-underscored paths are to be considered private.
I don't think we should differentiate between a private module and a private function. Two flags --include-private and --ignore-private-modules would probably be confusing.
Instead, a proper solution is to check in is_private_* via which paths it's available.