pytest
pytest copied to clipboard
Change importlib to first try to import modules using the standard mechanism
As detailed in https://github.com/pytest-dev/pytest/issues/11475#issuecomment-1937043670, currently with --import-mode=importlib pytest will try to import every file by using a unique module name, regardless if that module could be imported using the normal import mechanism without touching sys.path.
This has the consequence that non-test modules available in sys.path (via other mechanism, such as being installed into a virtualenv, PYTHONPATH, etc) would end up being imported as standalone modules, instead of imported with their expected module names.
To illustrate:
.env/
lib/
site-packages/
anndata/
core.py
Given anndata is installed into the virtual environment, python -c "import anndata.core" works, but pytest with importlib mode would import that module as a standalone module named ".env.lib.site-packages.anndata.core", because importlib module was designed to import test files which are not reachable from sys.path, but now it is clear that normal modules should be imported using the standard mechanisms if possible.
Now imporlib mode will first try to import the module normally, without changing sys.path, and if that fails it falls back to importing the module as a standalone module.
This supersedes #11931.
Fix #11475 Close #11931
@flying-sheep I believe the changes here are the correct approach for the importlib in general. With this PR the tests in your pytest-doctest-import-mismatch pass (including the doctest), the only change being required is to also exclude the conftest.py from being installed -- it should always be located near the test files, as the fixtures defined in conftest.py files are only available for tests in the same directory or below.
@flying-sheep would appreciate if you can test this PR using your original project to ensure it works as intended (if it is simple and you have the time, of course). 👍
Beautiful! If I understand correctly, this is basically the more efficient way to do what I did in my PR by brute forcing.
Indeed. The previous semantics were:
- Ignore
sys.pathentirely, and try to import everything by importing it directly, deriving a unique module name.
This has now changed to:
- Try to import the path normally with the current
sys.path. If that fails, import it directly deriving a unique module name.
In retrospect this is what we should have implemented since the beginning, but the thought when this was initially implemented was that it would be used to import test modules only, but we missed the case of importing normal modules too (for doctesting for example).
we missed the case of importing normal modules too (for doctesting for example).
An alternative way of solving this would have been to only collect doctests from roots using --pyargs, then always use sys.path for importing the doctests’ parent modules. But that ship has sailed I guess.
Thank you for spending time solving this!
I tested this with our package, and it seems to indeed work flawlessly!
An alternative way of solving this would have been to only collect doctests from roots using --pyargs, then always use sys.path for importing the doctests’ parent modules. But that ship has sailed I guess.
That would solve doctests, but I think the problem is more general and best solved in import_path directly, so any code importing a module will have the same behavior.
I tested this with our package, and it seems to indeed work flawlessly!
Thanks!
I haven't reviewed the changes yet, but regarding
Now imporlib mode will first try to import the module normally, without changing sys.path, and if that fails it falls back to importing the module as a standalone module.
Just to make sure, IIRC one of the reasons for adding importlib mode was to deal with the non-package-conftest problem where it's like
a/conftest.py
b/conftest.py
and then we try to load conftest module twice. Or it may be not with conftest but something similar, hopefully you get the idea.
If so, doesn't "first try to import the module normally" still run into a problem here, that a module with the same name but different path is already loaded, and we'll return it?
If so, doesn't "first try to import the module normally" still run into a problem here, that a module with the same name but different path is already loaded, and we'll return it?
Assuming tests is at the root of the repository and is not importable:
tests/
a/
conftest.py
b/
conftest.py
Here we will try to import tests/a/conftest.py and it will fail, so we fallback to import this module directly using importlib.import_module and derive a new name for it based on the rootdir: tests.a.conftest. Same for b/conftest.py, so they will end up with different module names and everything will work.
I will improve the tests to also cover this scenario so there is no doubt this works as intended.
Hmm I realize now probably the wording "standalone" is confusing; I meant "not part of a package", but I can see that one can understands this as "using the module's base name only".
Appreciate any suggestions.
E.g. consider I'm running pytest x.py and I have an x module in my site-packages, then the x module in site-packages will get executed, this is quite unexpected and possibly dangerous
Ouch, you are definitely right. Will have to think about this some more then.
(The code as written also has a bug, where even in the case the if mod.file and Path(mod.file) == path condition is false, the module from the site-packages will still be returned, because of the return sys.modules[module_name] just below).
Note that the module_name will be different, not the one we tried originally before the check.
E.g. consider I'm running pytest x.py and I have an x module in my site-packages, then the x module in site-packages will get executed, this is quite unexpected and possibly dangerous
Great catch here @bluetech, I really dropped the ball on this one. 😓
Using the specs for that first import, which crucially also uses the module location to import the file and not only the bare module name as my previous attempt, nicely solves this.
I added another test to ensure we are covering the situation you described, where pytest /path/to/x.py will only import /path/to/x.py, not some other x.py that just happens to be reachable from sys.path.
So I've just seen in the link you provided https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages a regular package nested inside of a namespace package... So I assume it is possible?
(I can't seem to reply inline to your original comment)
Sub-packages can have __init__.py files, but not the directories that make up the top level of the namespace package.
Consider this:
dist1/
com/
company/
app/
core/
__init__.py
__init__.py
dist2/
com/
company/
calc/
algo/
__init__.py
__init__.py
In the example above, com.company is the namespace package (which is shared), while app.core and calc.algo are the sub-packages.
Using this simple script to test it:
import sys
sys.path.append(r"e:\Temp\src\dist1")
sys.path.append(r"e:\Temp\src\dist2")
import com.company.app.core.models
import com.company.calc.algo.algorithms
print(com.company.app.core.models)
print(com.company.calc.algo.algorithms)
This prints:
<module 'com.company.app.core.models' from 'e:\\Temp\\src\\dist1\\com\\company\\app\\core\\models.py'>
<module 'com.company.calc.algo.algorithms' from 'e:\\Temp\\src\\dist2\\com\\company\\calc\\algo\\algorithms.py'>
app.core and calc.algo are free to have __init__ files, but adding an __init__.py file to com/ or company/ will break the logic, and then com.company will no longer be considered a namespace package.
To prove this, we create dist1/com/__init__.py and run the script again:
λ py -3.12 foo.py
Traceback (most recent call last):
File "e:\Temp\src\foo.py", line 10, in <module>
import com.company.calc.algo.algorithms
ModuleNotFoundError: No module named 'com.company.calc'
But as I mentioned, the sub-packages can have __init__.py, but they are optional: removing all of them gets the same results as the first example. However this case (no __init__.py) is not supported under the current code.
This is why resolve_pkg_root_and_module_name will bail if it finds a __init__.py file while moving upwards in the herarchy.
/edit: moved to https://github.com/pytest-dev/pytest/issues/11475#issuecomment-1976180927