pytest icon indicating copy to clipboard operation
pytest copied to clipboard

ModuleNotFoundError when using `--doctest-modules`, a `src/` layout, and `--import-mode=importlib` (and no editable mode install)

Open flying-sheep opened this issue 2 years ago • 23 comments

reproducer

We’d like to run our tests on an installed package instead of the development version, as we delete some files in our build process. We therefore don’t use editable installs.

I think the problem is that there seems to be no way to configure the import root used with the importlib import mode. The project rootdir gets passed here instead of something user configurable:

https://github.com/pytest-dev/pytest/blob/c34eaaaa1c485e9c8c3d1fe7f05549b6436f49cc/src/_pytest/doctest.py#L545-L549

Which means that the following code will run module_name = module_name_from_path('<rootdir>/src/anndata/_types.py', '<rootdir>'), i.e. module_name = 'src.anndata._types' when it should be 'anndata._types'

https://github.com/pytest-dev/pytest/blob/9f3bdac1500143c51ab358d07eb58b4cfa6d3fdf/src/_pytest/pathlib.py#L524-L525

Without src/ layout, this works accidentally, as the module_name happens to match the real module name, and the module only gets imported once.

Pytest 7.4.2

flying-sheep avatar Oct 02 '23 12:10 flying-sheep

I just wanted to add that we are hitting this issue as well, but we are using editable installs, so it looks like this issue affects both the case with editable and non-editable installs of the project source files.

Just to be clear, our project is:

  • an editable installation
  • has a src/ layout
  • puts non-doctest tests outside the application code, in a tests/ directory that is a sibling to the src/ directory
  • uses --import-mode=importlib
  • (attempts to) use --doctest-modules

We more or less exactly follow the suggested layout and settings in https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html#tests-outside-application-code, and everything was working fine until attempting to run doctests within the application code using --doctest-modules.

richardxia avatar Dec 11 '23 21:12 richardxia

Yeah, could this please be added to the 8.0 milestone?

This might require a new setting or some breaking change, so this is the ideal moment

flying-sheep avatar Jan 12 '24 11:01 flying-sheep

Does it work better if you use pytest --doctest-modules --import-mode=importlib --pyargs my.package?

bluetech avatar Jan 12 '24 11:01 bluetech

~~Actually yes! It’s getting really complicated though:~~

[tool.pytest.ini_options]
addopts = [
    "--strict-markers",
    "--doctest-modules",
    "--pyargs",
]
testpaths = [
    "anndata", # doctests in docstrings (module name due to --pyargs)
    "./src/anndata/tests", # unit tests
    "./docs/concatenation.rst", # further doctests
]

~~and then in src/anndata/tests/conftest.py a collect_ignore = ["helpers.py"]. But it works.~~

flying-sheep avatar Jan 12 '24 12:01 flying-sheep

In the issue subject you write --import-mode=importlib but you are not using that right?

If you give me a bit of instructions how to reproduce this locally, I can take a look.

bluetech avatar Jan 12 '24 12:01 bluetech

Sorry, I muddled my project (https://github.com/scverse/anndata/pull/1151) too much with this issue. I’m trying to be clear now.

Adding --import-mode=importlib to the above results in this:

E   ModuleNotFoundError: No module named
  'home.phil..local.share.hatch.[…].site-packages.anndata._core';
  'home.phil..local.share.hatch.[…].site-packages.anndata' is not a package

What needs to work for this issue to be considered fixed:

+ src/my_module/__init__.py  # containing a doctest
+ tests/test_basic.py
[build-system]
build-backend = "hatchling.build"
requires = ["hatchling", "hatch-vcs"]

[project]
name = "my_module"
version = "0.1.0"

[tool.pytest.ini_options]
addopts = [
    "--import-mode=importlib",
    "--strict-markers",
    "--doctest-modules",
    "--pyargs",
]
testpaths = [
    "my_module",  # or without `--pyargs`: "./src/my_module"
    "./tests",
]
What would be nice if it worked for migrating our project:
+ src/my_module/
  + __init__.py  # containing a doctest
  + tests/
    + conftest.py  # containing `collect_ignore = ["helpers.py"]`
    + helpers.py
    + test_basic.py
...

[tool.pytest.ini_options]
addopts = [
    "--import-mode=importlib",
    "--strict-markers",
    "--doctest-modules",
    "--pyargs",
]
testpaths = [
    "my_module",
    "./src/my_module/tests",
]

flying-sheep avatar Jan 12 '24 12:01 flying-sheep

Very weird, I just can’t find a good reproducer. In our project, it fails: https://dev.azure.com/scverse/anndata/_build/results?buildId=5213&view=logs&jobId=5ea502cf-d418-510c-3b5f-c4ba606ae534&j=5ea502cf-d418-510c-3b5f-c4ba606ae534&t=5091afe0-1787-5ef7-c6a1-ce6dc06b30a7

platform linux -- Python 3.11.7, pytest-7.4.4, pluggy-1.3.0
rootdir: /home/vsts/work/1/s
configfile: pyproject.toml
testpaths: anndata, ./src/anndata/tests, ./docs/concatenation.rst
plugins: xdist-3.5.0, memray-1.5.0, anyio-4.2.0, cov-4.1.0
collected 3028 items / 20 errors

==================================== ERRORS ====================================
__________________________ ERROR collecting utils.py ___________________________
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/anndata/utils.py:13: in <module>
    from ._core.sparse_dataset import BaseCompressedSparseDataset
E   ModuleNotFoundError: No module named 'opt.hostedtoolcache.Python.3.11.7.x64.lib.python3.11.site-packages.anndata._core'; 'opt.hostedtoolcache.Python.3.11.7.x64.lib.python3.11.site-packages.anndata' is not a package
__________________ ERROR collecting _core/aligned_mapping.py ___________________
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/anndata/_core/aligned_mapping.py:23: in <module>
    from ..utils import deprecated, dim_len, ensure_df_homogeneous, warn_once
E   ImportError: cannot import name 'deprecated' from 'opt.hostedtoolcache.Python.3.11.7.x64.lib.python3.11.site-packages.anndata.utils' (/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/anndata/utils.py)
...

but using the “reproducer” I so confidently outlined above, everything works.

flying-sheep avatar Jan 12 '24 13:01 flying-sheep

OK, I minified the repo somewhat. I assume the problem is that even with --pyargs, pytest tries to import the module with weird names while running doctest.

https://github.com/flying-sheep/pytest-doctest-import-mismatch

I see errors like this:

____ ERROR collecting _core/merge.py ____
/home/phil/.local/share/hatch/env/virtual/anndata/kX3YdB0h/anndata/lib/python3.11/site-packages/anndata/_core/merge.py:28: in <module>
   from ..compat import _map_cat_to_str
E   ModuleNotFoundError: No module named 'home.phil..local.share.hatch.env.virtual.anndata.kX3YdB0h.anndata.lib.python3.11.site-packages.anndata.compat'; 'home.phil..local.share.hatch.env.virtual.anndata.kX3YdB0h.anndata.lib.python3.11.site-packages.anndata' is not a package
____ ERROR collecting _core/raw.py ____
/home/phil/.local/share/hatch/env/virtual/anndata/kX3YdB0h/anndata/lib/python3.11/site-packages/anndata/_core/raw.py:9: in <module>
   from .aligned_mapping import AxisArrays
E   ImportError: cannot import name 'AxisArrays' from 'home.phil..local.share.hatch.env.virtual.anndata.kX3YdB0h.anndata.lib.python3.11.site-packages.anndata._core.aligned_mapping' (/home/phil/.local/share/hatch/env/virtual/anndata/kX3YdB0h/anndata/lib/python3.11/site-packages/anndata/_core/aligned_mapping.py)

The reproducer is by no means minimal yet, and I can make it smaller for sure, but maybe it’s helpful before i get to that already.

flying-sheep avatar Jan 12 '24 16:01 flying-sheep

(First, consider editing the original post to point to the reproduction; the ImportPathMismatchError error stated there cannot happen in --import-mode=importlib so it's quite confusing).

Thanks for creating a reproduction. I tried it now. I was able to reproduce with pytest 7.4, but with pytest main it works (only if using venv name venv instead of .venv -- see below). I bisected the fix to commit 194a782e3817ee9f4f77a7a61ec68d25b3b08250 which is also included in 8.0.0rc1.

However, there's an issue I noticed: If the path contains dots, e.g. .venv and python3.11 in this case, then --import-mode=importlib fails like this:

Dot problem

__________________________________________________ ERROR collecting .venv/lib/python3.11/site-packages/anndata/__init__.py ___________________________________________________
.venv/lib/python3.11/site-packages/anndata/__init__.py:11: in <module>
    from ._core.anndata import AnnData
E   ValueError: Empty module name
________________________________________________ ERROR collecting .venv/lib/python3.11/site-packages/anndata/_core/access.py _________________________________________________
/usr/lib/python3.11/importlib/__init__.py:121: in import_module
    raise TypeError(msg.format(name))
E   TypeError: the 'package' argument is required to perform a relative import for '.venv.lib.python3.11.site-packages'
____________________________________________ ERROR collecting .venv/lib/python3.11/site-packages/anndata/_core/aligned_mapping.py ____________________________________________
/usr/lib/python3.11/importlib/__init__.py:121: in import_module
    raise TypeError(msg.format(name))
E   TypeError: the 'package' argument is required to perform a relative import for '.venv.lib.python3.11.site-packages'
________________________________________________ ERROR collecting .venv/lib/python3.11/site-packages/anndata/_core/anndata.py ________________________________________________
/usr/lib/python3.11/importlib/__init__.py:121: in import_module
    raise TypeError(msg.format(name))
E   TypeError: the 'package' argument is required to perform a relative import for '.venv.lib.python3.11.site-packages'
_________________________________________________ ERROR collecting .venv/lib/python3.11/site-packages/anndata/_core/merge.py _________________________________________________
/usr/lib/python3.11/importlib/__init__.py:121: in import_module
    raise TypeError(msg.format(name))
E   TypeError: the 'package' argument is required to perform a relative import for '.venv.lib.python3.11.site-packages'
_____________________________________________________ ERROR collecting .venv/lib/python3.11/site-packages/anndata/tests ______________________________________________________
/usr/lib/python3.11/importlib/__init__.py:121: in import_module
    raise TypeError(msg.format(name))
E   TypeError: the 'package' argument is required to perform a relative import for '.venv.lib.python3.11.site-packages.anndata.tests'

Python module names (individual components) cannot contain dots, and the --import-mode=importlib code does stuff like module_name.split(".").

I tried "escaping" the dots by replacing them with _dot_, which allows things to continue, however then it fails like this:

Exec module problem

self = <DoctestModule __init__.py>

    def collect(self) -> Iterable[DoctestItem]:
        import doctest
    
        class MockAwareDocTestFinder(doctest.DocTestFinder):
            """A hackish doctest finder that overrides stdlib internals to fix a stdlib bug.
    
            https://github.com/pytest-dev/pytest/issues/3456
            https://bugs.python.org/issue25532
            """
    
            def _find_lineno(self, obj, source_lines):
                """Doctest code does not take into account `@property`, this
                is a hackish way to fix it. https://bugs.python.org/issue17446
    
                Wrapped Doctests will need to be unwrapped so the correct
                line number is returned. This will be reported upstream. #8796
                """
                if isinstance(obj, property):
                    obj = getattr(obj, "fget", obj)
    
                if hasattr(obj, "__wrapped__"):
                    # Get the main obj in case of it being wrapped
                    obj = inspect.unwrap(obj)
    
                # Type ignored because this is a private function.
                return super()._find_lineno(  # type:ignore[misc]
                    obj,
                    source_lines,
                )
    
            def _find(
                self, tests, obj, name, module, source_lines, globs, seen
            ) -> None:
                if _is_mocked(obj):
                    return
                with _patch_unwrap_mock_aware():
                    # Type ignored because this is a private function.
                    super()._find(  # type:ignore[misc]
                        tests, obj, name, module, source_lines, globs, seen
                    )
    
            if sys.version_info < (3, 13):
    
                def _from_module(self, module, object):
                    """`cached_property` objects are never considered a part
                    of the 'current module'. As such they are skipped by doctest.
                    Here we override `_from_module` to check the underlying
                    function instead. https://github.com/python/cpython/issues/107995
                    """
                    if isinstance(object, functools.cached_property):
                        object = object.func
    
                    # Type ignored because this is a private function.
                    return super()._from_module(module, object)  # type: ignore[misc]
    
            else:  # pragma: no cover
                pass
    
        if self.path.name == "conftest.py":
            module = self.config.pluginmanager._importconftest(
                self.path,
                self.config.getoption("importmode"),
                rootpath=self.config.rootpath,
            )
        else:
            try:
>               module = import_path(
                    self.path,
                    root=self.config.rootpath,
                    mode=self.config.getoption("importmode"),
                )

../../src/pytest/src/_pytest/doctest.py:569: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

p = PosixPath('/home/ran/tmp/pytest-doctest-import-mismatch/.venv/lib/python3.11/site-packages/anndata/__init__.py')

    def import_path(
        p: Union[str, "os.PathLike[str]"],
        *,
        mode: Union[str, ImportMode] = ImportMode.prepend,
        root: Path,
    ) -> ModuleType:
        """Import and return a module from the given path, which can be a file (a module) or
        a directory (a package).
    
        The import mechanism used is controlled by the `mode` parameter:
    
        * `mode == ImportMode.prepend`: the directory containing the module (or package, taking
          `__init__.py` files into account) will be put at the *start* of `sys.path` before
          being imported with `importlib.import_module`.
    
        * `mode == ImportMode.append`: same as `prepend`, but the directory will be appended
          to the end of `sys.path`, if not already in `sys.path`.
    
        * `mode == ImportMode.importlib`: uses more fine control mechanisms provided by `importlib`
          to import the module, which avoids having to muck with `sys.path` at all. It effectively
          allows having same-named test modules in different places.
    
        :param root:
            Used as an anchor when mode == ImportMode.importlib to obtain
            a unique name for the module being imported so it can safely be stored
            into ``sys.modules``.
    
        :raises ImportPathMismatchError:
            If after importing the given `path` and the module `__file__`
            are different. Only raised in `prepend` and `append` modes.
        """
        mode = ImportMode(mode)
    
        path = Path(p)
    
        if not path.exists():
            raise ImportError(path)
    
        if mode is ImportMode.importlib:
            module_name = module_name_from_path(path, root)
            with contextlib.suppress(KeyError):
                return sys.modules[module_name]
    
            for meta_importer in sys.meta_path:
                spec = meta_importer.find_spec(module_name, [str(path.parent)])
                if spec is not None:
                    break
            else:
                spec = importlib.util.spec_from_file_location(module_name, str(path))
    
            if spec is None:
                raise ImportError(f"Can't find module {module_name} at location {path}")
            mod = importlib.util.module_from_spec(spec)
            sys.modules[module_name] = mod
>           spec.loader.exec_module(mod)  # type: ignore[union-attr]

../../src/pytest/src/_pytest/pathlib.py:540: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <_frozen_importlib_external.SourceFileLoader object at 0x7feac0a280d0>
module = <module '_dot_venv.lib.python3_dot_11.site-packages.anndata' from '/home/ran/tmp/pytest-doctest-import-mismatch/.venv/lib/python3.11/site-packages/anndata/__init__.py'>

>   ???

<frozen importlib._bootstrap_external>:940: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

f = <built-in function exec>
args = (<code object <module> at 0x7feac0c24ff0, file "/home/ran/tmp/pytest-doctest-import-mismatch/.venv/lib/python3.11/site...__file__': '/home/ran/tmp/pytest-doctest-import-mismatch/.venv/lib/python3.11/site-packages/anndata/__init__.py', ...})
kwds = {}

>   ???

<frozen importlib._bootstrap>:241: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    """Annotated multivariate observation data."""
    from __future__ import annotations
    
    # Allowing notes to be added to exceptions. See: https://github.com/scverse/anndata/issues/868
    import sys
    
    if sys.version_info < (3, 11):
        # Backport package for exception groups
        import exceptiongroup  # noqa: F401
    
>   from ._core.anndata import AnnData
E   ModuleNotFoundError: No module named '_dot_venv'

.venv/lib/python3.11/site-packages/anndata/__init__.py:11: ModuleNotFoundError

This is possibly another problem with --import-mode=importlib; if I swap the order of the exec_module and the insert_missing_modules statements then it does work, but I'm not familiar enough the code to know if it's correct.

@nicoddemus Do you have any idea about this?

bluetech avatar Jan 12 '24 19:01 bluetech

First, consider editing the original post to point to the reproduction; the ImportPathMismatchError error stated there cannot happen in --import-mode=importlib so it's quite confusing

fixed

Thanks for creating a reproduction. I tried it now. I was able to reproduce with pytest 7.4, but with pytest main it works

Well, unless I’m missing something, the same package is still imported twice with this approach, right?

Which might cause problems similar to https://github.com/pytest-dev/pytest/issues/11306 when the two versions try to access some shared resource. (e.g. let’s say they both open a database connection or write a file on import time – not super good style, but possible)

flying-sheep avatar Jan 15 '24 12:01 flying-sheep

@nicoddemus any idea?

flying-sheep avatar Jan 26 '24 13:01 flying-sheep

I think no “escaping” hackery is necessary. We run pytest --pyargs <modulename>, so the parent module of any doctest should start with <modulename> and not _dot_venv or opt.hostedtoolcache.Python*site-packages.

So how about we just infer the correct module name of the parent module using sys.path by enforcing that it matches the module name supplied using --pyargs?

flying-sheep avatar Feb 05 '24 11:02 flying-sheep

Sorry for the long post, spent the last 3 hours working on this and I'm afraid there's no simple "fix" for this case.

The short answer is that pytest uses --import-mode=importlib for all imports, not only tests and conftests as originally intended.

The introduction of importlib mode was meant to give an alternative to the other modes so that:

  1. pytest does not need change sys.path to import tests and conftest files.
  2. Different test modules can have the same base name, without __init__.py files.

(Full explanation in the docs).

On a non-src layout:

pkg/__init__.py
tests/
  test_foo.py

One of the reasons why changing sys.path is problematic in this case is because it will add . to the root directory to import the tests under tests, which has the side effect of making pkg also importable. This can hide problems when trying to test the installed package in .venv, because now we are actually testing the local sources as they are reachable through sys.path during testing, instead of the installed packages in .venv.

On src layouts:

src/pkg/__init__.py
tests/
  test_foo.py

This is not a problem: adding . to sys.path does not make pkg importable, so import pkg will import from the .venv as desired.

The reproducer repository is using importlib mode, I'm guessing because of reason 2 (duplicated test names), or perhaps because at one point we recommended using importlib intending to turn it into the default, however since then we realized this is not possible, as it brings it has its own drawbacks.

@flying-sheep, by changing your configuration slightly I was able to make your reproduce run without problems in pytest main and in 7.4.4:

  • Exclude conftest.py files from being installed: they should be imported by pytest directly.
  • Drop importlib, as you are using src layout. If the problem is due to duplicated test names, just adding __init__.py on the folders with the duplicated tests solves it.
λ pytest
======================== test session starts ========================
platform win32 -- Python 3.11.5, pytest-7.4.4, pluggy-1.4.0
rootdir: e:\projects\pytest-doctest-import-mismatch
configfile: pyproject.toml
testpaths: anndata, ./src/anndata/tests
collected 6 items

.env311\Lib\site-packages\anndata\utils.py .                   [ 16%]
.env311\Lib\site-packages\anndata\_core\anndata.py ..          [ 50%]
.env311\Lib\site-packages\anndata\_core\merge.py ..            [ 83%]
src\anndata\tests\test_base.py .                               [100%]

========================= warnings summary ==========================

Here's the diff:

diff --git a/pyproject.toml b/pyproject.toml
index 81389c5..d15f40a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -21,12 +21,12 @@ dependencies = [
 
 [tool.hatch.build]
 exclude = [
+    'src/anndata/tests/conftest.py',
     'src/anndata/tests/test_*.py',
 ]
 
 [tool.pytest.ini_options]
 addopts = [
-    '--import-mode=importlib',
     '--doctest-modules',
     '--pyargs',
 ]

Follows is a long description of the overall problem.


The introduction of importlib mode was meant to give an alternative to the other modes so that:

  1. pytest does not need change sys.path to import tests and conftest files.
  2. Different test modules can have the same base name, without __init__.py files.

(Full explanation in the docs).

One important thing in that description is *test modules and conftests.

Before we introduced the importlib mode, all modules directly imported by pytest used the same function, import_path, which changes sys.path to import a module in case it is not already importable. "All modules" in the majority means tests and conftests (as user application code is imported indirectly through these), but it also includes user application code in some cases, doctests being one case relevant here.

The importlib mode works by assuming it is importing a module that cannot be reached from sys.path through the normal means, a common example being the tests directory layout at the root of the package, where we have a test file tests/foo/bar/test_foo.py without any __init__.py files within. It is not importable because tests is not reachable from sys.path, and we also do not want to change it.

To import that file, we:

  1. Try to come up with a "module name" for it based on its full path and root, so for the file tests/foo/bar/test_foo.py we make up an non-existing module name "tests.foo.bar.test_foo". This module name will be unique, and not conflict with other test modules named test_foo.py in the same suite, as they will be in a different directory and will be assigned a different module name using the same mechanism ("tests.schema.test_foo" for example).

  2. Import the module using its full path using importlib.importmodule, and then place that module in sys.modules with the module_name we determined above.

This works well when used to import just test modules and conftests, but will create divergences when used import application code. In the reproducer repo, we are trying to import anndata using importlib mode, so it ends up being imported as a module named using the logic described above, however it is reachable through sys.path and we probably should import it using the normal mechanisms.

Probably one solution is to revisit all places where we manually import modules, and possibly use "importlib" (when configured) for tests+conftests or fallback to the other mechanisms for normal modules, but this feels extremely risky given the millions of test suites out there.

The importlib mode was intended to solve avoid changing sys.path/"ImportMismatch" errors, but particularly if I knew all the problems/details it involves, I would just recommend other solutions:

  1. pytest changes sys.path and it is testing the local copy? Switch to src layout.
  2. Test module names with the same? Add __init__.py files in your tests turning them into packages.

But here we are.

I'm not sure how to proceed here. Perhaps we can right away add a warning on --doctest-modules that it might be problematic to use together with --import-mode=importlib.

I will spend some more time trying to see if there's a simple and safe solution to this general mess.

nicoddemus avatar Feb 10 '24 15:02 nicoddemus

Maybe I missed it when reading your summary, but isn’t my PR a simple and safe solution?

It gets rid of this iffy strategy:

Try to come up with a "module name" for it based on its full path and root, so for the file tests/foo/bar/test_foo.py we make up an non-existing module name "tests.foo.bar.test_foo".

whenever some module does live below one of the sys.path entries.

flying-sheep avatar Feb 12 '24 09:02 flying-sheep

I will double check, but @bluetech's concern is valid: search through sys.path entries can be very costly, specially in environment configurations where modules are reachable by configuring PYTHONPATH -- at work we use this, and our sys.path entry will contain dozens of entries.

It does not invalidate it completely of course, but it is something we should consider.

As I said, I will double check it later. 👍

nicoddemus avatar Feb 12 '24 16:02 nicoddemus

I don‘t think it invalidates it in the slightest before someone has measured the impact :wink:

Also keep in mind that I outlined some strategies to make it faster.

I don‘t think we should even consider sacrificing correctness before having tried out all other sensible avenues. Importing modules twice can lead to very subtle and hard to debug breakage, so I feel like everyone waiting 5 seconds more for the test suite would be an amazing trade-off compared to a handful of people debugging some weird double-import bug for 3 hours.

flying-sheep avatar Feb 13 '24 10:02 flying-sheep

I don‘t think we should even consider sacrificing correctness before having tried out all other sensible avenues.

Oh I agree 100%, I didn't mean to imply that we were not applying your fix based solely on any performance problems that we have a gut feeling about -- I am just thinking there might be some other solution.

One such solution that occurred to me, and I plan to test, is that when using importlib, we first try to import the module normally, without altering sys.path. If the module ends up being importable, then we just use it; if not, then we fallback with the current importlib shenaningans. If I'm right, this would probably be the simplest and a good solution still aligned with importlib's goal: not changing sys.path and allow duplicated module names.

nicoddemus avatar Feb 14 '24 18:02 nicoddemus

I plan to spend more time this week on this, and solve this once and for all (with your fix or something else).

nicoddemus avatar Feb 14 '24 18:02 nicoddemus

that when using importlib, we first try to import the module normally, without altering sys.path.

I’m confused, I thought importlib’s point is that it doesn’t alter sys.path?

I plan to spend more time this week on this, and solve this once and for all (with your fix or something else).

Another possibility is to just be able to make PyTest aware of the module root(s) the project has, e.g. src/. Then that would be used for imports instead of the project root directory.

flying-sheep avatar Feb 15 '24 09:02 flying-sheep

I’m confused, I thought importlib’s point is that it doesn’t alter sys.path?

Yep, what I mean is just try to import the module, without changing sys.path: in case we are trying to import a normal module which is installed in the virtualenv, this would just work.

Another possibility is to just be able to make PyTest aware of the module root(s) the project has, e.g. src/. Then that would be used for imports instead of the project root directory.

That would be another solution too.

nicoddemus avatar Feb 15 '24 09:02 nicoddemus

Just to provide another data point, I was intrigued by this comment:

The importlib mode was intended to solve avoid changing sys.path/"ImportMismatch" errors, but particularly if I knew all the problems/details it involves, I would just recommend other solutions:

  1. pytest changes sys.path and it is testing the local copy? Switch to src layout.
  2. Test module names with the same? Add __init__.py files in your tests turning them into packages.

For what it's worth, I hadn't realized that importlib was no longer being recommended by default. I had been using --import-mode=importlib because I thought it was the recommended mode going forward, but it sounds like that's no longer the case. It looks like the Good Integration Practices documentation page still (as of 8.0.x) states that:

For new projects, we recommend to use importlib import mode (see which-import-mode for a detailed explanation). To this end, add the following to your pyproject.toml:

Is that no longer the best recommendation, even for new projects?

For what it's worth, simply removing --import-mode=importlib seems to fix the problem for me, though there was one little weirdness that I thought I'd mention here, related to the pythonpath setting and whether we are or are not using editable installs.

--import-mode pythonpath --doctest-modules editable install non-doctests pass?
prepend (omitted) no no no
prepend (omitted) no yes yes
prepend (omitted) yes no yes
prepend (omitted) yes yes yes
prepend src no no yes
prepend src no yes yes
prepend src yes no yes
prepend src yes yes yes
importlib (omitted) no no no
importlib (omitted) no yes yes
importlib (omitted) yes no no
importlib (omitted) yes yes yes
importlib src no no yes
importlib src no yes yes
importlib src yes no no
importlib src yes yes yes

What I mean by the last column, "non-doctests pass?", is that when I write a "no", it means that the non-doctest tests fail with import errors.

Edit: The version of pytest that I am currently using is 7.3.1, so the data in the table reflects that version.

To summarize the table, it looks like editable and non-editable installs behave very differently depending on the values of --import-mode, pythonpath, and presence of --doctest-modules. For some reason, when in prepend mode, if you don't set pythonpath to src and if you don't pass in --doctest-modules, then all the non-doctest tests will fail in editable installs, but if pass in --doctest-modules without setting src, then the non-doctest tests will pass.

Does the presence of --doctest-modules cause you to not need to specify pythonpath = "src" for some reason?

I'm asking because I'm managing multiple Python projects, some with doctests and others without any doctests, and I'd like to keep their pytest configurations as close as possible, ideally only passing in --doctest-modules in the projects that actually have doctests written. I found it strange that in projects without doctests, going back to prepend mode caused the non-doctest tests to all fail, but in projects with doctests, they ran fine, and I determined that the --doctest-modules option was affecting how non-doctest tests were being imported.

For my use case, I think I can standardize on prepend mode, explicitly setting pythonpath = src, and optionally adding or omitting --doctest-modules depending on the project, and it seems to work and have the expected behavior for both editable and non-editable installs.

BTW, thanks for the really detailed discussion in this ticket. I've been following this ticket mostly as an observer, and it's been helping me understand how the various settings interact with each other.

richardxia avatar Feb 15 '24 19:02 richardxia

I don’t want to use any sys.path-modifying method. It’s much cleaner to have PyTest’s collection deal with importing and running tests, and leave sys.path for the actual library under test.

flying-sheep avatar Feb 16 '24 08:02 flying-sheep

Proposing a definite solution (IMHO) in #11997. 👍

nicoddemus avatar Feb 17 '24 20:02 nicoddemus

Sadly the changes you made to the PR after I checked with my project broke it again. I should have checked again …

Somehow if running with --doctest-modules and --import-mode=importlib, now (some of the?) modules and classes imported both during doctests and unit tests have the same qualified names and import paths, but are not the same object.

So e.g.

  • pickling fails, since the class in the tests isn‘t identical to a pickled object’s class (same name same code, different object id I guess)
  • warning filters fail, since they’re installed for one version of the warning class and not the other,
  • imports fail, because one version of the import doesn’t have a field while another has

you can see some of the issues here: https://dev.azure.com/scverse/anndata/_build/results?buildId=5905&view=logs&j=5ea502cf-d418-510c-3b5f-c4ba606ae534

But yeah, I’m pretty sure that the root cause is that somehow things end up being re-imported that should be in the module cache.

Any idea what could be causing that?

I’m happy to work on a fix myself, but it would be great to have a pointer.

flying-sheep avatar Mar 04 '24 09:03 flying-sheep

I haven't looked at your failures but IIRC you are using namespace packages, did you set the new consider_namespace_packages = true ini setting?

bluetech avatar Mar 04 '24 10:03 bluetech

Argh, really sorry to hear that @flying-sheep. I did test the fix in your reproducible example before merging, but seems there is still something to adjust.

Looking at the code I suspect there might be a bug there indeed, seems we should be checking if module_name already is in sys.modules before attempting to import it here:

https://github.com/pytest-dev/pytest/blob/71849cc05c4fffe2267a6844393be3adb8248820/src/_pytest/pathlib.py#L542-L544

Like it is being done here:

https://github.com/pytest-dev/pytest/blob/71849cc05c4fffe2267a6844393be3adb8248820/src/_pytest/pathlib.py#L550-L552

I will check this tonight.

Sorry again for the breakage. :confused:

nicoddemus avatar Mar 04 '24 11:03 nicoddemus

I’m super grateful that you’re working on getting this resolved.

I’m certain once this bug is fixed, this will make using pytest a blast in all kinds of (actual and not-so-)corner cases.

I’ll test the fix with our projects, which should be complex-enough real-world examples to prove that it works in a lot of cases.

E.g. one easy reproducer is

git clone https://github.com/scverse/scanpy.git
cd scanpy
git checkout 14555ba48537995acaa381b8b6ad5fc41e612510  # current main
python -m venv .venv
.venv/bin/pip install .[test] pytest==8.1.0
.venv/bin/pytest scanpy/_utils/compute/is_constant.py

flying-sheep avatar Mar 04 '24 11:03 flying-sheep

Note that the documentation of this feature mentions that the default value of consider_namespace_packages is False. However, if import_path is called without consider_namespace_packages, an exception is raised. Shouldn't the documentation be updated or was the it forgotten to specify defaults in the public api?

twmr avatar Mar 11 '24 20:03 twmr

The documentation is talking about the consider_namespace_packages option, not the parameter to import_path, which is not part of the public API btw.

nicoddemus avatar Mar 11 '24 21:03 nicoddemus