
[BUG] Tests have global side-effects.

picnixz opened this issue • 6 comments

Describe the bug

While implementing a feature, I wanted to test it by selectively running some tests, but I found that the tests are not executed in an isolated context: they have undesirable side-effects.

A simple way to see that is by manually running the following tests:

python -m tox -e py310 \
  -- -s --no-header --no-summary \
  tests/test_ext_autodoc.py::test_autodoc_typed_instance_variables
python -m tox -e py310 \
  -- -s --no-header --no-summary \
  tests/test_ext_autodoc_autoattribute.py::test_autoattribute_typed_variable_in_alias \
  tests/test_ext_autodoc.py::test_autodoc_typed_instance_variables

The first test succeeds but the second one fails with the following assertion error:

AssertionError: assert ['', '.. py:m...ed_vars', ...] == ['', '.. py:m...ed_vars', ...]
  At index 62 diff: '   .. py:attribute:: Derived.attr2' != '   .. py:attribute::   Derived.attr7'
  Left contains 31 more items, first extra item: '      :module: target.typed_vars'
  Use -v to get more diff

The reason is that, in the first test, the documenter being used is a ModuleDocumenter for tests/roots/test-ext-autodoc/target/typed_vars.py. In particular, when the documenter documents its members, the annotations of the Alias class are NOT updated with the annotations of the Derived and Class classes. By adding

if self.fullname == 'target.typed_vars.Alias.attr2':
    logger.warning([self.fullname, self.parent.__annotations__])
elif self.fullname == 'target.typed_vars.Alias':
    logger.warning([self.object, self.object.__annotations__])

after

https://github.com/sphinx-doc/sphinx/blob/669bcc0a190ce9921451bad28431886fbaad170b/sphinx/ext/autodoc/__init__.py#L877-L880

then the output of the first command is:

# warning:
WARNING: [<class 'target.typed_vars.Derived'>, {'attr7': <class 'int'>}]

However, if we do the same thing when running the second command, we get the following output:

# warning: (this is for test_autoattribute_typed_variable_in_alias)
WARNING: ['target.typed_vars.Alias.attr2', {'attr7': <class 'int'>, 'attr1': 'int', 'attr2': 'int', 'attr3': 'int', 'descr4': 'int', 'attr4': 'int', 'attr5': 'int', 'attr6': 'int'}]

# warning: (this is for test_autodoc_typed_instance_variables)
WARNING: [<class 'target.typed_vars.Derived'>, {'attr7': <class 'int'>, 'attr1': 'int', 'attr2': 'int', 'attr3': 'int', 'descr4': 'int', 'attr4': 'int', 'attr5': 'int', 'attr6': 'int'}]

In other words, test_autoattribute_typed_variable_in_alias is polluting the global context of test_autodoc_typed_instance_variables! The reason is that the AttributeDocumenter used in test_autoattribute_typed_variable_in_alias fires autodoc events which cause the annotations of the parent object (namely Alias) to be updated. Since Alias is an alias of Derived, the update propagates to Derived, and this is why, when test_autodoc_typed_instance_variables documents Derived with a ClassDocumenter, we end up with more annotations than expected.
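
To make the mechanism concrete, here is a minimal, self-contained illustration (using stand-in classes, not the real test target) of why annotations written through an alias propagate to the aliased class:

# Stand-in classes mirroring "Alias = Derived" in target/typed_vars.py.
class Derived:
    attr7: int

Alias = Derived  # the alias is just another name for the same class object

# Both names resolve to the very same __annotations__ dict.
assert Alias.__annotations__ is Derived.__annotations__

# Anything merged into the alias's annotations (as the autodoc event handlers do)...
Alias.__annotations__['attr1'] = 'int'

# ...is immediately visible through Derived as well.
assert Derived.__annotations__ == {'attr7': int, 'attr1': 'int'}

Because the test modules are imported once and stay cached in sys.modules for the whole pytest session, this mutated dict is exactly what the second test then sees.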

Other side-effects can be seen by simply running the following command:

python -m tox -e py310 -- tests/test_ext_autodoc* 

I think we need to make sure that each test does not collide with another one and does not assume that it is executed after or before some other test. This would be extremely helpful when we simply want to run individual tests (perhaps this can actually be done easily when setting up pytest?).
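
As a rough illustration of what "setting up pytest" for this could look like, here is a hedged sketch of a hypothetical conftest.py fixture (not something Sphinx currently ships) that forgets any test-target modules imported during a test, so the next test re-imports them from scratch. It only helps when the module is first imported inside the test itself, but it shows the direction:

# Hypothetical conftest.py sketch: evict freshly imported "target.*" modules
# after each test so later tests see an unpolluted re-import.
import sys

import pytest

@pytest.fixture(autouse=True)
def _fresh_target_modules():
    before = set(sys.modules)
    yield
    for name in set(sys.modules) - before:
        if name == 'target' or name.startswith('target.'):
            del sys.modules[name]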

How to Reproduce

Compare the output of

python -m tox -e py310 \
  -- -s --no-header --no-summary \
  tests/test_ext_autodoc.py::test_autodoc_typed_instance_variables

with the output of

python -m tox -e py310 \
  -- -s --no-header --no-summary \
  tests/test_ext_autodoc_autoattribute.py::test_autoattribute_typed_variable_in_alias \
  tests/test_ext_autodoc.py::test_autodoc_typed_instance_variables

Environment Information

Platform:              linux; (Linux-5.3.18-lp152.106-default-x86_64-with-glibc2.26)
Python version:        3.10.3 (main, Jan 31 2023, 10:47:25) [GCC 7.5.0])
Python implementation: CPython
Sphinx version:        6.2.0+/e1b15a5c1
Docutils version:      0.18.1
Jinja2 version:        3.1.2
Pygments version:      2.14.0

Sphinx extensions

No response

Additional context

No response

picnixz avatar Apr 04 '23 11:04 picnixz

In sphinx.testing.fixtures:

-     kwargs['srcdir'] = srcdir = sphinx_test_tempdir / kwargs.get('srcdir', testroot)
+     kwargs['srcdir'] = srcdir = util.path(tmp_path) / kwargs.get('srcdir', testroot)

using pytest's tmp_path fixture. This fixes most of the errors I get when running with --random-order, but comes at a significant cost: test run time goes from 110 seconds to 405 seconds, a 3.7x slowdown.
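
In effect, every test then builds in its own private copy of the source directory instead of reusing a shared, cached one, which presumably is where the extra time goes. A standalone sketch of the same idea (hypothetical fixture name and paths, not the actual sphinx.testing internals) would be:

# Hypothetical sketch: copy the shared test root into pytest's per-test
# tmp_path so no two tests ever build in (and pollute) the same srcdir.
import shutil

import pytest

@pytest.fixture()
def isolated_srcdir(tmp_path):
    shared_root = 'tests/roots/test-ext-autodoc'  # assumed path, relative to the repo root
    private_root = tmp_path / 'test-ext-autodoc'
    shutil.copytree(shared_root, private_root)
    return private_root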

A

AA-Turner avatar Apr 07 '23 00:04 AA-Turner

That's a pretty significant slow-down, I would say. Next week I can finish the fixes for --random-order and we can enable both (parallel xdist and --random-order) in CI (if you want?). Note that right now I can run the whole test suite on my Ryzen 9 machine with the following results:

$ pytest -n auto
...
=== 1850 passed, 2 skipped, 5 warnings in 15.39s ===

marxin avatar Apr 07 '23 07:04 marxin

I am moving my reply to https://github.com/sphinx-doc/sphinx/pull/11293#issuecomment-1499606940 here:

perhaps we make any state that is needed explicit via module-level setup or fixtures etc?

I have the following suggestions in that direction:

  • For all tests that have global side-effects on a Python module itself or on its contents (e.g., autodoc-related tests), we should be able to reload the documented modules (which are usually lightweight since they only exist for the tests). I assume that those modules have essentially no global state such as caches, so re-importing them should be enough to get a fresh state (see the sketch after this list). Alternatively, we could write the code of the module being documented to a temporary file each time, which would also guarantee that the module starts from a known state.
  • For IO-related side-effects, I think we need to use a fresh build environment, or define the set of files that are expected at the beginning of the test and remove those that are older than the time when the test is executed. That way, we could keep old files (or directories) that are not needed unless they are specifically expected by the test.
  • If a test still fails with the previous guards in place, we can explicitly request a fully isolated execution, as if nothing had run before it, but this should be a last resort.
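
A possible shape for the first bullet, sketched as a hypothetical fixture (it assumes tests/roots/test-ext-autodoc is on sys.path so that target.typed_vars is importable; the final implementation would not necessarily look like this):

# Hypothetical sketch: re-import the documented target module for each test so
# that class annotations start from a clean state and our copy is not leaked.
import importlib
import sys

import pytest

@pytest.fixture()
def fresh_typed_vars():
    name = 'target.typed_vars'
    sys.modules.pop(name, None)      # forget the cached, possibly polluted module
    module = importlib.import_module(name)
    try:
        yield module
    finally:
        sys.modules.pop(name, None)  # do not leak our copy into the next test

Requesting the fixture explicitly (rather than making it autouse) keeps the cost limited to the tests that actually need a pristine module.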

picnixz avatar Apr 07 '23 10:04 picnixz

Just a few comments about different levels of independence when it comes to tests:

  • pytest -n auto --dist=loadfile - uses pytest-xdist, where loadfile basically means that all tests from one file run on the same worker; that already achieves a significant speed-up and should work even now
  • pytest -n auto - the default dispatching mechanism feeds workers as they become idle; there are still tests that fail right now
  • pytest --random-order - runs tests sequentially in a random order; most of the PRs I linked to this issue try to address this problem, and right now there are still quite a few failures
  • pytest --random-order --count=X (with X>1) - runs each test multiple times, which makes it even harder to do a proper clean-up for some of the tests

I tried to enable pytest -n auto in GitHub CI but haven't succeeded for some reason: https://github.com/marxin/sphinx/tree/enable-pytest-xdist

Feel free to comment on my thoughts: what realistic goals can we set?

marxin avatar Apr 13 '23 13:04 marxin

I was trying to fix #10193 and wanted to run only the autodoc test suite. HOWEVER, I observed that tests actually FAIL because of side-effects far more often than I expected. I also tried --random-order and --random-order-bucket=global with the pytest-random-order plugin to check whether this really matters, and yes, it's non-negligible.

As such, the next thing I'll focus on is fixing the test suite. I'll try the following:

  • First, I'll make the suite as slow as needed by isolating every single test, so we can tell whether a test only passes thanks to side-effects (currently, some side-effects might be exactly what makes a test succeed!). A diagnostic sketch for catching such leaks follows this list.
  • Then, I'll isolate only the parts that actually need it. Not everything needs to be isolated, but most of the autodoc component must be, because it changes the state of the documented modules.
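
As mentioned above, a hypothetical diagnostic fixture along these lines (again assuming the autodoc test root is on sys.path so that target.typed_vars is importable) could flag tests that leak annotations into the shared classes:

# Hypothetical guard: fail loudly if the requesting test mutates the
# annotations of the shared Derived class from target/typed_vars.py.
import importlib

import pytest

@pytest.fixture()
def assert_no_annotation_leak():
    module = importlib.import_module('target.typed_vars')
    before = dict(module.Derived.__annotations__)
    yield
    after = dict(module.Derived.__annotations__)
    assert after == before, f'leaked annotations: {set(after) - set(before)}'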

picnixz avatar Feb 03 '24 18:02 picnixz

So, I found a possible side-effect culprit while implementing #11964. I put my modules at the same level as the conf.py file because I didn't want to bother with a package. However, it appears that this caused some weird side-effects. I verified locally by running all tests in tests/test_extensions versus only those in tests/test_extensions/test_ext_napoleon_docstring.py. The second case passes, but one of the tests in tests/test_extensions (probably autodoc-related) is polluting the global namespace.

I'll continue investigating but I'll first try to fix #11941.

EDIT: I fixed #11941, so I'll start investigating this one tomorrow.

picnixz avatar Feb 10 '24 11:02 picnixz

I'm facing the same issue: autodoc-related tests succeed when run alone in isolation but fail as soon as multiple tests are run in the same session. I attempted to debug the issue for several hours with no success.

The error only occurs when multiple tests document the same Python module/class. When each test operates on its own Python test data/module, everything is fine.
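
For what it's worth, a hedged sketch of the "one module per test" approach (a hypothetical helper whose name and signature are mine, not an existing Sphinx utility) could look like this:

# Hypothetical helper: write the code under test into tmp_path and import it
# under a unique name, so no two tests ever share a documented module.
import importlib.util
import sys

def make_private_module(tmp_path, name, source):
    path = tmp_path / f'{name}.py'
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module       # register before executing, as usual
    spec.loader.exec_module(module)
    return module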

mansenfranzen avatar Mar 12 '24 14:03 mansenfranzen

I am working on a reimplementation and I haven't pushed the new changes yet.

I think what I did should be fine, so please wait a few weeks until I have more time!

picnixz avatar Mar 14 '24 08:03 picnixz