pytest collection appears to stall/slow down/jam up when some third-party libraries are used; add function to ignore specific modules
pytest 8.3.2
Python 3.11
Windows 10
Hi all!
I was tossing up whether this should be a bug report or a feature request, as I wasn't able to work out from the documentation whether the following is expected behaviour. It took a good day of solid troubleshooting to figure this one out. I have found a workaround, so it's not a showstopper, though it was difficult and very confusing to troubleshoot.
What's the problem?
pytest will, perhaps by design, scan through and 'collect' (some? all?) third-party libraries used in x function when x function itself is imported into a test to run.
This gives the impression that pytest:
- collection has stalled when collecting tests in a large project,
- flat out doesn't work properly when only one test exists,
- is really frustrating to use with PyCharm (and other automated-test-runner tools) when it takes upwards of 20 seconds to collect tests before each test runs.
Current behaviour (as of pytest 8.3.2)
Note: I've used the awesome pytest-richtrace library (in --verbose mode) to help me figure this issue out, as there didn't appear to be a similar function in pytest to make pytest's collection activities verbose.
The library I can reliably reproduce this issue with is arcgis. https://pypi.org/project/arcgis/
arcgis is Esri's ArcGIS API for Python, allowing Python code to interact with Esri's Enterprise and Online geospatial systems without needing to write a mess of boilerplate REST API code. arcgis is a package I don't maintain, and don't need to test directly.
The same behaviour is present in both PyCharm and by manually invoking pytest with python -m via command line.
Example
# app.py
import os
from arcgis import GIS  # Connection to a server Portal instance

def connect(domain, context):
    portal = GIS(url=connection_url(domain, context), username=os.getenv('USERNAME'), password=os.getenv('PASSWORD'))
    return portal

def connection_url(domain, context):
    if domain == 'arcgis.com':
        return None  # API defaults to arcgis.com if None used as parameter
    else:
        return f"https://{domain}/{context}"
# common.py
def create_wigwam():
    # Do stuff here.
    return True
Tests
# tests/test_app.py
from app import connection_url

def test_connection_url():
    pass

# tests/test_common.py
# another random test unrelated to app.py in the same directory
from common import create_wigwam

def test_create_wigwam():
    assert True
Note that:
- connect() is not called,
- there isn't a test for connect(),
- connect() isn't imported into test_app.py,
- the connection_url() function does not call connect(), and therefore does not call arcgis.GIS, and
- test_app.py essentially does nothing other than import connection_url from app.py.
I have also explicitly excluded venv and site-packages as directories in pytest.ini.
The code above, as it is written right now, will see pytest traverse the arcgis package within the virtual environment that runs this code. According to pytest-richtrace, pytest doesn't appear to collect anything in that package or the venv directory. pytest seems to ignore the os package.
Using pytest-richtrace I saw the following behaviour:
hook: pytest_collection
session: <Session exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=0>
...
hook: pytest_collectstart tests/test_app.py
INFO:numexpr.utils:Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. # This is calling into the virtual environment.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
# Stall here after the above line is printed to console for a period of time, in my case at least 20 seconds, no other feedback is given.
hook: pytest_itemcollected tests/test_app.py::test_connection_url
hook: pytest_collection_modifyitems
hook: pytest_collection_finish
Simply commenting out the import line in test_app.py --
from app import connection_url
-- prevents pytest traversing into arcgis to collect. All other tests complete pretty much instantaneously.
I haven't checked whether arcgis has any tests, though the collection process certainly doesn't pick any up.
Describe the solution you'd like
- Is this expected behaviour?
- If expected behaviour:
  - Add a command line or configuration flag that tells pytest to exclude certain libraries from collection. (excluding directories doesn't do this)
  - Make the collection function more verbose with the --verbose flag, so that it's much easier to troubleshoot collection issues:
    - Print out the directory and file being examined for collection
    - Include timings when the --durations=0 flag is called
Workaround solution
Using MagicMock in unittest.mock allows pytest to traverse into app.py without also traversing into arcgis, stopping the stalling issue without having to comment out code or refactor unnecessarily:
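# test_app.py
import sys
from unittest.mock import MagicMock
# Must run before anything imports app.py, so that `from arcgis import GIS` finds the mock.
sys.modules['arcgis'] = MagicMock()
# No other mocking code is needed, as this MagicMock completely substitutes arcgis when under test.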
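The same idea could probably live in a conftest.py so that every test module in the directory benefits. A minimal sketch, assuming nothing under test needs the real arcgis; stub_heavy_modules is a hypothetical helper name, not a pytest feature:

```python
# conftest.py (sketch; stub_heavy_modules is a hypothetical helper, not part of pytest)
import sys
from unittest.mock import MagicMock


def stub_heavy_modules(*names):
    """Register a MagicMock in sys.modules for each name that is not already imported."""
    stubbed = []
    for name in names:
        if name not in sys.modules:
            sys.modules[name] = MagicMock(name=name)
            stubbed.append(name)
    return stubbed


# conftest.py is imported before the test modules in its directory are collected,
# so a later `from arcgis import GIS` in app.py resolves to the stub.
stub_heavy_modules("arcgis")
```

One caveat with this approach: a dotted import such as `import arcgis.gis` would most likely need its own entry in the call, since a MagicMock is not a real package.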
Many thanks!
Based on the information provided, it seems that importing the library is unreasonably expensive.
Please check whether importing lazily removes the stall.
Hi @RonnyPfannschmidt,
Lazy loading certainly appears to bypass the stall (with mocking code commented out in the test):
# app.py
import os
# from arcgis import GIS  # Moved from here to connect()

def connect(domain, context):
    from arcgis import GIS  # Lazily load GIS from arcgis
    portal = GIS(url=connection_url(domain, context), username=os.getenv('USERNAME'), password=os.getenv('PASSWORD'))
    return portal

def connection_url(domain, context):
    if domain == 'arcgis.com':
        return None  # API defaults to arcgis.com if None used as parameter
    else:
        return f"https://{domain}/{context}"
pytest-richtrace doesn't show pytest traversing into arcgis.
Unfortunately this is not something Pytest can solve; it's just that importing a module executes all the top-level code and import arcgis is unreasonably slow.
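One way to confirm this with pytest out of the picture is to time the import on its own. The sketch below builds a throwaway module (slow_lib, a hypothetical stand-in for a heavy package) whose top-level code sleeps; time_import is likewise an illustrative helper name:

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path


def time_import(module_name):
    """Return the seconds taken by importlib.import_module(module_name)."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Build a stand-in for a slow library: its top-level code sleeps when imported.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "slow_lib.py").write_text("import time\ntime.sleep(0.2)\nVALUE = 1\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the new file is visible to the import system

first = time_import("slow_lib")   # executes the module's top-level code, including the sleep
second = time_import("slow_lib")  # already in sys.modules, so nothing is re-executed
print(f"first import: {first:.3f}s, cached import: {second:.3f}s")
```

For a real package, running `python -X importtime -c "import arcgis"` should print a per-module breakdown of where the import time goes.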
Some kind of lazy imports might help in your situation but Pytest can't solve it for you.
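If moving the import into the function body is not appealing, the standard library's importlib.util.LazyLoader is one option. This sketch follows the lazy-import recipe in the importlib documentation, with two small guards added. Whether arcgis tolerates being loaded this way is an untested assumption, so the demo uses a small standard-library module instead:

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs only on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported, nothing left to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; the real code has not run yet
    return module


# Demo with a standard-library module standing in for a heavy package.
colorsys = lazy_import("colorsys")
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the real import
```

Note that `from package import name` accesses an attribute straight away, so with this approach app.py would need to hold on to the module object and look up GIS inside connect().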