pytest icon indicating copy to clipboard operation
pytest copied to clipboard

Fix collection of short paths on Windows

Open nicoddemus opened this issue 1 year ago • 8 comments

Passing a short path in the command line was causing the matchparts check to fail, because Path(short_path) != Path(long_path).

Using os.path.samefile as fallback ensures the comparison works on Windows when comparing short/long paths.

Fix #11895

nicoddemus avatar Feb 05 '24 21:02 nicoddemus

The initial commit was used just to demonstrate the failing test on CI, however we should squash/merge this in the end.

nicoddemus avatar Feb 05 '24 22:02 nicoddemus

Each call to samefile adds 2 stats, and stats are generally reputed to cause slowness. I wonder then, is it possible in Python to "expand" a short path to a long path, then we can run it on the inputs and proceed as usual, instead of using samefile?

bluetech avatar Feb 06 '24 12:02 bluetech

I wonder then, is it possible in Python to "expand" a short path to a long path, then we can run it on the inputs and proceed as usual, instead of using samefile?

Should be possible, I will look into it, however I think it will change the resulting node ids.

nicoddemus avatar Feb 06 '24 12:02 nicoddemus

Done, it was a bit more code but definitely won't impact performance.

nicoddemus avatar Feb 09 '24 21:02 nicoddemus

@bluetech just stumbled into this:

https://github.com/pytest-dev/pytest/blob/7690a0ddf15785f0af0e9a4cf8524379c53cd061/src/_pytest/pathlib.py#L595-L605

This suggests we might be not considering other cases if we try to just expand the short names during collection. Perhaps we should bite the bullet and just go with the stat solution? Less code and will be the safest one, while I understand there's the performance concern.

nicoddemus avatar Feb 10 '24 14:02 nicoddemus

@bluetech what do you think regarding my last point?

nicoddemus avatar Feb 14 '24 18:02 nicoddemus

@nicoddemus I'll try to make an informed review, but if I take too long (like before 8.0.2) then please don't wait for me, I don't mean to block this.

bluetech avatar Feb 16 '24 10:02 bluetech

I used https://github.com/nicoddemus/collection-test-bed to measure the performance so we can have some numbers to decide.

Using that configuration we have 10.5k tests 4 levels deep, with test files containing simple parametrized tests, here are the numbers in my Windows using pytest --co:

branch timing
main 8.71s
stat solution 9.23s
convert to long solution 8.70s

So indeed the stat call has some performance penalty, as expected. Keep in mind however that this is for a test suite which does not import anything else, it only imports pytest itself.

Testing in a more realist test suite from a project I have around with 1.3k tests, I couldn't measure any difference between the stat aproach and main, both clocking around the same 6.6s for the full collection.

So while we can see some difference in a synthetic test suite, with 10k tests but without any other imports, using a realistic test suite, with 1.3k tests and which imports application code, the difference is not perceptible at all.

I'm leaning towards the stat solution as it is simpler and also more correct in my opinion.

nicoddemus avatar Feb 17 '24 14:02 nicoddemus

Going ahead and merge this then, we can revisit later if needed. :+1:

nicoddemus avatar Feb 23 '24 10:02 nicoddemus