Pathlib.iterdir semantics change dramatically under Python 3.13

Open jaraco opened this issue 10 months ago • 11 comments

Bug report

Bug description:

This code behaves very differently on Python 3.13 than 3.12:

(p for p in pathlib.Path('does-not-exist').iterdir())
 🐚 py -3.12 -c "import pathlib; (p for p in pathlib.Path('does-not-exist').iterdir())"
 🐚 py -3.13 -c "import pathlib; (p for p in pathlib.Path('does-not-exist').iterdir())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import pathlib; (p for p in pathlib.Path('does-not-exist').iterdir())
                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pathlib/_local.py", line 575, in iterdir
    with os.scandir(root_dir) as scandir_it:
         ~~~~~~~~~~^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'does-not-exist'

In Python 3.12, the generator expression was evaluated lazily, not invoking path.iterdir() until the generator was consumed. On 3.13, at least a portion of the generator is evaluated, triggering the exception when the dir does not exist.

I discovered this issue in https://github.com/jaraco/jaraco.develop/issues/26.

I looked at What's New for Python 3.13, and there's only one mention of generator expressions regarding mutation of locals in generator expressions, which doesn't seem to be relevant here.

That change mentions https://github.com/python/cpython/issues/74929, so maybe that change is also implicated in the change in execution order.

CPython versions tested on:

3.13

Operating systems tested on:

macOS

jaraco avatar Feb 08 '25 22:02 jaraco

Silly me. I just realized this issue probably isn't about a change to the generator expression but a change to pathlib.
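For readers following along, here is a minimal sketch of why the generator expression was a red herring: the outermost iterable of a generator expression is evaluated when the expression is created, not when it is first consumed. The helper `noisy_iterable` is hypothetical and exists only to record when it gets called.

```python
def noisy_iterable(log):
    # Hypothetical helper: records the moment it is called, then returns an iterator.
    log.append("iterable created")
    return iter([1, 2, 3])


log = []

# The call to noisy_iterable() happens on this line, when the genexp is built.
gen = (x for x in noisy_iterable(log))

# Snapshot the log before the generator has been consumed at all.
created_before_consumption = list(log)

# Only now is the body of the generator expression actually run.
values = list(gen)
```

So `pathlib.Path('does-not-exist').iterdir()` is called as soon as the generator expression is created, on either Python version; what differs is what that call does.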

jaraco avatar Feb 08 '25 22:02 jaraco

Indeed, the generator expression is not relevant:

 🐚 py -3.12 -c "import pathlib; pathlib.Path('does-not-exist').iterdir()"
 🐚 py -3.13 -c "import pathlib; pathlib.Path('does-not-exist').iterdir()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import pathlib; pathlib.Path('does-not-exist').iterdir()
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pathlib/_local.py", line 575, in iterdir
    with os.scandir(root_dir) as scandir_it:
         ~~~~~~~~~~^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'does-not-exist'

jaraco avatar Feb 08 '25 22:02 jaraco

Bisected to https://github.com/python/cpython/pull/107320 so confirming that it's related to iterdir() and not the generator itself.

tomasr8 avatar Feb 08 '25 22:02 tomasr8

I was even involved in that issue and had completely forgotten.

So it was an intentional change, even though it doesn't appear in the "what's new". At the very least, we should add a note to "what's new", as this behavior might be surprising.

I'm also thinking we should revisit our assumptions from #78722 in light of this new experience. It demonstrates that the change in behavior isn't necessarily beneficial and there may be other use-cases out there that are relying on the late evaluation of the error condition.

I don't feel strongly about it, and I'm happy to proceed with this new behavior, but we should ask ourselves if a rollback is warranted to retain consistency across Python versions.

jaraco avatar Feb 08 '25 23:02 jaraco

I'll get a PR up for adding it to the whatsnew, and perhaps to the Path.iterdir() docs.

I still prefer the new behaviour - it meets most users' expectations better, it makes exception handling much more straightforward, and it matches how os.scandir() raises exceptions too.
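To make the exception-handling difference concrete, here is a small probe, offered as a sketch rather than anything from the PR. It reports whether `FileNotFoundError` surfaces at the `iterdir()` call or at the first `next()`. The function name `where_does_it_raise` is made up for illustration; based on the reports in this thread, it should return "at first next()" on 3.12 and "at call" on 3.13.

```python
import pathlib
import tempfile


def where_does_it_raise(path):
    # Hypothetical probe: reports the point at which a missing directory is noticed.
    try:
        it = path.iterdir()
    except FileNotFoundError:
        return "at call"
    try:
        next(it)
    except FileNotFoundError:
        return "at first next()"
    except StopIteration:
        return "no error (empty directory)"
    return "no error"


with tempfile.TemporaryDirectory() as tmp:
    # A child path that is guaranteed not to exist.
    missing = pathlib.Path(tmp) / "does-not-exist"
    result = where_does_it_raise(missing)
```

With the eager behaviour, a `try`/`except` around the call alone is enough; with the lazy behaviour, it has to cover the first iteration step as well.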

barneygale avatar Feb 09 '25 13:02 barneygale

This is indeed a big change. Some libs such as anyio rely on the fact that iterdir() doesn't make blocking syscalls when called, and defer to a thread only when the result is iterated.
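As a rough illustration of the concern (this is not anyio's actual implementation), an async wrapper that wants to keep all filesystem work off the event loop on both 3.12 and 3.13 could run the `iterdir()` call and the iteration together in a worker thread. The names `list_dir` and `scan` are hypothetical.

```python
import asyncio
import pathlib
import tempfile


async def list_dir(path):
    # Hypothetical helper: both the iterdir() call and its consumption happen
    # inside the worker thread, so it does not matter which of the two blocks.
    def scan():
        return list(path.iterdir())

    return await asyncio.to_thread(scan)


async def main(path):
    try:
        await list_dir(path)
    except FileNotFoundError:
        return "missing"
    return "found"


with tempfile.TemporaryDirectory() as tmp:
    # A child path that is guaranteed not to exist.
    missing = pathlib.Path(tmp) / "does-not-exist"
    result = asyncio.run(main(missing))
```

The trade-off is that the whole listing is materialised in the thread instead of being streamed entry by entry.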

cbornet avatar Feb 20 '25 12:02 cbornet

Maybe we should revert the iterdir() change then, despite it being a little awkward to handle exceptions previously.

barneygale avatar Feb 21 '25 19:02 barneygale

I'm so conflicted on this, I can't even advocate for one approach over the other. I'm inclined to say we should raise the issue with a larger group. I do think it would be worthwhile to make a decision quickly to reduce the duration of exposure. Maybe just a post in Core Dev discuss? I'm happy to defer to your judgment, Barney, but let me know if you'd like my help garnering a wider consensus.

jaraco avatar Feb 27 '25 00:02 jaraco

Relevant:

  • #136059

nh2 avatar Jun 28 '25 08:06 nh2

I think I would expect an iterator to lazily surface exceptions as it does work. If a user wants to surface them sooner they can always wrap the result in a tuple or list themselves to drive the iterator.

Given that this change broke anyio, I think it should be reverted. Since there is a not-too-complicated alternative (wrap Path.iterdir() in list() if you want eager exceptions to be raised), the motivation does not seem to justify breaking behavior of downstream users to me.
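A minimal sketch of the `list()` alternative mentioned above: driving the iterator immediately means the error surfaces inside the `try` block whether `iterdir()` itself is lazy or eager. The temporary directory is only there to guarantee a path that does not exist.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A child path that is guaranteed not to exist.
    missing = pathlib.Path(tmp) / "does-not-exist"
    try:
        # list() consumes the iterator right away, so any error is raised here.
        entries = list(missing.iterdir())
    except FileNotFoundError:
        entries = None
```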

emmatyping avatar Jun 28 '25 20:06 emmatyping

I think that raising an exception for a nonexistent directory (and for a non-directory) is right. If you want to make this exception lazy, just wrap the creation of the iterator in a generator function:

def path_iterdir(p):
    yield from p.iterdir()
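A short usage sketch of this wrapper, assuming a path that does not exist. The body of a generator function does not start running until the first `next()`, so creating the wrapped iterator should raise nothing on either version.

```python
import pathlib
import tempfile


def path_iterdir(p):
    # Same wrapper as above: p.iterdir() is not called until iteration starts.
    yield from p.iterdir()


with tempfile.TemporaryDirectory() as tmp:
    # A child path that is guaranteed not to exist.
    missing = pathlib.Path(tmp) / "does-not-exist"

    # No exception here: the generator body has not started yet.
    it = path_iterdir(missing)

    try:
        next(it)
        outcome = "no error"
    except FileNotFoundError:
        outcome = "raised on first next()"
```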

serhiy-storchaka avatar Jun 29 '25 12:06 serhiy-storchaka