GH-125413: Add `pathlib.Path.dir_entry` attribute
Add a Path.dir_entry attribute. In any path object generated by Path.iterdir(), it stores an os.DirEntry object corresponding to the path; in other cases it is None.
This can be used to retrieve the file type and attributes of directory children without necessarily incurring further system calls.
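For context, here is a minimal sketch of the kind of information an os.DirEntry carries. It uses os.scandir() directly rather than the Path.dir_entry attribute proposed here, since that attribute only exists on this branch. The helper name classify_children is made up for illustration.

```python
import os
import tempfile


def classify_children(directory):
    """Return {name: kind} for the immediate children of *directory*.

    Hypothetical helper, for illustration only.
    """
    kinds = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # These methods can usually answer from data gathered by
            # scandir(), so they often need no additional system call.
            if entry.is_symlink():
                kinds[entry.name] = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kinds[entry.name] = "dir"
            else:
                kinds[entry.name] = "file"
    return kinds


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("x")
    result = classify_children(tmp)
```

Whether a given call avoids a system call depends on the platform and file system, which is why the description says "without necessarily incurring further system calls".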
Under the hood, we use dir_entry in our implementations of PathBase.glob(), PathBase.walk() and PathBase.copy(), the last of which also provides the implementation of Path.copy(), resulting in a modest speedup when copying local directory trees.
- Issue: gh-125413
📚 Documentation preview 📚: https://cpython-previews--125419.org.readthedocs.build/
Copying is a little faster:
$ ./python -m timeit -s "from pathlib import Path" "Path('Doc').copy('Doc2', dirs_exist_ok=True, preserve_metadata=True)"
5 loops, best of 5: 70.7 msec per loop # before
5 loops, best of 5: 68.7 msec per loop # after
I played around with that idea, and I haven't completely ruled it out, but it's a bit of a rabbit hole.
On naming and re-using DirEntry: I don't think os.DirEntry.from_path() makes sense. The purpose of DirEntry is that it stores information from calling os.scandir() on the parent directory. I think we'd need a new class with name, is_dir() and is_symlink() attributes. We'd lazily generate an instance of this class from Path.last_status (or .status, or something), assuming there's not already a DirEntry stored. The new class could be called pathlib.PathStatus or something along those lines.
Then we need to define when os.stat() is called and when exceptions are raised. A DirEntry object is initially populated with some information from the os.scandir() call, so we might want our PathStatus object to perform a stat() on creation. But should it call os.stat() or os.lstat()? And doesn't that imply that our Path attribute should be a method rather than a property, given it may perform serious work? Maybe Path.cached_status()?
Then we need to figure out how this interacts with the rest of the Path methods. Should Path.stat() and Path.lstat() automatically update the status object? Should it replace an existing DirEntry object with a PathStatus object? Should Path.is_dir() call self.stat(); return self.cached_status().is_dir()?
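To make the open questions concrete, here is a rough sketch of what such a class might look like. Nothing here exists in pathlib: the class name PathStatus comes from the comment above, and every design decision in the sketch is an assumption picked only so the code runs. It uses os.lstat() rather than os.stat(), defers the call until first use, and swallows OSError.

```python
import os
import stat
import tempfile


class PathStatus:
    """Hypothetical status object; NOT part of pathlib."""

    def __init__(self, path, dir_entry=None):
        self._path = os.fspath(path)
        self._entry = dir_entry      # os.DirEntry from scandir(), if any
        self._lstat = None           # filled in lazily (assumption)
        self.name = os.path.basename(self._path)

    def _get_lstat(self):
        # Assumption: lstat() on first use, then cache the result.
        if self._lstat is None:
            self._lstat = os.lstat(self._path)
        return self._lstat

    def is_symlink(self):
        if self._entry is not None:
            return self._entry.is_symlink()
        try:
            return stat.S_ISLNK(self._get_lstat().st_mode)
        except OSError:
            return False  # assumption: a missing path is not an error

    def is_dir(self):
        # Assumption: symlinks are not followed.
        if self._entry is not None:
            return self._entry.is_dir(follow_symlinks=False)
        try:
            return stat.S_ISDIR(self._get_lstat().st_mode)
        except OSError:
            return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "child")
    os.mkdir(target)
    from_lstat = PathStatus(target)
    with os.scandir(tmp) as entries:
        from_entry = [PathStatus(e.path, dir_entry=e) for e in entries][0]
    missing = PathStatus(os.path.join(tmp, "missing"))
    checks = (from_lstat.is_dir(), from_entry.is_dir(),
              from_entry.name, missing.is_dir())
```

The sketch sidesteps the harder questions: it never refreshes its cached data, and it says nothing about how Path.stat() or Path.is_dir() would interact with it.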
None of this is insurmountable, mind :)
Perhaps I'm overthinking this, and all we really need is a Path.scandir() method.
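As a rough illustration of that simpler direction, a standalone helper could pair each child path with its os.DirEntry. The name scandir_paths and the tuple return shape are assumptions for this sketch, not a proposed signature for Path.scandir().

```python
import os
import tempfile
from pathlib import Path


def scandir_paths(path):
    """Yield (Path, os.DirEntry) pairs for the children of *path*.

    Hypothetical helper, for illustration only.
    """
    path = Path(path)
    with os.scandir(path) as entries:
        for entry in entries:
            yield path / entry.name, entry


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sub").mkdir()
    Path(tmp, "a.txt").write_text("x")
    # The caller decides what to ask the entry; no status object is cached
    # on the Path itself.
    subdirs = sorted(p.name for p, e in scandir_paths(tmp) if e.is_dir())
```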
Once you've decided on whether to continue on this work or not, please ping me again (sorry, I missed this one)