Add `posixpath.isreserved()`
Feature or enhancement
Proposal:
These paths would be considered reserved:
- Embedded null
- ~Too long file name: "a" * 256~
- ~Too long path: "a/" * 512~
- /proc/**
- /dev/**
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
- https://discuss.python.org/t/deprecation-of-pathlib-purepath-is-reserved/53203
I don't like introducing such an API. It would always be inaccurate, and it will be a maintenance burden trying to keep it up to date. As someone commented in the linked discussion, invalid paths will always raise an error anyway. I'm strongly -1 to this.
What's inaccurate here? Lengths?
Neither the now-deprecated pathlib.PurePath.is_reserved, nor the new ntpath.isreserved that this is supposed to match, define what "reserved" means. Can't create? Without that, this is essentially impossible to discuss - it seems like some kind of nebulous advisory information, but to whom?
According to POSIX, portable pathnames cannot begin with a dash.
What's inaccurate here? Lengths?
NAME_MAX and PATH_MAX are implementation-defined, fwiw. (except that they cannot be shorter than 14 and 256, respectively).
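To illustrate why hard-coded lengths would be inaccurate, here is a minimal sketch that asks the running system for its limits through `os.pathconf()` (Unix only). The helper name `path_limits` is made up for this example; the values depend on the file system behind the directory you query.

```python
import os


def path_limits(directory="."):
    """Return (NAME_MAX, PATH_MAX) as reported for `directory`.

    Hypothetical helper for illustration. The limits are per file system,
    so two directories on the same machine may report different values.
    A value of -1 means the system reports no definite limit.
    """
    name_max = os.pathconf(directory, "PC_NAME_MAX")
    path_max = os.pathconf(directory, "PC_PATH_MAX")
    return name_max, path_max


if __name__ == "__main__":
    print(path_limits("."))
```

Note that this does make an OS call, which is exactly what a purely lexical `isreserved()` could not do.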
Alright, just the rest then. If it's too long, you'll get an error anyway. 14 & 256 are too restrictive.
Again; adding an API with known inconsistencies is a really bad idea.
We don't check the length for Windows either... If you mean something else, please clarify.
According to POSIX, portable pathnames cannot begin with a dash.
They absolutely can! It's just a warning that you should consider not doing so because it could be misinterpreted as an option flag if the filename is passed to a program.
The portable filename character set does limit filenames to upper and lowercase ascii letters, numbers, and ._-, though.
Calling the function "isreserved" seems incredibly wrong anyway. What does "reserved" mean, especially if the function is only advisory?
Deprecate all versions of the function anywhere, and add a new function called os.portable_filename() which does exactly what it says: check if the filename is "generically portable". Maybe add a flavor='posix' or flavor='windows' argument.
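A rough sketch of what such a function could look like, assuming the semantics described in this comment. Nothing here exists in the standard library: `portable_filename`, its `flavor` parameter, and the decision to reject `""`, `.` and `..` are all assumptions made for illustration. The POSIX branch uses the portable filename character set mentioned above; the Windows branch uses the well-known DOS device names and forbidden characters.

```python
import re

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
_POSIX_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")

# DOS device names that Windows treats specially, compared case-insensitively.
_WINDOWS_DEVICES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)
_WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def portable_filename(name, flavor="posix"):
    """Hypothetical check of a single path component (not a full path)."""
    if name in ("", ".", ".."):
        # Assumption: these are never useful as portable file names.
        return False
    if flavor == "posix":
        return _POSIX_PORTABLE.fullmatch(name) is not None
    if flavor == "windows":
        if any(ch in _WINDOWS_FORBIDDEN or ord(ch) < 32 for ch in name):
            return False
        if name.endswith((" ", ".")):
            return False
        # "nul.txt" is treated like "nul", so compare only the stem.
        stem = name.split(".", 1)[0].rstrip(" ").upper()
        return stem not in _WINDOWS_DEVICES
    raise ValueError(f"unknown flavor: {flavor!r}")
```

For example, `portable_filename("nul.txt", flavor="windows")` is false, while the same name passes the `"posix"` flavor because it only uses characters from the portable set.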
On Linux, two directories can use two different file systems, and each file system has its own "reserved" characters. What if you mount an NTFS partition in /mnt/win and check if /mnt/win/NUL is reserved or not? You have to actually create the file to check if it's valid or not. Checking the filesystem for each part of the path sounds unsafe and inefficient. Everything is dynamic on Linux, you can umount/remount a partition while isreserved() is being called.
The check wouldn't do OS calls, just like on Windows. So, you would be able to use it on abstract paths too.
Continuing the discussion from the Discourse thread a bit - /proc and /dev are not analogous to nul, con, etc. on Windows. Those filenames appear to exist in any directory:
C:\Users\Administrator>echo hello > con
hello
C:\Users\Administrator>echo hello > C:\con
hello
C:\Users\Administrator>echo hello > C:\Windows\System32\con
hello
That is, this is processed at a higher level than the filesystem - before even going to the filesystem, these names are special, and that is what makes them "reserved". On UNIX, /dev/null, /dev/tty, etc. are actually files in the filesystem. An administrator can delete and recreate them, for instance; you can't delete the special reserved names on Windows because they aren't really files. The special behavior on UNIX is in the file itself, not in the naming.
Along those lines, you don't need to have the files in /dev and /proc - you can put them somewhere else, or maybe they don't exist. Those names are just a convention. On Windows those specific names are part of the file-name-resolving code.
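To make the point concrete, here is a small sketch that classifies a path by what the file system says it is, using `os.stat()` and the `stat` module. The helper name `describe` is invented for this example. On a typical Linux system `/dev/null` shows up as a character device, and that property belongs to the inode, not to the name.

```python
import os
import stat


def describe(path):
    """Classify `path` by its inode type, ignoring its name entirely."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    # /dev/null may be missing or replaced; the name itself guarantees nothing.
    if os.path.exists("/dev/null"):
        print("/dev/null:", describe("/dev/null"))
```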
In fact, Linux has a feature called user namespaces which means that even a non-root user can put themselves in a setup where you can create device nodes and mount/unmount filesystems, so you can try this out yourself on many Linux distros even if you don't have root:
>>> import os
>>> os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWNS)
>>> os.chroot(".")
>>> os.mkdir("/dev")
>>> os.mkdir("/proc")
>>> null = open("/dev/null", "w")
>>> null.write("hi")
2
>>> null.close()
>>> null = open("/dev/null", "r")
>>> null.read()
'hi'
These paths work like any other path.
Thinking about the use case of extracting files - because of the behavior that the Windows files exist in every directory, it makes sense to have a function (somewhere, maybe in the standard library, maybe elsewhere) to look at filenames and say, this name is going to do something weird. But on UNIX, if you're in a directory you control, any filename in that directory is going to be safe. (At worst you'll get an error about the file not working—even on NTFS mounted on UNIX, the reserved names don't actually do what they'd do on Windows, they're just invalid.) And in either Windows or UNIX, if you're in a directory you don't control - including accepting absolute paths from the user - you need to be very careful about filenames in general, whether or not those are special files. Having untrusted users create files in C:\Windows\System32 or /usr/bin is quite bad, even though those are normal directories.
So I don't think that there should be any paths that posixpath considers reserved. Let me know if this explanation makes sense—I do think it would be worth including some explanation of why in the documentation of os.path.isreserved().
@geofft
This is exactly why I believe that calling anything "reserved" doesn't make sense and what is actually conceptually desired is "portable_filename()". This would give people who want to create files on either OS a reasonable guarantee that the filenames won't be broken when copying them to another OS / filesystem.
Aside: . and .. are just as special as nul/con, I guess. :) Even on Unix.
Ah, yes, returning true for . and .. would make sense I think!
Well, /proc/** is also special cased. It doesn't contain real files. But /dev/** does, so let's not include that.
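Purely to make the proposal at this point in the thread concrete, here is a sketch of a lexical check covering the three cases still on the table: an embedded null, `.` / `..` components, and paths under `/proc/`. The name `isreserved_sketch` is hypothetical, this is not the API of `posixpath`, and other participants dispute whether any of these cases should count as "reserved".

```python
import posixpath


def isreserved_sketch(path):
    """Hypothetical, purely lexical check; it makes no OS calls."""
    # An embedded null can never be passed to the OS as part of a path.
    if "\0" in path:
        return True
    # "." and ".." are special in every directory.
    parts = path.split("/")
    if "." in parts or ".." in parts:
        return True
    # Anything below /proc/, judged by name only. This is an assumption:
    # procfs could be mounted elsewhere or not at all.
    normalized = posixpath.normpath(path)
    return normalized.startswith("/proc/")
```

With this sketch, `isreserved_sketch("/proc/self/maps")` is true and `isreserved_sketch("/dev/null")` is false, matching the comment above about leaving `/dev/**` out.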
There does not seem to be much support for this. No core dev has spoken out to explicitly support the idea; several core devs and community members have expressed being opposed to the idea.
I suggest closing this as won't implement.
It was mentioned that this could have a meaningful implementation for posix. But what is reserved then? Posix doesn't really reserve anything.
Exactly, thus there is no reason to implement this. There is no clear goal, there is no clear spec; this all makes for a bad API.
It was mentioned that this could have a meaningful implementation for posix. But what is reserved then? Posix doesn't really reserve anything.
Like I said, the only question to ask is whether a filename is portable, not whether it is reserved.
It would require a new function name, not just "make isreserved work on non-Windows platforms". It's a new feature, not a bugfix (even a consistency bugfix, which some people call a feature).