jdk
jdk copied to clipboard
8298318: (fs) APIs for handling filename extensions
Add to java.nio.file.Path
a method getExtension
to retrieve the Path
's extension, and companion methods removeExtension
and addExtension
.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change requires a CSR request matching fixVersion 23 to be approved (needs to be created)
Issue
- JDK-8298318: (fs) APIs for handling filename extensions (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16226/head:pull/16226
$ git checkout pull/16226
Update a local copy of the PR:
$ git checkout pull/16226
$ git pull https://git.openjdk.org/jdk.git pull/16226/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16226
View PR using the GUI difftool:
$ git pr show -t 16226
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16226.diff
Webrev
:wave: Welcome back bpb! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
This request follows from previous work outlined in this nio-dev thread.
For a given Path
p
, the following invariant obtains:
assert p.equals(p.removeExtension().addExtension(p.getExtension()));
Replacing an extension may be accomplished by composing the removal and addition methods as:
jshell> Path losslessImage = Path.of("photo.png");
losslessImage ==> photo.png
jshell> Path lossyImage = losslessImage.removeExtension().addExtension("jpg");
lossyImage ==> photo.jpg
@bplb The following label will be automatically applied to this pull request:
-
nio
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 08: Full - Incremental (b27cf346)
- 07: Full - Incremental (8abc295f)
- 06: Full - Incremental (f66373e5)
- 05: Full - Incremental (4a027a0d)
- 04: Full - Incremental (030a250a)
- 03: Full - Incremental (caeb1f14)
- 02: Full - Incremental (b7f00bb3)
- 01: Full - Incremental (5a926bdc)
- 00: Full (b34c9378)
The UNIX-specific implementation is intentionally omitted for now.
/csr
A CSR will be created once there is reasonable consensus on the verbiage in this conversation.
@bplb has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@bplb please create a CSR request for issue JDK-8298318 with the correct fix version. This pull request cannot be integrated until the CSR request is approved.
I'm too busy right now to spend time on this one but from a quick glance, the proposed addExtension and removeExtension looks very strange. I think it would be useful to show some examples using them to justify why are needed. Also write out some candidate method names. A Path is immutable so I don't think you want methods with names like addXXX or removeXXX as the resulting usages will look like they mutate the Path.
There are a few builder style APIs for immutable objects that have withXXX
methods that return an instance with the change requested. A withExtension(String ext)
could handle the common cases for removing, replacing, and adding an extension.
Path p = Path.of("photo.png");
Path base = p.withExtension("") -> "photo" // replaces any extension with the empty extension (and no dot)
p.withExtension("jpg") -> "photo.jpg" // replaces the extension
base.withExtension("pdf") -> "photo.pdf" // adds the extension
Removing an extension with ""
may be a bit unusual, perhaps an additional method would aid discoverablitybasepath()
or dropExtension
, or trimExtension
.
There are a few builder style APIs for immutable objects that have
withXXX
methods that return an instance with the change requested.
Some time back I thought of but never proposed a method similar to withExtension(String)
called modifyExtension(String parameter)
with behavior
| path | gus | "" | null |
|---------|---------|------|------|
| foo.bar | foo.gus | foo. | foo |
| foo. | foo.gus | foo. | foo |
| foo | foo.gus | foo. | foo |
where the 1st column is the path and the 2nd-4th columns are the parameter.
There are a few builder style APIs for immutable objects that have
withXXX
methods that return an instance with the change requested. AwithExtension(String ext)
could handle the common cases for removing, replacing, and adding an extension.Path p = Path.of("photo.png"); Path base = p.withExtension("") -> "photo" // replaces any extension with the empty extension (and no dot) p.withExtension("jpg") -> "photo.jpg" // replaces the extension base.withExtension("pdf") -> "photo.pdf" // adds the extension
Removing an extension with
""
may be a bit unusual, perhaps an additional method would aid discoverablitybasepath()
ordropExtension
, ortrimExtension
.
Right. Depending on how frequent an operation of removing extension is, one could also imagine a name such as withoutExtension()
. We have methods like that in the JDK.
Part of the point of these methods is to cover the various use cases without ever having to deal directly with a period "."
.
A
withExtension(String ext)
could handle the common cases for removing, replacing, and adding an extension.
How would this work for the case of "compound extensions" such as tar.gz
where one might want to end up with just tar
or with zip
, depending?
How would this work for the case of "compound extensions" such as
tar.gz
where one might want to end up with justtar
or withzip
, depending?
I'd view that a sequence of extensions so it would take a double application of withExtension to replace both. I.e.
Path path = Path.of("photos.tar.gz");
Path newP = path.withExtension("") // remove .gz; if there was no extension it would return the base
.withExtension(".zip"); -> "photos.zip"
// Testing for a compound extension
if (path.getExtension().equals("gz") && path.withExtension("").getExtension().equals("tar")) {...}
// Switching from zip to tar.gz is easier; It removes one extension and adds the string.
path.withExtension("tar.gz");
// Alternatively revert to old ways; no real help in this case.
path.filename().endsWith(".tar.gz");
This request could be reformulated as a list of three different topics:
- the definition of a filename extension;
- the extraction of the filename extension from a
Path
; and - the manipulation of the filename extension of a
Path
(to produce anotherPath
).
The contention thus far is centered around the third item and I conjecture that the first two items likely have a good level of consensus. I wonder therefore whether it might be better to break this proposal into two proposals, the first consisting of the the first two items in the list, and the second of the third item only.
There are a few builder style APIs for immutable objects that have
withXXX
methods that return an instance with the change requested. AwithExtension(String ext)
could handle the common cases for removing, replacing, and adding an extension.
This looks like a good direction to try out, seems better than cluttering Path with several methods to aid adding or stripping of an extension.
As suggested, I would rename removeExtension
to withoutExtension
and replace addExtension
with withExtension
.
Please rephrase or reword as you see fit:
/**
* Returns a Path with the {@code extension}, replacing the existing extension, if any.
* If the extension is non-null and non-empty, the "." and the extension
* is appended to the path returned by {@link #withoutExtension()},
* otherwise the path returned by {@link #withoutExtension()} is returned.
*
* @implSpec
* The default implementation is equivalent for this path to:
*
* {@snippet lang="java" :
* Path p = withoutExtension();
* if (extension == null || extension.isEmpty()) {
* return p;
* } else {
* return p.resolveSibling(p.getFileName() + "." + extension);
* }
*}
* @param extension
* the extension to add, may be {@code null}
*
* @return a Path with the {@code extension}, replacing the existing extension, if any
*
* @see #getExtension
* @see #withoutExtension
*
* @since 22
*/
default Path withExtension(String extension) {
Path path = withoutExtension();
if (extension == null || extension.isEmpty())
return path;
return path.resolveSibling(path.getFileName() + "." + extension);
}
Likely there is a more efficient implementation that does not create unnecessary Path intermediate objects.
This is the same as suggested : https://github.com/openjdk/jdk/pull/16226#issuecomment-1768659315
Thanks for the suggestion; I think it looks promising. I wonder however whether the notion of withoutExtension
needs to be explicit.
* Path p = withoutExtension();
* if (extension == null || extension.isEmpty()) {
* return p;
*
The use of withExtension("")
would be equivalent to withoutExtension()
so the later could be omitted for a simpler API.
And I think that's a reasonable simplification; replacing the extension is much more frequent operation than than dropping the extension.
The spec will need to explain/define how to trim off the extension, so the new extension can be added.
But it could be defined in terms of path.getFileName()
and perhaps path.resolveSibling
.
The spec will need to explain/define how to trim off the extension, so the new extension can be added.
Something like this perhaps for the nominal case?
Path p;
String name = p.getFileName().toString();
String ext = p.getExtension();
String s = name.substring(0, name.lastIndexOf(ext) - 1);
Path stripped = p.resolveSibling(s);
Almost, except for the detail that both "photo" and "photo." have an empty string as the extension so the trailing ".", if any, should be dropped.
Path p;
String name = p.getFileName().toString();
String ext = p.getExtension();
int index = name.length() - ext.length();
if (name.getChar(index - 1) == DOT)
index--;
String s = name.substring(0, indexOfExt);
Path stripped = p.resolveSibling(s);
Keeping the withoutExtension
method provides a place for that as a building block for withExtension
.
Keeping the
withoutExtension
method provides a place for that as a building block forwithExtension
.
Might be a good reason to keep it.
The current state of the removeExtension
code drops the parent part of the path if there is no extension or if there is an empty extension (i.e., trailing dot):
jshell> Path.of("a/foo").removeExtension()
$3 ==> foo
jshell> Path.of("a/foo.").removeExtension()
$4 ==> foo.
Probably need to add test cases for this.
I have additional questions about whether a trailing dot should or shouldn't be removed in this case; but maybe that's still currently being discussed? Not sure where that stands at the moment.
Probably need to add test cases for this.
Testing definitely needs to be broadened, but I think it should wait until there is consensus on the proposed API(s).
I have additional questions about whether a trailing dot should or shouldn't be removed in this case; but maybe that's still currently being discussed? Not sure where that stands at the moment.
At the moment I think that everything except perhaps Path.getExtension
is still pretty fluid.
String s = name.substring(0, indexOfExt);
I assume that indexOfExt
is a typo which should be simply index
.
My two cents:
- for
foo.
, wouldn'tremoveExtension().addExtension(getExtension())
result infoo
, i.e. the invariant be broken? - for
.gitconfig
, the current implementation ofgetExtension()
returns""
, but according to Windows conventions, the extension isgitconfig
- both PowerShell and Python (just 2 I tried) include the dot in the extension, and it seems natural to me to have an invariant that
base + extension = filename
. I assume this was discussed at length already though, so I'm just curious about the rationale for excluding the dot? - has the idea of introducing a new class and adding a method to Path (
getFileName()
is already taken, so I'd proposefilename()
) been explored? For example, recently I needed to work with extensions and ended up with something like below.
final class Filename implements CharSequence {
enum Convention {
DEFAULT,
WINDOWS
}
String base()
String base(Convention convention)
String extension()
String extension(Convention convention)
LocalizedBase localizedBase() {}
LocalizedBase localizedBase(Convention convention) {
// e.g. Java .properties files or .srt subtitle files
record LocalizedBase(String root, Locale locale) {}
}
My two cents:
Thanks for the comments.
* for `foo.`, wouldn't `removeExtension().addExtension(getExtension())` result in `foo`, i.e. the invariant be broken?
It looks like it would be. Another problematic corner case.
* for `.gitconfig`, the current implementation of `getExtension()` returns `""` , but according to Windows conventions, the extension is `gitconfig`
And according to the UNIX FILE(1)
command
$ file --extension .gitconfig
.gitconfig: ???
there is no extension. Many different tools and APIs were examined in the past and there is no consensus as to whether to include the period character. It's almost a 50-50 split.
* both PowerShell and Python (just 2 I tried) include the dot in the extension, and it seems natural to me to have an invariant that `base + extension = filename`. I assume this was discussed at length already though, so I'm just curious about the rationale for excluding the dot?
It has been discussed at length in prior incarnations of this effort. The invariant is an appealing idea.
My rationale for exlcuding the dot is that is analogous to a separator such as /
or \
and conveys no information on its own. In my own experience in other areas such as media (imaging and video) I have rarely seen the period character included in the extension, and if it was detected it was removed.
* has the idea of introducing a new class and adding a method to Path (`getFileName()` is already taken, so I'd propose `filename()`) been explored? For example, recently I needed to work with extensions and ended up with something like below.
This has not been discussed but I doubt it would be a very popular idea. One thing is that it would add complexity and API surface. Another is that OS-specific behavior is not usually codified.
For the .gitconfig
example, I just wanted to point out that the whole concept of a filename extension is merely an OS-specific convention. So I'd argue that the JDK should be neutral, rather than pick one specific convention. Since you mentioned "OS-specific behavior is not usually codified", the Convention
enum I used is not an option, but the getExtension
implementation could behave differently depending on its environment and specify something like the exact definition of "extension" is implementation specific
.
About whether to include the dot: I'm not a mathematician, but intuitively I'd say that excluding it makes it impossible to implement the API in such a way that equals(removeExtension().addExtension(getExtension()))
always holds. On the other hand, including it would make it trivial to do so.