jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8298318: (fs) APIs for handling filename extensions

Open bplb opened this issue 1 year ago • 62 comments

Add to java.nio.file.Path a method getExtension to retrieve the Path's extension, and companion methods removeExtension and addExtension.


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change requires a CSR request matching fixVersion 23 to be approved (needs to be created)

Issue

  • JDK-8298318: (fs) APIs for handling filename extensions (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16226/head:pull/16226
$ git checkout pull/16226

Update a local copy of the PR:
$ git checkout pull/16226
$ git pull https://git.openjdk.org/jdk.git pull/16226/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16226

View PR using the GUI difftool:
$ git pr show -t 16226

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16226.diff

Webrev

Link to Webrev Comment

bplb avatar Oct 17 '23 19:10 bplb

:wave: Welcome back bpb! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Oct 17 '23 19:10 bridgekeeper[bot]

This request follows from previous work outlined in this nio-dev thread.

For a given Path p, the following invariant obtains:

assert p.equals(p.removeExtension().addExtension(p.getExtension()));

Replacing an extension may be accomplished by composing the removal and addition methods as:

jshell> Path losslessImage = Path.of("photo.png");
losslessImage ==> photo.png

jshell> Path lossyImage = losslessImage.removeExtension().addExtension("jpg");
lossyImage ==> photo.jpg

bplb avatar Oct 17 '23 19:10 bplb

@bplb The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Oct 17 '23 19:10 openjdk[bot]

The UNIX-specific implementation is intentionally omitted for now.

bplb avatar Oct 17 '23 20:10 bplb

/csr

bplb avatar Oct 17 '23 20:10 bplb

A CSR will be created once there is reasonable consensus on the verbiage in this conversation.

bplb avatar Oct 17 '23 20:10 bplb

@bplb has indicated that a compatibility and specification (CSR) request is needed for this pull request.

@bplb please create a CSR request for issue JDK-8298318 with the correct fix version. This pull request cannot be integrated until the CSR request is approved.

openjdk[bot] avatar Oct 17 '23 20:10 openjdk[bot]

I'm too busy right now to spend time on this one but from a quick glance, the proposed addExtension and removeExtension looks very strange. I think it would be useful to show some examples using them to justify why are needed. Also write out some candidate method names. A Path is immutable so I don't think you want methods with names like addXXX or removeXXX as the resulting usages will look like they mutate the Path.

AlanBateman avatar Oct 18 '23 06:10 AlanBateman

There are a few builder style APIs for immutable objects that have withXXX methods that return an instance with the change requested. A withExtension(String ext) could handle the common cases for removing, replacing, and adding an extension.

Path p = Path.of("photo.png");
Path base = p.withExtension("") -> "photo" // replaces any extension with the empty extension (and no dot)
p.withExtension("jpg") -> "photo.jpg"      // replaces the extension
base.withExtension("pdf") -> "photo.pdf"   // adds the extension

Removing an extension with "" may be a bit unusual, perhaps an additional method would aid discoverablitybasepath() or dropExtension, or trimExtension.

RogerRiggs avatar Oct 18 '23 14:10 RogerRiggs

There are a few builder style APIs for immutable objects that have withXXX methods that return an instance with the change requested.

Some time back I thought of but never proposed a method similar to withExtension(String) called modifyExtension(String parameter) with behavior

| path    | gus     | ""   | null |
|---------|---------|------|------|
| foo.bar | foo.gus | foo. | foo  |
| foo.    | foo.gus | foo. | foo  |
| foo     | foo.gus | foo. | foo  |

where the 1st column is the path and the 2nd-4th columns are the parameter.

bplb avatar Oct 18 '23 15:10 bplb

There are a few builder style APIs for immutable objects that have withXXX methods that return an instance with the change requested. A withExtension(String ext) could handle the common cases for removing, replacing, and adding an extension.

Path p = Path.of("photo.png");
Path base = p.withExtension("") -> "photo" // replaces any extension with the empty extension (and no dot)
p.withExtension("jpg") -> "photo.jpg"      // replaces the extension
base.withExtension("pdf") -> "photo.pdf"   // adds the extension

Removing an extension with "" may be a bit unusual, perhaps an additional method would aid discoverablitybasepath() or dropExtension, or trimExtension.

Right. Depending on how frequent an operation of removing extension is, one could also imagine a name such as withoutExtension(). We have methods like that in the JDK.

pavelrappo avatar Oct 18 '23 16:10 pavelrappo

Part of the point of these methods is to cover the various use cases without ever having to deal directly with a period ".".

bplb avatar Oct 18 '23 16:10 bplb

A withExtension(String ext) could handle the common cases for removing, replacing, and adding an extension.

How would this work for the case of "compound extensions" such as tar.gz where one might want to end up with just tar or with zip, depending?

bplb avatar Oct 24 '23 17:10 bplb

How would this work for the case of "compound extensions" such as tar.gz where one might want to end up with just tar or with zip, depending?

I'd view that a sequence of extensions so it would take a double application of withExtension to replace both. I.e.

   Path path = Path.of("photos.tar.gz");
   Path newP = path.withExtension("") // remove .gz; if there was no extension it would return the base
        .withExtension(".zip");   -> "photos.zip"

// Testing for a compound extension
   if (path.getExtension().equals("gz") && path.withExtension("").getExtension().equals("tar")) {...} 

// Switching from zip to tar.gz is easier; It removes one extension and adds the string.
  path.withExtension("tar.gz");

// Alternatively revert to old ways; no real help in this case.
    path.filename().endsWith(".tar.gz");

RogerRiggs avatar Oct 24 '23 18:10 RogerRiggs

This request could be reformulated as a list of three different topics:

  1. the definition of a filename extension;
  2. the extraction of the filename extension from a Path; and
  3. the manipulation of the filename extension of a Path (to produce another Path).

The contention thus far is centered around the third item and I conjecture that the first two items likely have a good level of consensus. I wonder therefore whether it might be better to break this proposal into two proposals, the first consisting of the the first two items in the list, and the second of the third item only.

bplb avatar Oct 30 '23 21:10 bplb

There are a few builder style APIs for immutable objects that have withXXX methods that return an instance with the change requested. A withExtension(String ext) could handle the common cases for removing, replacing, and adding an extension.

This looks like a good direction to try out, seems better than cluttering Path with several methods to aid adding or stripping of an extension.

AlanBateman avatar Nov 02 '23 15:11 AlanBateman

As suggested, I would rename removeExtension to withoutExtension and replace addExtension with withExtension. Please rephrase or reword as you see fit:

    /**
     * Returns a Path with the {@code extension}, replacing the existing extension, if any.
     * If the extension is non-null and non-empty, the "." and the extension
     * is appended to the path returned by {@link #withoutExtension()},
     * otherwise the path returned by {@link #withoutExtension()} is returned.
     *
     * @implSpec
     * The default implementation is equivalent for this path to:
     *
     * {@snippet lang="java" :
     * Path p = withoutExtension();
     * if (extension == null || extension.isEmpty()) {
     *     return p;
     * } else {
     *     return p.resolveSibling(p.getFileName() + "." + extension);
     * }
     *}
     * @param extension
     *        the extension to add, may be {@code null}
     *
     * @return a Path with the {@code extension}, replacing the existing extension, if any
     * 
     * @see #getExtension
     * @see #withoutExtension
     *
     * @since 22
     */
    default Path withExtension(String extension) {
        Path path = withoutExtension();
        if (extension == null || extension.isEmpty())
            return path;

        return path.resolveSibling(path.getFileName() + "." + extension);
    }

Likely there is a more efficient implementation that does not create unnecessary Path intermediate objects.

RogerRiggs avatar Nov 02 '23 17:11 RogerRiggs

This is the same as suggested : https://github.com/openjdk/jdk/pull/16226#issuecomment-1768659315

RogerRiggs avatar Nov 02 '23 17:11 RogerRiggs

Thanks for the suggestion; I think it looks promising. I wonder however whether the notion of withoutExtension needs to be explicit.

     * Path p = withoutExtension();
     * if (extension == null || extension.isEmpty()) {
     *     return p;
     *

bplb avatar Nov 02 '23 17:11 bplb

The use of withExtension("") would be equivalent to withoutExtension() so the later could be omitted for a simpler API. And I think that's a reasonable simplification; replacing the extension is much more frequent operation than than dropping the extension.

The spec will need to explain/define how to trim off the extension, so the new extension can be added. But it could be defined in terms of path.getFileName() and perhaps path.resolveSibling.

RogerRiggs avatar Nov 02 '23 19:11 RogerRiggs

The spec will need to explain/define how to trim off the extension, so the new extension can be added.

Something like this perhaps for the nominal case?

Path p;
String name = p.getFileName().toString();
String ext = p.getExtension();
String s = name.substring(0, name.lastIndexOf(ext) - 1);
Path stripped = p.resolveSibling(s);

bplb avatar Nov 03 '23 18:11 bplb

Almost, except for the detail that both "photo" and "photo." have an empty string as the extension so the trailing ".", if any, should be dropped.

Path p;
String name = p.getFileName().toString();
String ext = p.getExtension();
int index = name.length() - ext.length();
if (name.getChar(index - 1) == DOT)
    index--;
String s = name.substring(0, indexOfExt);
Path stripped = p.resolveSibling(s);

Keeping the withoutExtension method provides a place for that as a building block for withExtension.

RogerRiggs avatar Nov 03 '23 19:11 RogerRiggs

Keeping the withoutExtension method provides a place for that as a building block for withExtension.

Might be a good reason to keep it.

bplb avatar Nov 03 '23 20:11 bplb

The current state of the removeExtension code drops the parent part of the path if there is no extension or if there is an empty extension (i.e., trailing dot):

jshell> Path.of("a/foo").removeExtension()
$3 ==> foo
jshell> Path.of("a/foo.").removeExtension()
$4 ==> foo.

Probably need to add test cases for this.

I have additional questions about whether a trailing dot should or shouldn't be removed in this case; but maybe that's still currently being discussed? Not sure where that stands at the moment.

stuart-marks avatar Nov 09 '23 05:11 stuart-marks

Probably need to add test cases for this.

Testing definitely needs to be broadened, but I think it should wait until there is consensus on the proposed API(s).

I have additional questions about whether a trailing dot should or shouldn't be removed in this case; but maybe that's still currently being discussed? Not sure where that stands at the moment.

At the moment I think that everything except perhaps Path.getExtension is still pretty fluid.

bplb avatar Nov 09 '23 17:11 bplb

String s = name.substring(0, indexOfExt);

I assume that indexOfExt is a typo which should be simply index.

bplb avatar Nov 21 '23 00:11 bplb

My two cents:

  • for foo., wouldn't removeExtension().addExtension(getExtension()) result in foo, i.e. the invariant be broken?
  • for .gitconfig, the current implementation of getExtension() returns "" , but according to Windows conventions, the extension is gitconfig
  • both PowerShell and Python (just 2 I tried) include the dot in the extension, and it seems natural to me to have an invariant that base + extension = filename. I assume this was discussed at length already though, so I'm just curious about the rationale for excluding the dot?
  • has the idea of introducing a new class and adding a method to Path (getFileName() is already taken, so I'd propose filename()) been explored? For example, recently I needed to work with extensions and ended up with something like below.
final class Filename implements CharSequence {

    enum Convention {
        DEFAULT,
        WINDOWS
    }

    String base()
    String base(Convention convention)
    String extension()
    String extension(Convention convention)

    LocalizedBase localizedBase() {}
    LocalizedBase localizedBase(Convention convention) {

    // e.g. Java .properties files or .srt subtitle files
    record LocalizedBase(String root, Locale locale) {}

}

anthonyvdotbe avatar Nov 21 '23 17:11 anthonyvdotbe

My two cents:

Thanks for the comments.

* for `foo.`, wouldn't `removeExtension().addExtension(getExtension())` result in `foo`, i.e. the invariant be broken?

It looks like it would be. Another problematic corner case.

* for `.gitconfig`, the current implementation of `getExtension()` returns `""` , but according to Windows conventions, the extension is `gitconfig`

And according to the UNIX FILE(1) command

$ file --extension .gitconfig 
.gitconfig: ???

there is no extension. Many different tools and APIs were examined in the past and there is no consensus as to whether to include the period character. It's almost a 50-50 split.

* both PowerShell and Python (just 2 I tried) include the dot in the extension, and it seems natural to me to have an invariant that `base + extension = filename`. I assume this was discussed at length already though, so I'm just curious about the rationale for excluding the dot?

It has been discussed at length in prior incarnations of this effort. The invariant is an appealing idea.

My rationale for exlcuding the dot is that is analogous to a separator such as / or \ and conveys no information on its own. In my own experience in other areas such as media (imaging and video) I have rarely seen the period character included in the extension, and if it was detected it was removed.

* has the idea of introducing a new class and adding a method to Path (`getFileName()` is already taken, so I'd propose `filename()`) been explored? For example, recently I needed to work with extensions and ended up with something like below.

This has not been discussed but I doubt it would be a very popular idea. One thing is that it would add complexity and API surface. Another is that OS-specific behavior is not usually codified.

bplb avatar Nov 21 '23 18:11 bplb

For the .gitconfig example, I just wanted to point out that the whole concept of a filename extension is merely an OS-specific convention. So I'd argue that the JDK should be neutral, rather than pick one specific convention. Since you mentioned "OS-specific behavior is not usually codified", the Convention enum I used is not an option, but the getExtension implementation could behave differently depending on its environment and specify something like the exact definition of "extension" is implementation specific.

About whether to include the dot: I'm not a mathematician, but intuitively I'd say that excluding it makes it impossible to implement the API in such a way that equals(removeExtension().addExtension(getExtension())) always holds. On the other hand, including it would make it trivial to do so.

anthonyvdotbe avatar Nov 22 '23 07:11 anthonyvdotbe