jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8057113: (fs) Path should have a method to obtain the filename extension

Open bplb opened this issue 2 years ago • 29 comments

Resurrection of the proposal to add a method to obtain the filename extension originated in PR 2319.


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change requires a CSR request to be approved

Issues

  • JDK-8057113: (fs) Path should have a method to obtain the filename extension
  • JDK-8260738: (fs) Path should have a method to obtain the filename extension (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8066/head:pull/8066
$ git checkout pull/8066

Update a local copy of the PR:
$ git checkout pull/8066
$ git pull https://git.openjdk.org/jdk pull/8066/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8066

View PR using the GUI difftool:
$ git pr show -t 8066

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8066.diff

bplb avatar Mar 31 '22 20:03 bplb

:wave: Welcome back bpb! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Mar 31 '22 20:03 bridgekeeper[bot]

PR 2319 was unexpectedly deleted when its branch was renamed and cannot be reopened. It may still be referred to for background information however. This PR as-is is not considered complete, but rather a continuation of PR 2319.

bplb avatar Mar 31 '22 20:03 bplb

@bplb The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Mar 31 '22 20:03 openjdk[bot]

Various alternatives for getExtension() were enumerated in PR 2319. Other, related methods are also possible. Here is a list of potential methods:

  1. Path getExtension(Path default) returns the extension retaining the platform representation
  2. default String getExtension(String default) returns the extension of the string representation of the path
  3. default boolean hasExtension(Path extension) returns whether the path has the specified extension
  4. default boolean hasExtension(String extension, boolean ignoreCase) returns whether the string representation of the path has the specified extension, optionally ignoring case
  5. Path replaceExtension(Path extension) replaces the path's extension with the specified extension
  6. Path removeExtension() removes the path's extension, including the dot that precedes it

bplb avatar Apr 01 '22 21:04 bplb

Some perhaps somewhat contrived examples of using the methods enumerated above are listed below. These code samples have neither been compiled nor tested.

getExtension - Reading an image file

Path imagepath;
String extension = imagepath.getExtension("");
Iterator<ImageReader> readers = ImageIO.getImageReadersBySuffix(extension);
BufferedImage image = null;
for (ImageReader reader : readers) {
    try {
        ImageInputStream s = ImageIO.createImageInputStream(imagepath.toFile());
        reader.setInput(s);
        image = reader.read(0);
        break;
    } catch (IOException ex) {
    }
}

replaceExtension: Transcoding PNG to JPEG: ".png" -> ".jpg"

Path pngPath;
Path jpegPath = pngPath.replaceExtension(Path.of("jpg"));
ImageIO.write(ImageIO.read(pngPath.toFile()), "JPEG", jpegPath.toFile());

removeExtension: Decompressing a GZIP compressed file: ".tar.gz" -> ".tar"

Path tgzPath;
InputStream in = new GZIPInputStream(new FileInputStream(tgzPath.toFile()));
Path tarPath = tgzPath.removeExtension();
OutputStream out = new FileOutputStream(tarPath.toFile());
in.transferTo(out);

hasExtension: List all MP3 files in a directory java.io version

File dir;
File[] mp3s = dir.listFiles(new FileFilter() {
        public boolean accept(File pathname) {
            return pathname.toPath().hasExtension("mp3", true);
        }
    });

java.nio version

Path dir;
DirectoryStream<Path> mp3s = Files.newDirectoryStream(dir,
    new DirectoryStream.Filter<Path>() {
        public boolean accept(Path file) throws IOException {
            return file.hasExtension("mp3", true);
        }
    });

bplb avatar Apr 01 '22 21:04 bplb

The idea from the potential methods above that an extension is a Path is non-intuitive and also harder to use. It may have some advantages in the implementation but it doesn't make the API easier to use either because of the necessity to create a Path, Path.of("jpg");, to hold an extension or to explicitly convert to string to compare the extension. There are more and better ways of working with strings than Paths and they should be directly available as methods.

RogerRiggs avatar Apr 07 '22 15:04 RogerRiggs

I would include:

  1. default String getExtension(String default) returns the extension of the string representation of the path

  2. default boolean hasExtension(String extension, boolean ignoreCase) returns whether the string representation of the path has the specified extension, optionally ignoring case

    An alternative to case insensitive is Optional<String> hasExtension(String ext, String... extensions); It allows at least one extension and a list of possible other extensions and returns the first one found.

5ish. Path replaceExtension(String extension) replaces the path's extension with the specified extension No particular bias against removeExtension but if total method count is an issue replaceExtension can be used to remove an extension with an empty string argument

These can avoid needing to convert Paths fully to strings and back; retaining the implementation specific representation and giving the developer a familiar string based API.

RogerRiggs avatar Apr 07 '22 16:04 RogerRiggs

@RogerRiggs Thanks for the comments.

I tend to agree with you about using String and not Path to represent an extension. I do not see what value it would add here to use Path. I think the subset of methods you selected is a good one. I also had thought about having replaceExtension(“”) be equivalent to removeExtension() but I failed to document it.

bplb avatar Apr 07 '22 20:04 bplb

Commit 4adf810d060138e50d7ce2266e4299645149dd89 adds to Path the default methods hasExtension() and replaceExtension(), and overrides of replaceExtension() using the native path representation in UnixPath and ZipPath. Unfortunately the latter override is entirely duplicative of the former for now. Note that replaceExtension() in fact has three capabilities: stripping the extension, appending an extension if none is already present, or actually replacing an extension.

bplb avatar Apr 21 '22 01:04 bplb

Commit e5240f1c10023b447a8eac91fc188dea033c9616 re-specifies getExtension() to return Optional.of("") for cases where the string representation of the Path has a dot ('.') at the last index and Optional.empty() was formerly returned. A test is also added.

bplb avatar May 03 '22 00:05 bplb

The CSR has been updated to reflect the recent revisions of the extension method specifications.

bplb avatar May 05 '22 22:05 bplb

@bplb This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jun 03 '22 14:06 bridgekeeper[bot]

This is still a good idea.

RogerRiggs avatar Jun 03 '22 15:06 RogerRiggs

3f510ecc32089589203b1fffb13eedc579aaa8bb fixes the .. case which was previously yielding null when it should be "".

bplb avatar Jul 27 '22 15:07 bplb

As a default method, is there anything to say about when/if to implement for a specific file system.

I suppose it could say something like

If an implementation maintains the Path in an internal representation which is not a String, then it might be best to override the default implementation to work with that representation until the result is obtained and only then convert the result to a String.

bplb avatar Jul 28 '22 15:07 bplb

Thank you for persisting with this one, the original Path::getExtension method was prototyped more 12 years ago, the main issue at the being being trying to get to a saner definition of an extension. You've got this to a good place.

AlanBateman avatar Aug 03 '22 15:08 AlanBateman

@bplb This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Sep 01 '22 13:09 bridgekeeper[bot]

Lets keep this alive and provide and resolve remaining issues.

RogerRiggs avatar Sep 02 '22 13:09 RogerRiggs

@bplb This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Sep 30 '22 16:09 bridgekeeper[bot]

continue;

bplb avatar Sep 30 '22 16:09 bplb

The principal unresolved issue here concerns leading period characters. There is consensus that the extension of a file name string such as .foo is null, but not as to whether the extension of a file name string such as ...foo is null or foo. My intent was that if there were multiple leading periods, then all but the last would be treated as if they were alphabetic, hence the behavior for ...foo would be the same as for xy.foo. I could be persuaded otherwise, and the alternative is likely simpler.

bplb avatar Sep 30 '22 16:09 bplb

OK. I think the meta-requirements are that the behavior be well-specified, implemented correctly, have tests for all the edge cases, and ideally also have defensible behavior. (I was going to write "intuitive" behavior but that's an invitation for bikeshedding, which this change has already suffered from too much.)

I don't particularly care which rule for special handling of . is used, as long as it meets the meta-requirements. I had proposed the "ignore all leading dots" rule because I thought it was simple, but I'm not beholden to this approach. If you (@bplb) prefer something different, then please go ahead with it.

stuart-marks avatar Sep 30 '22 21:09 stuart-marks

For a filename to have an extension, then I think there needs to be a non-empty root part. So:

  ".z",         root = ".z"        ext = null
  "z.",         root = "z"         ext = ""
  "..z",        root = "."         ext = "z"
  "...foo",     root = "..",       ext = "foo"  
  ".....",      root = "....",     ext = ""

Algorithm wise, you can find the last "." and check if there is a non-empty string before it. Scanning of the prefix is not necessary.

RogerRiggs avatar Oct 03 '22 15:10 RogerRiggs

The most surprising thing would be to consider .foo to have the extension foo, which would make it impossible to treat .foo together with .foo.log in a sensible way.

Multiple leading period characters are a corner case of a corner case.

mbreinhold avatar Oct 03 '22 16:10 mbreinhold

Interestingly, double clicking on both the files .txt and ...txt with identical content (and no file signature) in Linux File Manager, macOS Finder, and Windows Explorer all open the respective platform default text editor.

bplb avatar Oct 03 '22 19:10 bplb

Interestingly, double clicking on both the files .txt and ...txt with identical content (and no file signature) in Linux File Manager, macOS Finder, and Windows Explorer all open the respective platform default text editor.

Is that because those systems are interpreting the txt suffix, or is it because they always open files of unknown type with the platform’s default text editor?

mbreinhold avatar Oct 03 '22 20:10 mbreinhold

Is that because those systems are interpreting the txt suffix, or is it because they always open files of unknown type with the platform’s default text editor?

Looks like a bit of each. Files named .xy74j0 and ...xy74j0 with identical text content are opened by the default text editor on Linux. On macOS, .xy74j0 is also opened by the default text editor but clicking on ...xy74j0 results in a popup which states

There is no application set to open the document “...xy74j0”.

On Windows, a popup appears in both cases asking that an appropriate app be selected to open the file.

So it looks like on Linux a file of unknown type might always be opened by the default text editor, but based on this example and the previous ...txt example, on macOS and Windows there is some inference of the file type from the file name or extension, however the latter is defined.

bplb avatar Oct 06 '22 20:10 bplb

The specification paragraph is rewritten in 104a7116288d80ea9cf384b078bbfbeb0f1eece7 but the @implSpec content is unchanged.

bplb avatar Oct 14 '22 16:10 bplb

@bplb This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8057113: (fs) Path should have a method to obtain the filename extension

Reviewed-by: rriggs, lancea, mr, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 320 new commits pushed to the master branch:

  • d17bf51f91371072a758e775136af946192e771b: 8295914: Add a header to generated HTML files in specs
  • 9911405e543dbe07767808bad88534abbcc03c5a: 8282958: Rendering issues of borders, TextFields on Windows High-DPI systems
  • 2fb64a4a4fd650e8767bb9959dc53f8c450d4060: 8296162: [aarch64] Remove unused Address::_is_lval
  • 15b8b45178637acb07c33194f564acf807dfa5d4: 8296161: [aarch64] Remove unused "pcrel" addressing mode tag
  • f829b5a73f699ca7fc513f491f77daae6c8f4ed9: 8280378: [IR Framework] Support IR matching for different compile phases
  • da0ae5128a250bb7a5c6a7484589528db8220ed2: 8296167: test/langtools/tools/jdeps/jdkinternals/ShowReplacement.java failing after JDK-8296072
  • 0d0bd7bd409c0caa5edebe3d1eacf8e5bb48f984: 8296072: CertAttrSet::encode and DerEncoder::derEncode should write into DerOutputStream
  • 37107fc1574a4191987420d88f7182e63c7da60c: 8296007: crash in runtime/DefineClass/NullClassBytesTest.java
  • 4999f2cb164743ebf4badd3848a862609528dde3: 8156593: DataOutput.write(byte[],int,int) and its implementations do not specify index out bounds
  • 8480f87044f72e0312add190e75ee27030b7e10a: 8291974: PrivateCredentialPermission should not use local variable to enable debugging
  • ... and 310 more: https://git.openjdk.org/jdk/compare/f888aa953c1335f438ded22abf66b090e894684c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Nov 01 '22 21:11 openjdk[bot]