jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8348828: Windows dll loading now resolves symlinks

Open benjaminp opened this issue 8 months ago • 51 comments

Deep in the bowels of System.loadLibrary, File.getCanonicalPath() is called on the target library file before it is passed to the system library loading APIs. In JDK-8003887, File.getCanonicalPath was altered to resolve symlinks on Windows. This had unintended consequences for passing a symlink to System.loadLibrary on Windows. The underlying Windows LoadLibrary API inspects the file name passed to it and adds a .dll extension if the it is not already present. Thus, if System.loadLibrary was given a symlink to a file and that file didn't have a .dll extension, LoadLibrary try to load nonexistent file and fail.

Fix this problem by appending a . to library paths in Windows' os::dll_load. This trailing dot inhibits LoadLibrary's own appending behavior.


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8348828: Windows dll loading now resolves symlinks (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24694/head:pull/24694
$ git checkout pull/24694

Update a local copy of the PR:
$ git checkout pull/24694
$ git pull https://git.openjdk.org/jdk.git pull/24694/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24694

View PR using the GUI difftool:
$ git pr show -t 24694

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24694.diff

Using Webrev

Link to Webrev Comment

benjaminp avatar Apr 16 '25 17:04 benjaminp

Hi @benjaminp, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user benjaminp" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

bridgekeeper[bot] avatar Apr 16 '25 17:04 bridgekeeper[bot]

❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.

openjdk[bot] avatar Apr 16 '25 17:04 openjdk[bot]

/covered by corporate OCA under "EngFlow Inc."

benjaminp avatar Apr 16 '25 17:04 benjaminp

@benjaminp The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Apr 16 '25 17:04 openjdk[bot]

Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

bridgekeeper[bot] avatar Apr 16 '25 17:04 bridgekeeper[bot]

@AlanBateman @bplb Hello! Any chance either of you could be induced to sponsor and review this? Also, any idea how to accelerate the OCA verification process?

benjaminp avatar Apr 28 '25 16:04 benjaminp

This issue has the "oca" label so we cannot engage, sorry.

AlanBateman avatar Apr 28 '25 16:04 AlanBateman

This issue has the "oca" label so we cannot engage, sorry.

Ah, sorry. I only asked because I saw a colleague of yours helping out with OCA verification over at https://github.com/openjdk/jdk/pull/23905#issuecomment-2762865417. Also, the OCA FAQ suggests to "please reach out to project maintainers (e.g., through comments on your GitHub PR, Slack channels, or mailing lists)". Any breadcrumbs you can lay for me to escape this forest are appreciated!

benjaminp avatar Apr 28 '25 17:04 benjaminp

Hello @benjaminp! I think you should wait for the OCA process — it's very fast. I'll just ping @robilad, as he might be able to help.

gustavosimon avatar Apr 28 '25 19:04 gustavosimon

Are you the submitter of JDK-8348828? Asking because the configuration described seems very unusual and sounds like something that changed the JDK image so the DLLs from the JDK bin directory are moved to somewhere else and sym link put in place instead. I think we also need to understand if there is a bug in File::getCanonicalPath before change the usage.

AlanBateman avatar May 07 '25 17:05 AlanBateman

Are you the submitter of JDK-8348828?

Yes, I am.

Asking because the configuration described seems very unusual and sounds like something that changed the JDK image so the DLLs from the JDK bin directory are moved to somewhere else and sym link put in place instead.

Correct, I explained a bit on the issue, but we have a system that replicates normal filesystem trees with symlink trees.

I think we also need to understand if there is a bug in File::getCanonicalPath before change the usage.

I believe the problem is that getCanonicalPath is working too well and that symlink resolution of native libraries should be left to the system resolver.

benjaminp avatar May 07 '25 18:05 benjaminp

Would it be possible to paste in here, or in the JBS issue, some examples of the path provided to LoadLibrary with some commentary on the sym links created on the file system.

AlanBateman avatar May 07 '25 19:05 AlanBateman

Going back to the reproducer in my OP on JBS:

> mv .\jdk-24\bin\jimage.dll rando
> New-Item -Path .\jdk-24\bin\jimage.dll -ItemType SymbolicLink -Value .\rando
> \jdk-24\bin\javac 

During JVM startup, NativeImageBuffer will execute System.loadLibrary("jimage"), which will eventually make its way into NativeLibraries.loadLibrary. NativeLibraries will search the native library path and identitfy .\jdk-24\bin\jimage.dll as a file to load. The getCanonicalPath() invocation changed in this PR will be invoked to determine the final path to load. At this point, the path to load will be $PWD\rando due to symlink resolution. Entering the VM, NativeLibraries.load will eventually pass $PWD\rando to LoadLibrary. Since .\rando lacks a .dll extension on its face, LoadLibrary will add it, observe that $PWD\rando.dll doesn't exist and fail. This chain of events is avoided by presenting LoadLibrary with jimage.dll and letting the system do the symlink resolution as happened previously in JDK23.

benjaminp avatar May 07 '25 19:05 benjaminp

New-Item -Path .\jdk-24\bin\jimage.dll -ItemType SymbolicLink -Value .\rando

What if instead this were the following

> New-Item -Path .\jdk-24\bin\jimage.dll -ItemType SymbolicLink -Value .\rando.dll

where .dll is appended to the target?

At this point, the path to load will be $PWD\rando due to symlink resolution.

Which is in fact correct, no?

Entering the VM, NativeLibraries.load will eventually pass $PWD\rando to LoadLibrary. Since .\rando lacks a .dll extension on its face, LoadLibrary will add it, observe that $PWD\rando.dll doesn't exist and fail.

Indeed from the LoadLibrary documentation

If the string specifies a module name without a path and the file name extension is
omitted, the function appends the default library extension ".DLL" to the module name.
To prevent the function from appending ".DLL" to the module name, include a trailing
point character (.) in the module name string.

so a trailing . character on rando might equally resolve the problem?

bplb avatar May 07 '25 21:05 bplb

New-Item -Path .\jdk-24\bin\jimage.dll -ItemType SymbolicLink -Value .\rando

What if instead this were the following

> New-Item -Path .\jdk-24\bin\jimage.dll -ItemType SymbolicLink -Value .\rando.dll

where .dll is appended to the target?

I expect that would work given the LoadLibrary documentation you've quoted.

At this point, the path to load will be $PWD\rando due to symlink resolution.

Which is in fact correct, no?

Correct in that that's the library that should be loaded. LoadLibrary doesn't want to cooperate, though.

Entering the VM, NativeLibraries.load will eventually pass $PWD\rando to LoadLibrary. Since .\rando lacks a .dll extension on its face, LoadLibrary will add it, observe that $PWD\rando.dll doesn't exist and fail.

Indeed from the LoadLibrary documentation

If the string specifies a module name without a path and the file name extension is
omitted, the function appends the default library extension ".DLL" to the module name.
To prevent the function from appending ".DLL" to the module name, include a trailing
point character (.) in the module name string.

so a trailing . character on rando might equally resolve the problem?

Perhaps. In my usecase, files come into the system "anonymously"; files that make up the symlink tree are received before it's known what the layout of the tree will be. A link to a given file could turn out to be, e.g., myaudio.mp3, MyClass.java, or jimage.dll. I'm disinclined to append . or .DLL to every file in off chance that it turns out to be a library that Java wants to load.

benjaminp avatar May 07 '25 21:05 benjaminp

At this point, the path to load will be $PWD\rando due to symlink resolution.

Which is in fact correct, no?

Correct in that that's the library that should be loaded. LoadLibrary doesn't want to cooperate, though.

So is it correct to say that the link resolution appears correct but LoadLibrary's quirks cause the intended behavior to fail?

bplb avatar May 07 '25 23:05 bplb

At this point, the path to load will be $PWD\rando due to symlink resolution.

Which is in fact correct, no?

Correct in that that's the library that should be loaded. LoadLibrary doesn't want to cooperate, though.

So is it correct to say that the link resolution appears correct but LoadLibrary's quirks cause the intended behavior to fail?

Yes. The getCanonicalPath canonicalization indeed produces the file pointed to by the symlink. However, in the context of NativeLibraries, doing path canonicalization is at best unnecessary and at worst yields the linked bug.

benjaminp avatar May 07 '25 23:05 benjaminp

So is it correct to say that the link resolution appears correct but LoadLibrary's quirks cause the intended behavior to fail?

Yes. The getCanonicalPath canonicalization indeed produces the file pointed to by the symlink.

All right, it is good to know that it is behaving in the desired manner.

However, in the context of NativeLibraries, doing path canonicalization is at best unnecessary and at worst yields the linked bug.

You might be correct. We'll see what @AlanBateman and others have to say about it.

bplb avatar May 07 '25 23:05 bplb

You might be correct. We'll see what @AlanBateman and others have to say about it.

I'm still puzzled as to why the DLLs have been moved from the JDK bin directory to some other location, and renamed so they don't have the ".dll" suffix. There most be some other product in the picture that we can't see. The quoted text from the Windows LoadLibrary documentation, where it appends the ".dll" suffix when not provided, is very useful as it helps us understand why it fails.

As regards dropping the canonicalization then it would require thinking about the lock map used for mapping the library names to locks. It would need checking if it would break concurrent loading when using different names / file paths to the same DLL. There may be a route that involves adding a method to ClassLoaderHelper to post-process the path and the Windows version could add the "." when it doesn't have the ".dll" suffix.

AlanBateman avatar May 08 '25 13:05 AlanBateman

You might be correct. We'll see what @AlanBateman and others have to say about it.

I'm still puzzled as to why the DLLs have been moved from the JDK bin directory to some other location, and renamed so they don't have the ".dll" suffix. There most be some other product in the picture that we can't see. The quoted text from the Windows LoadLibrary documentation, where it appends the ".dll" suffix when not provided, is very useful as it helps us understand why it fails.

Yes, the mystery product is my code. :) It makes a JDK image with a tree of symlinks to a data directory. The regular files in the data directory do not have file extensions.

As regards dropping the canonicalization then it would require thinking about the lock map used for mapping the library names to locks. It would need checking if it would break concurrent loading when using different names / file paths to the same DLL.

I can see that removing the canonicalization could break the lock map. (Though I see the current deduplication and locking scheme doesn't handle hardlinks. Probably it would have to be keyed by FileKey to do that.)

There may be a route that involves adding a method to ClassLoaderHelper to post-process the path and the Windows version could add the "." when it doesn't have the ".dll" suffix.

I tried out this approach, and it solves my problem. I've now pushed the implementation.

benjaminp avatar May 08 '25 17:05 benjaminp

I built the JDK with the diff in 70e943c applied on Windows 11 and verified that it resolves the scenario presented in the issue description, namely that jimage.dll can be moved elsewhere, in this case C:\Users\bpb\rando, and a link created from jimage.dll to rando and with this config java and jshell can be run.

bplb avatar May 08 '25 18:05 bplb

@benjaminp This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jun 06 '25 21:06 bridgekeeper[bot]

/keepalive

benjaminp avatar Jun 06 '25 22:06 benjaminp

@benjaminp The pull request is being re-evaluated and the inactivity timeout has been reset.

openjdk[bot] avatar Jun 06 '25 22:06 openjdk[bot]

@AlanBateman any further feedback here?

benjaminp avatar Jun 11 '25 16:06 benjaminp

@benjaminp This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jul 09 '25 19:07 bridgekeeper[bot]

/touch

Any thoughts on the approach we landed on now?

benjaminp avatar Jul 09 '25 19:07 benjaminp

@benjaminp The pull request is being re-evaluated and the inactivity timeout has been reset.

openjdk[bot] avatar Jul 09 '25 19:07 openjdk[bot]

@benjaminp This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Aug 06 '25 23:08 bridgekeeper[bot]