Fix dynamic library lookup with remotely executed tools
When a cc_binary tool is executed directly (e.g. in a genrule) with remote execution, its dynamic dependencies are only available from the solib directory in the runfiles directory. This commit adds an additional rpath entry for this directory.
Since target names may contain commas and the newly added rpaths contain
target names, the -Xlinker compiler is now used everywhere instead of
the -Wl flag, which doesn't support commas in linker arguments.
Stacked on #16214
@EdSchouten This is essentially your #14600 with an Integration test and support for commas. Could you verify that I got this right based on the changes in #16214?
@oquenchil Could you review after #16214?
@lberki This is the follow-up to #16214 that handles the remaining cases (including commas and --experimental_sibling_repository_layout) by switching to -Xlinkereverywhere. I don't think that-Wl` provides an escape character to escape commas with, so I found this to be the only way.
@oquenchil @sgowroji This should be merged separately after #16214 so that the former can be cherry-picked into 5.3.1.
@sgowroji You can go ahead with the merge, #16214 has been merged and I rebased the PR.
@fmeum Thank you for very quick changes! Can you please fix the build kite failures.
@sgowroji Pretty sure those were flakes. I restarted the pipeline.
FYI this was rolled back in https://github.com/bazelbuild/bazel/commit/aa2bc093d2b6be83ba9c5367537b424e6347d819.
@lberki The rollback reason is "Breaks targets on TGP.". Could you provide details that allow me to reproduce this externally?
@fmeum I have no idea, I'm at the moment acting in my capacity as the Person Who Rolls Back Problematic Changes (and rolling it back didn't require any domain knowledge)
My hunch is that this is simply an extremely badly behaved internal test case, but I don't quite have the capacity to do the debugging. @comius how to proceed here?
@fmeum I have no idea, I'm at the moment acting in my capacity as the Person Who Rolls Back Problematic Changes (and rolling it back didn't require any domain knowledge)
My hunch is that this is simply an extremely badly behaved internal test case, but I don't quite have the capacity to do the debugging. @comius how to proceed here?
@oquenchil Is it possible that some toolchain at Google is still using -Wl to pass linker flags and/or hasn't updated the wrapper? Based on the info I have, I would expect //third_party/crosstool/v18/stable:cc-compiler-k8-llvm (not entirely sure how the internal labels look like) to have an osx_cc_wrapper.sh that hasn't been updated with the same change I made to tools/cpp/osx_cc_wrapper.sh in this PR and include -Wl in its toolchain definition.
cc @buildbreaker2021 for the comment above since @oquenchil is on vacation this week.
I will have a look at this.
I've looked into this for a while, a couple of findings which might help:
- If src/main/java/com/google/devtools/build/lib/rules/cpp/LibrariesToLinkCollector.java is reverted to the previous state the target builds. So the culprit might be somewhere there.
- The target name contains a comma.
- I've also tried your suggestion about tools/cpp/osx_cc_wrapper.sh with no success.
I will continue looking into this for further investigation, also maybe I will find a way to reproduce this externally or find the reason behind failure.
@buildbreaker2021 Thanks for sharing your findings. Did you check whether the failing target uses any toolchains that still use -Wl rather than -Xlinker?
I think I got to the bottom of this. After comparing linker commands there were still usages of -Wl. I have a fix unless it does not break anything else. I will try to re-merge this tomorrow.
@buildbreaker2021 Friendly ping, were you able to fix the remaining -Wl uses?
Hey, I think I have the fix, but from our side it requires quite a bit of testing, I've kicked it off this morning.
@buildbreaker2021 Friendly ping, would be great to get this in before the release baseline is cut if there are no real blockers.
@meteorcloudy I'll add this to the list of issues for the 6.0 milestone to be considered as a requirement for 6.0 . I personally don't think this should block 6.0 (it's been delayed enough), but it's useful to keep track of the "nice to haves" for that release.
@fmeum The change from our side is not straightforward.
So our change which is required for the fix is instant(does not require a release to be picked up). However the rest of the changes in the PR require a release. Also the fix which I have in mind cannot work without the changes in the PR.
To sum it up:
- We release the fix with the PR - fix is picked up instantly the PR is not, so the fix without PR is breaking tests.
- We release fix first PR second - same problem as in 1.
- We release PR first and the fix second - this is the reason for the current rollback.
I am pretty sure there is a way around this, it is not just straightforward.
The solution which I currently have in mind, would require at least 1-2 weeks, depending on the internal Bazel release time.
@buildbreaker2021 In case that would help, I'm happy to make my PR public domain (or whatever is needed) and have you integrate it in your fix CL so that you can merge both simultaneously without going through the import step.
I will ping the PR once we are ready for the merge - import step is fine from my side if you are fine with it.
Hey @fmeum ,
During further testing I discovered that my fix(even though it fixed current issue) breaks something else. So it would require more time commitment from my side compared to what I initially expected.
To unblock you and to keep the things moving I suggest the following solution:
- Could you add an incompatible flag like this: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/analysis/config/CoreOptions.java#L62-L69 ? Maybe that is not a right file for the flag to live in, this is just an example. So the flag will serve as a switch between your PR behavior vs old behavior. This flag will also make it easier to fix breakages from our side.
- You can set the flag to True, so that changes in this PR are picked up by Bazel, and assuming it does not break anyone, it should be fine. I will flip the flag to False internally so that we still use the old implementation until I find a proper fix.
@buildbreaker2021 I pushed a new commit that gates the change behind a flag that defaults to true.
@buildbreaker2021 I forgot to carry the new flag over in getHost(). I force-pushed a new version, please import that one.
A quick update.
After seeing @fmeum 's latest commit I realized that going in a flag direction would be quite tedious, since on top of the flag we would need to add the logic which would split implementation in two branches etc. I found some time on Friday and I think I fixed the underlying issue without adding the flag. So I rolled forward the initial PR without a flag - let's hope it sticks this time.
I'm cleaning up the 6.0.0 nice to have label, do we still want to cherry pick this change into the next 6.x release? If so, feel free to mark it!
This was merged before the baseline cut, so it's already in 6.0.0.
Nice!