arcade
llvm-symbolizer not present in base queue
Build
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-77578-merge-965165820fec43e19e/JIT.Stress/1/console.f7c5d70b.log?helixlogtype=result
https://dev.azure.com/dnceng-public/public/_build/results?buildId=82793&view=ms.vss-test-web.build-test-results-tab&runId=1731386&resultId=102137&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab
Pull Request
https://github.com/dotnet/runtime/pull/77578
Action required for the engineering services team
Additional information about the issue reported
To triage this issue (First Responder / @dotnet/dnceng):
- [ ] Open the failing build above and investigate
- [ ] Add a comment explaining your findings
In https://github.com/dotnet/runtime/pull/77578, we are trying to generate the crash stack trace using llvm-symbolizer. While it is present in the containers, the base Linux and macOS queues don't have it, and we see an error when using it. See the logs I referenced in the issue. Can we get it and lldb installed on the base image?
CC: @hoyosjs @JulieLeeMSFT
Release Note Category
- [x] Feature changes/additions
- [ ] Bug fixes
- [ ] Internal Infrastructure Improvements
Release Note Description
Add llvm and llvm-symbolizer to Ubuntu.1804.Amd64 and RedHat.7.Amd64
Hi Kunal, we will get on this. @hoyosjs do you know if this just comes built in with llvm? lldb 3.9 is already being installed on the base ubuntu.1804 queues. Do you need a different version? This is the test queue, so I don't think it would be an issue to upgrade that to something newer, but I'd like to check before making any major changes.
Do you know why 3.9? And llvm sounds good.
I do not know why 3.9. Possibly historic reasons? @MattGal it looks like we set our lldb version to 3.9 back in 2020. Do you know why we're using that?
Edit: Oh, actually, we set this in 2019.
Edit: that is also a lie. I am still digging into how long ago we chose 3.9 and never updated it.
Probably for diagnostics...
Yeah. I think that's also what's on the docker images that y'all are using and upgrading to something more modern is also breaking things. I worry updating that will break y'all
@kunalspathak we support several different linux distros, not all of which may have a usable version of llvm-symbolizer. Would it be acceptable if this were only added to Ubuntu Helix machines, or do you need it everywhere? Odds are it's not going to work with some of our more unusual linuxes.
@hoyosjs - what do you think?
Updating the queues the runtime uses directly would be the first priority:
- Ubuntu.1804.Amd64.Open
- RedHat.7.Amd64.Open
- OSX.1200.ARM64
We'll have to evaluate the helix containers, but those are much easier to update and we've even built the toolset in some of the containers historically.
@MattGal do you know where the symbolizer might not be available? cc: @jkoritzinsky since this might be interesting for your *SAN work
Offhand I'd venture it might not be available on old SLES or Mariner. It's one of those things we don't know until we try.
Those don't tend to impact our priority scenario - the PR analysis checks
PR to add them to the two linux based queues: https://dev.azure.com/dnceng/internal/_git/dotnet-helix-machines/pullrequest/27535
I think for OSX, we're going to have to get DDFun involved
Opened https://portal.microsofticm.com/imp/v3/incidents/details/349676322/home to get llvm added to the OSX queue.
(Moved to tracking while we wait for DDFun to update the systems)
@michellemcdaniel do we have a time estimate for when DDFun will update the systems?
I do not. I know it's been assigned, but I haven't seen any movement on it. I will ping the ICM
In general, it takes 1-2 weeks to get this many systems updated (100ish machines), and next week is Thanksgiving, so it's likely going to be at the longer end of that estimate.
PR to add them to the two linux based queues: https://dev.azure.com/dnceng/internal/_git/dotnet-helix-machines/pullrequest/27535
Does this roll out llvm to our linux helix queues? I kicked off a run on #77578 that would consume it and still see a failure about llvm-symbolizer not being present. See https://dev.azure.com/dnceng-public/public/_build/results?buildId=94545&view=ms.vss-test-web.build-test-results-tab .
We did not have a rollout last week due to the US holiday. The linux changes should roll out this week.
Heads up: DDFun says the OSX queues have been updated to have llvm on them
I tried this out, but it seems there is still an issue.
Test Infrastructure Failure: System.ComponentModel.Win32Exception (2): An error occurred trying to start process 'llvm-symbolizer' with working directory '/private/tmp/helix/working/ADD7099B/w/A75E0909/e'. No such file or directory
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-77578-merge-7245de3e3bb44b4383/JIT.Stress/1/console.ba62542f.log?helixlogtype=result
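The failure above comes from starting a process whose executable is not on the machine. As a minimal sketch (not the code used in the PR, which is .NET), a harness could check for the tool on PATH first and skip symbolication instead of failing the work item. The function names `find_tool` and `symbolize` are hypothetical, and the input passed to the symbolizer is forwarded as-is without validation.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "llvm-symbolizer") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def symbolize(frames: str, tool: str = "llvm-symbolizer") -> Optional[str]:
    """Pipe `frames` to the symbolizer on stdin.

    Returns the tool's stdout, or None when the tool is not installed.
    `frames` is forwarded unchanged; its format is not checked here.
    """
    path = find_tool(tool)
    if path is None:
        # Missing tool: report and skip rather than raise a process-start error.
        print(f"{tool} not found on PATH; skipping symbolication")
        return None
    result = subprocess.run(
        [path], input=frames, capture_output=True, text=True, check=False
    )
    return result.stdout
```

With a guard like this, a queue that lacks the tool would produce an unsymbolized crash report plus a clear message, rather than a test infrastructure failure.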
@kunalspathak the job was executed in the queue osx.1200.amd64.open but the request was to install llvm in OSX.1200.ARM64 so it is expected for it to not be available in the amd64 queue. In which queue do you need it?
I just noticed this from @hoyosjs . I think we also need it for OSX x64, right @hoyosjs ?
Updating the queues the runtime uses directly would be the first priority:
- Ubuntu.1804.Amd64.Open
- RedHat.7.Amd64.Open
- OSX.1200.ARM64
Yes, sorry - it would be needed on osx.*.*.open
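To make the scope of `osx.*.*.open` concrete, here is a small sketch that treats it as a case-insensitive glob over queue names. Reading the pattern as a glob is an assumption, and `OSX.1200.ARM64.Open` in the sample list is an illustrative name rather than one confirmed in this thread.

```python
from fnmatch import fnmatchcase

# Pattern taken from the comment above; interpreting it as a glob is an assumption.
QUEUE_PATTERN = "osx.*.*.open"


def matches_queue(queue: str, pattern: str = QUEUE_PATTERN) -> bool:
    """Case-insensitive glob match of a queue name against the pattern."""
    return fnmatchcase(queue.lower(), pattern.lower())


queues = [
    "OSX.1200.ARM64.Open",      # illustrative name, not confirmed in the thread
    "osx.1200.amd64.open",
    "OSX.1200.ARM64",
    "Ubuntu.1804.Amd64.Open",
    "RedHat.7.Amd64.Open",
]

# Keep only the queues that fall under the requested pattern.
selected = [q for q in queues if matches_queue(q)]
```

Under this reading, both the ARM64 and amd64 open macOS queues are in scope, while a name without the `.open` suffix is not.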
Results of investigation into creating a brewless LLVM artifact:
LLVM distributes a tarball of binaries for ARM64 macOS but not for amd64. The only idea I have is to produce our own tar.xz, or even our own pkg installer, of amd64 darwin binaries (either built from source or brew-installed locally). That would be a massive pain to keep up to date, since I don't think the vendors have access to mac hardware, and I don't know that it's reasonable to have an FTE with a mac build and/or install llvm every three months.
cc/ @Chrisboh
Let's add the install of this as part of the work DDFun has to do manually to set up a machine. @hoyosjs / @kunalspathak please understand that any time we need to change or update this, it will take a considerable amount of time. Do you think this is something that will need to change often?
Barring format changes on apple's behalf, I don't expect this to change often at all.
Created https://portal.microsofticm.com/imp/v3/incidents/details/358905819/home to have DDFun do this for all mac open queues.
@jonfortescue should this be closed and/or superseded by @ulisesh 's FR work?