WIP: Initial provisions for semantic Crossgen2 PDB validation
This Quality Week work item is motivated by my findings from a few months back that the Crossgen2 PDB generator had been producing bogus symbol files for almost half a year due to a trivial bug, and no testing in place was able to catch that. This change adds initial provisions for semantic PDB validation.
In this initial commit I'm adding a new managed app, PdbChecker, that uses the DIA library to read a given PDB and optionally check it for the presence of given symbols. In parallel I'm making test infra changes that enable selective PDB validation support in individual tests, including checks for the presence of expected symbols.
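Stripped of the DIA plumbing, the symbol-presence check itself boils down to a set-membership test over the symbol names read from the PDB. A minimal sketch of that shape (in Python for brevity; the symbol names below are hypothetical, and the actual PdbChecker is a managed app driving the DIA API):

```python
def check_symbols(pdb_symbols, expected_symbols):
    """Return the expected symbols missing from the PDB; empty list means all present."""
    present = set(pdb_symbols)
    return [name for name in expected_symbols if name not in present]

# Hypothetical symbol names for illustration only.
symbols = ["Program.Main", "HelperClass.DoWork", "HelperClass..ctor"]

assert check_symbols(symbols, ["Program.Main"]) == []
assert check_symbols(symbols, ["Program.Missing"]) == ["Program.Missing"]
```

The point of making the expected-symbol list per-test is that individual tests can opt in with just the handful of names they know must survive compilation.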
This change by itself introduces rudimentary PR / CI validation of PDB files produced by Crossgen2. As a next step I plan to add provisions for running this logic over all tests in one of the Crossgen2-specific pipelines, and for validating the PDBs produced during compilation of the framework assemblies.
Thanks
Tomas
Tagging subscribers to this area: @hoyosjs See info in area-owners.md if you want to be subscribed.
Issue Details
| Author: | trylek |
|---|---|
| Assignees: | - |
| Labels: |  |
| Milestone: | - |
Fixes: https://github.com/dotnet/runtime/issues/13134
@jkoritzinsky - my thinking has been that adding the optional PDB check step after Crossgen2 compilation gives us more freedom w.r.t. the extent of the testing. AFAIK the ILCompiler tests are typically small unit tests targeting particular components of the compiler, like the dependency analyzer or the type system, using a handwritten set of individual test cases. By being able to run the PDB check on arbitrary compiled tests, and ultimately the framework, we should be able to test many more PDBs that more closely resemble end customer apps, including a wide variety of constructs like different generic instantiations and a broad range of executable and PDB sizes. @cshung is working on a similar mechanism using R2RDump to disassemble the various Crossgen2-produced binaries to test its robustness. If you think I misunderstood your original suggestion or if you have additional arguments to support it, please elaborate.
FYI, my proposed change to add r2rdump is already out for PR as part of the hot-cold splitting upstreaming change here. This is considered part of the quality week work.
In particular, I changed src/tests/Common/CLRTest.CrossGen.targets to include an execution of the R2RDump tool, as well as the ValidateRuntimeFunctions method in src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ReadyToRunReader.cs to check for some invariants that the crossgen2 compiler should guarantee. (e.g. the runtime functions must be sorted by RVA in the runtime function table)
The idea is that when some invariants can be checked statically, checking them statically is way cheaper than executing the tests many times and hoping that, when the list is not sorted, the binary search might occasionally fail.
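To make the cost argument concrete, here is a hedged sketch (Python rather than the actual C# ValidateRuntimeFunctions code, with made-up RVA ranges) contrasting the two failure modes: over an unsorted runtime function table, a binary-search lookup only misses for some inputs, while the static sortedness check flags the broken table every time:

```python
from bisect import bisect_right

def is_sorted_by_rva(runtime_functions):
    """Static invariant check: entries must be ordered by begin RVA."""
    rvas = [begin for begin, _end in runtime_functions]
    return all(a <= b for a, b in zip(rvas, rvas[1:]))

def find_function(runtime_functions, rva):
    """Binary search for the entry containing rva; assumes sorted input."""
    rvas = [begin for begin, _end in runtime_functions]
    i = bisect_right(rvas, rva) - 1
    if i >= 0:
        begin, end = runtime_functions[i]
        if begin <= rva < end:
            return runtime_functions[i]
    return None

sorted_table = [(0x1000, 0x1040), (0x1040, 0x1100), (0x1100, 0x1180)]
unsorted_table = [(0x1040, 0x1100), (0x1000, 0x1040), (0x1100, 0x1180)]

# The static check catches the broken table deterministically...
assert is_sorted_by_rva(sorted_table)
assert not is_sorted_by_rva(unsorted_table)

# ...while binary search over the unsorted table only misses for some lookups.
assert find_function(sorted_table, 0x1050) == (0x1040, 0x1100)
assert find_function(unsorted_table, 0x1050) is None  # silent lookup miss
```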
Thanks to @eugeniopena, who authored the ValidateRuntimeFunctions method.
I'm just wondering how much benefit we're actually getting by just throwing new phases and types of testing at the src/tests suite instead of deliberately authoring tests to cover the various scenarios we support. Especially with a tightening test budget, we should be more deliberate about our testing instead of just throwing everything at the wall and seeing what sticks.
I get that this is the easiest way to cover a wide range of scenarios, but even for that, we know the src/tests tree doesn't cover everything. The JIT team runs the libraries tests against JitStress to catch bugs that the src/tests tree doesn't. Additionally, each new test variation that we add to the runtime test scripts adds more complexity and makes them more difficult to understand for anyone we want to onboard to the infra team. I'm fine with us starting here, but I strongly feel that we should add deliberate testing instead of just pointing at these tests and hoping that we get enough coverage that way.
I also think that adding unit tests for "did we order this collection correctly" would be a better (and likely cheaper over time) test mechanism for the hot-cold splitting work as well. The asserts in ValidateRuntimeFunctions are good, but I feel that writing tests that explicitly test that things are ordered correctly would be a better test than just throwing things at the src/tests tree.
Thanks Jeremy for your additional feedback. I think I mostly agree with you on the following accounts:
- runtime tests have many specifics and limitations, and the expectation that merely making them pass in a new build mode amounts to their full validation is very naive;
- arbitrarily adding new build modes for the runtime tests exhibits a form of exponential explosion that makes our lab budget suffer;
- making the test script logic more complicated is in general a pain for ramp-up and maintenance.
My personal feeling regarding the PDB testing from these perspectives is as follows:
- While I completely agree that runtime tests are by no means a fair proxy for customer projects, I sincerely doubt there is an easy way to put together something more comprehensive in terms of the ILCompiler tests; we're not targeting a particular regression in PDB generation for which we could create a directed unit test. As a first step we want to improve our overall confidence that PDBs are not broken altogether (as was the case for about five months earlier this year without anyone even noticing);
- I'm certainly not suggesting we should double our PR testing with these new modes; as you can see in this PR, for the purpose of PR / CI testing I'm proposing a surgical addition of this logic to the one crossgen2smoketest; we can expand it to several more tests but that's about it. Beyond that I've been thinking about a broader albeit less frequent validation as part of some of the crossgen2-specific pipelines - according to the measurements Jared recently published, these are small potatoes compared to today's biggest Helix / Azure consumers;
- As you can see yourself, the actual change to the test build scripts is pretty minimal compared with their current content; if we decide to clean them up and simplify them, I suspect there are more complex places where we could start. I did some of that in the past (e.g. by unifying much of the logic in build.cmd/sh by moving it to the common proj file).
For now I'm inclined to continue working on this PR unless it turns out to be somehow detrimental to overall lab perf (which I deem improbable) as it has the potential to introduce at least some PDB testing to our system that is long overdue. I can try to experiment with leveraging ILCompiler tests for this purpose but I'm afraid it's going to be at least as complicated as this proposed change for the following reasons:
- The hypothetical ILCompiler test will need the same underlying infrastructure i.e. Dia2Lib et al to parse the produced PDB;
- IIRC in today's lab we're running the ILCompiler test suite on Linux, which won't work as PDB is a Windows-specific technology, so we'd need to tweak the lab to cater for that somehow;
- If I'm not mistaken, today the ILCompiler tests run on the AzDO build machines that don't support coverage for all four architectures (x64, x86, arm, arm64).
Thanks
Tomas
That all makes sense, and since this isn't adding a new pipeline or scenario flavor, I think that this is okay to add. Your reasoning around hooking up the underlying platform logic for DIA also is quite convincing. Let's go with this for now and we can look back in the future to decide if changing this is worthwhile.
@trylek, this is a year old. Should it be closed?
Sure, makes sense, I'll reopen this once I get to figuring out the problem that makes it still fail in the lab.
For the "problem in the lab", according to my discussion with @brianrob we shouldn't be using CLSID-based component creation for DIA, instead we should manually use its public exports similar to what we do in diagnostic tests, cf.
https://github.com/microsoft/perfview/blob/afd33a906961167ba995a6d71f35de9ea6a2bbdd/src/TraceEvent/Symbols/NativeSymbolModule.cs#L1794
using a static copy of dia2lib.dll.
/cc @ivdiazsa