Fix PDB parsing of recoverable malformed datatypes

Open alandtse opened this issue 1 year ago • 1 comments

MSVC sometimes generates PDBs in which a derived datatype, such as a pointer (" *") or an array ("[16]"), has a missing or blank base datatype. Although the actual base datatype is unknown, Ghidra has undefined to cover this use case.

This avoids an error on PDB import that carries a cryptic message, "Symbol list must contain at least one symbol name!", with no information about what caused the issue.
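The recovery idea can be sketched outside Ghidra as a small string normalization. This is only an illustration in Python, not the code in this PR: the function name `recover_datatype` and the `placeholder` parameter are hypothetical, and the sketch assumes the malformed names look exactly like the " *" and "[16]" forms described above.

```python
def recover_datatype(name: str, placeholder: str = "undefined") -> str:
    """Return a usable datatype name when the base type is missing.

    Hypothetical helper for illustration; it does not mirror Ghidra's code.
    """
    stripped = name.strip()
    if not stripped:
        # Nothing at all was emitted for the type.
        return placeholder
    if stripped[0] in "*[":
        # Only the pointer or array suffix survived, so prepend the placeholder.
        separator = " " if stripped[0] == "*" else ""
        return f"{placeholder}{separator}{stripped}"
    # A base type is present; leave the name untouched.
    return name


if __name__ == "__main__":
    for raw in (" *", "[16]", "char[16]"):
        print(repr(raw), "->", repr(recover_datatype(raw)))
```

The point of the fallback is that the pointer or array shape is still known even when the base type is not, so the member can keep its layout instead of aborting the import.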

alandtse avatar Jul 22 '24 08:07 alandtse

I'd like to better understand where you are seeing these types. I first want to confirm that we are not doing something incorrectly further upstream, and I also want to see how these types are manifested from the perspective of the PDB Universal Analyzer.

With my basic understanding of the PDB MSDIA processor, your logic seems reasonable, but I'd like to make sure the fix shouldn't be elsewhere and I'd like to see these types first-hand.

Can you point me to a PDB that has this issue? Do you know where you are seeing them; e.g., as the type for a symbol, as the type of a local variable, as the reference of a typedef?

Can you point out a line in a pdb.xml file and specify the node nesting?

ghizard avatar Jul 23 '24 11:07 ghizard

Can you point me to a PDB that has this issue? Do you know where you are seeing them; e.g., as the type for a symbol, as the type of a local variable, as the reference of a typedef?

Please see the attached zip for the XML produced by the CreatePdbXmlFileScript. It's built from this project under Debug build settings. CommonLibVR.pdb.xml.zip

Can you point out a line in a pdb.xml file and specify the node nesting?

See lines 22289 and 22290.

        <datatype name="std::_String_val&lt;std::_Simple_types&lt;char8_t&gt; &gt;::_Bxty" kind="Union" length="0x10" >
            <member name="_Bxty" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="~_Bxty" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="_Buf" datatype="[16]" offset="0x0" kind="Member" length="0x0" />
            <member name="_Ptr" datatype=" *" offset="0x0" kind="Member" length="0x0" />
            <member name="_Alias" datatype="char[16]" offset="0x0" kind="Member" length="0x0" />
            <member name="__vecDelDtor" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
        </datatype>

You can also search using a regex for other instances with either a * or [ under the datatype attribute.
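A minimal sketch of such a search, assuming the pdb.xml keeps one member per line as in the excerpt above. The pattern and the `find_malformed` helper are illustrative choices, not part of Ghidra.

```python
import re

# Matches a datatype attribute whose value starts (after optional spaces)
# with '*' or '[', i.e. the base type is missing.
MALFORMED = re.compile(r'datatype="\s*[*\[]')

SAMPLE = [
    '<member name="_Buf" datatype="[16]" offset="0x0" kind="Member" length="0x0" />',
    '<member name="_Ptr" datatype=" *" offset="0x0" kind="Member" length="0x0" />',
    '<member name="_Alias" datatype="char[16]" offset="0x0" kind="Member" length="0x0" />',
]


def find_malformed(lines):
    """Return (line_number, text) pairs for lines with a missing base type."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if MALFORMED.search(line)
    ]


if __name__ == "__main__":
    for number, text in find_malformed(SAMPLE):
        print(number, text)
```

To scan a real export, the same function can be fed an open file object, since iterating a text file yields its lines.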

alandtse avatar Jul 24 '24 02:07 alandtse

@alandtse I'm having difficulties doing the build. I'd like to figure this out, as I had attempted and failed to use vcpkg for a different build a couple years back. I've tried a lot of variations such that I'm getting pretty confused.

Trying to follow the "Building" section of https://github.com/alandtse/CommonLibVR?tab=readme-ov-file. I was presuming that vcpkg was going to pull in the build dependencies, so have not done anything with them. Not sure if I completed "Ensure Development requirements are completed." And was hoping I'd see messages telling me that I didn't complete them, but I think I'm getting stuck earlier than that.

I have VS Community 2022. Opted to use the version of vcpkg that came with it, but later tried the separate vcpkg-2024.07.12 as a variation when having difficulties.

Also had installed cmake-3.30.2-windows-x86_64.msi

For each vcpkg, tried: vcpkg integrate install

On my last go-round, when trying: cmake --preset vs2022-windows-vcpkg "-DCMAKE_TOOLCHAIN_FILE=C:/Program Files/Microsoft Visual Studio/2022/Community/VC/vcpkg/scripts/buildsystems/vcpkg.cmake" I get issues such as:

CMake Error at C:/Program Files/Microsoft Visual Studio/2022/Community/VC/vcpkg/scripts/buildsystems/vcpkg.cmake:899 (message): vcpkg install failed. See logs for more information:

Log says: error: this vcpkg instance requires a manifest with a specified baseline in order to interact with ports. Please add 'builtin-baseline' to the manifest or add a 'vcpkg-configuration.json' that redefines the default registry.

Tried getting some ideas by watching the tutorial on your site too and looked at issues where someone mentioned the CMAKE_TOOLCHAIN_FILE arg.

Feeling a bit daft.

ghizard avatar Aug 05 '24 23:08 ghizard

You're not the first person to complain about vcpkg. I went ahead and added a baseline. But the main issue with vcpkg is making sure you run the bootstrap.

The other thing may be that vcpkg changed their default env variable at some point. They now default to VCPKG_ROOT. I just updated the project to use that instead of VCPKG_INSTALLATION_ROOT. I believe a lot of people just add both since the community tends to use both path variables depending on when you installed vcpkg but it will throw off people following the new instructions.

~Oh also, follow the path for SkyrimVR. No one is using it for regular Skyrim so the build path looks broken now.~ Edit: Either should work now.

alandtse avatar Aug 06 '24 05:08 alandtse

@alandtse

Given your changes, I was able to successfully build the targets. I converted the PDBs to XML and can essentially confirm what you are seeing, though mine is slightly different.

        <datatype name="std::_String_val&lt;std::_Simple_types&lt;char8_t&gt; &gt;::_Bxty" kind="Union" length="0x10" >
            <member name="_Bxty" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="~_Bxty" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="_Buf" datatype="&lt;NoType&gt;[16]" offset="0x0" kind="Member" length="0x0" />
            <member name="_Ptr" datatype="&lt;NoType&gt; *" offset="0x0" kind="Member" length="0x0" />
            <member name="_Alias" datatype="char[16]" offset="0x0" kind="Member" length="0x0" />
            <member name="_Switch_to_buf" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="__vecDelDtor" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
        </datatype>

Note that the "length" field is added (by someone working on stuff here), but also the _Buf and _Ptr members have a <NoType> underlying type for the array and pointer.
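Since the two exports differ (a blank base type in one, <NoType> in the other), a check that wants to catch both has to treat them alike. The following is a rough sketch using Python's standard XML parser. The snippet is an abbreviated copy of the excerpt above with a shortened datatype name, and `members_without_base_type` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the excerpt above; the datatype name is shortened here.
SNIPPET = """<datatype name="_Bxty" kind="Union" length="0x10">
    <member name="_Buf" datatype="&lt;NoType&gt;[16]" offset="0x0" kind="Member" length="0x0" />
    <member name="_Ptr" datatype="&lt;NoType&gt; *" offset="0x0" kind="Member" length="0x0" />
    <member name="_Alias" datatype="char[16]" offset="0x0" kind="Member" length="0x0" />
</datatype>"""


def members_without_base_type(xml_text):
    """Return names of members whose base type is blank or <NoType>."""
    root = ET.fromstring(xml_text)
    flagged = []
    for member in root.iter("member"):
        # The parser unescapes &lt;NoType&gt; to <NoType> in attribute values.
        datatype = member.get("datatype", "").strip()
        if datatype.startswith("<NoType>") or datatype[:1] in ("", "*", "["):
            flagged.append(member.get("name"))
    return flagged


if __name__ == "__main__":
    print(members_without_base_type(SNIPPET))
```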

I also ran the PDB through the PDB Universal workings and can see new distinct primitive types. Based upon both the close proximity of the primitive code numbers to those of char16_t and char32_t and the name of the template argument for this type, I'm nearly 100% sure that the underlying primitive type is char8_t, as described here: https://stackoverflow.com/questions/57402464/is-c20-char8-t-the-same-as-our-old-char

I haven't tried registering the msdia140.dll that comes with Community 2022, and I'm not sure that I want to mess with it on this system that other people also use. Do you know if you registered the version that comes with 2022 before you did the pdb-to-xml conversion? If you have, then we are basically stuck and will probably just use your PR (and might do so regardless), but it would be nice for people to know whether fixing the front end will yield better results.

I will see about updating the PDB Universal code.

ghizard avatar Aug 06 '24 16:08 ghizard

I haven't tried registering the msdia140.dll that comes with Community 2022, and I'm not sure that I want to mess with it on this system that other people also use. Do you know if you registered the version that comes with 2022 before you did the pdb-to-xml conversion? If you have, then we are basically stuck and will probably just use your PR (and might do so regardless), but it would be nice for people to know whether fixing the front end will yield better results.

Sorry, I'm not sure what you mean by registering. I've been building with VS Community 2022 for a couple years now, but not sure I did any registering outside of whatever the installer does.

alandtse avatar Aug 06 '24 17:08 alandtse

The pdb.xml file is created by the Ghidra pdb.exe program on a Windows system using the msdia140.dll. In order for the program to find the dll, it must be registered with the OS.

I'm not sure which "installer" you are talking about... VS Community 2022 or perhaps you used an installer to install Ghidra. The registering step I'm talking about is specifically for Ghidra's pdb.exe to be able to get the dll.

Whether you performed the steps or not (something must have), we specify the registration steps in the ./Ghidra/Features/PDB/src/global/docs/README_PDB.html document.

There are lots of copies/versions of msdia140.dll found in various VS folders. If the one that got registered was for VS 2017 or 2019, it might not be able to recognize the char8_t primitive type from C++20. I'm suggesting that the one I see (there might be more than one) that comes with VS Community 2022 might be able to handle char8_t and output it correctly.
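To see which copies exist before deciding which one to register, a directory walk is enough. This is a small sketch: `find_msdia_copies` is a hypothetical helper, and the root folder to search is an assumption that depends on where Visual Studio is installed.

```python
from pathlib import Path


def find_msdia_copies(root):
    """Return every msdia140.dll found under root, sorted by path."""
    return sorted(
        path for path in Path(root).rglob("msdia140.dll") if path.is_file()
    )


# Example call (the install folder is an assumption; adjust it to your system):
# for dll in find_msdia_copies("C:/Program Files/Microsoft Visual Studio/2022/Community"):
#     print(dll)
```

This only lists candidates. Which copy is actually registered with the OS is a separate question covered by the registration steps in README_PDB.html.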

ghizard avatar Aug 06 '24 19:08 ghizard

Registered msdia140 from my vs2022 community per the instructions and did a new export. Still seeing the behavior of the malformed type.

        <datatype name="std::_String_val&lt;std::_Simple_types&lt;char8_t&gt; &gt;::_Bxty" kind="Union" length="0x10" >
            <member name="_Bxty" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="~_Bxty" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
            <member name="_Buf" datatype="[16]" offset="0x0" kind="Member" length="0x0" />
            <member name="_Ptr" datatype=" *" offset="0x0" kind="Member" length="0x0" />
            <member name="_Alias" datatype="char[16]" offset="0x0" kind="Member" length="0x0" />
            <member name="__vecDelDtor" datatype="void *" offset="0x0" kind="Unknown" length="0x0" />
        </datatype>

alandtse avatar Aug 07 '24 06:08 alandtse

Thanks for the feedback on that. I checked your branch point and also looked at history within our source files, and I'm still not sure why you get a blank type whereas I'm seeing the <NoType> underlying typename. Will likely take your commit here and I will do some other fixups for <NoType> and char8_t across PDB and Demangler.

ghizard avatar Aug 07 '24 14:08 ghizard