
Swift projects expand 4-byte values to 8 bytes when building structs

Open nmggithub opened this issue 1 year ago • 19 comments

Describe the bug This is probably just some flag or feature I am unaware of, but I am reversing a binary and trying to add a struct and I experience this behavior. This does not happen in most other binaries I've worked with.

To Reproduce Steps to reproduce the behavior:

  1. Open the Structure Editor and create a new struct
  2. Add an item of the type uint32_t
  3. Observe the popover stating that the size is 4 bytes
  4. Press enter
  5. Observe it being added with a size of 8 bytes.

Expected behavior The item is added with a size of 4 bytes.

Screenshots Screenshot 2024-08-03 at 18 17 03

Attachments N/A

Environment (please complete the following information):

  • OS: macOS 14.6
  • Java Version: 22.0.1
  • Ghidra Version: 11.1.2
  • Ghidra Origin: GitHub releases

Additional context I've tried playing around with alignment, but it doesn't seem to do anything. The struct in the screenshot has the alignment set to 4.

nmggithub avatar Aug 03 '24 22:08 nmggithub

What size does uint32_t take when you use it directly in the program (no struct)? What arch / language is your binary? If you find the uint32_t type in the data type manager tree, where did it come from (is it linked to a data type archive, and what path does the data type live in) and what underlying type is it pointed at?

dev747368 avatar Aug 05 '24 15:08 dev747368

What size does uint32_t take when you use it directly in the program (no struct)? What arch / language is your binary? If you find the uint32_t type in the data type manager tree, where did it come from (is it linked to a data type archive, and what path does the data type live in) and what underlying type is it pointed at?

For uint32_t, there appear to be three data type archives that contain it:

  1. The binary's own data type archive (or at least one named after the program name)
  2. generic_clib_64
  3. mac_osx

I think I generated that third one through a header file I got from GitHub. Note that, again, this isn't happening with every binary. Just this one specific macOS binary I'm looking at.

When using the data type in the program without a struct, those three locations show in the dropdown in the Data Type Chooser Dialog. The latter two say they are 4 bytes long, but the one in the project's archive says it is 8 bytes long. If I select any of them and then mouse over the result, it says the value is 8 bytes long.

EDIT: It appears that mac_osx is actually built in. To note, I did, several versions ago, use the "Parse C Source" method to parse some additional macOS types. I was following the instructions in the README where I got them on GitHub. I'll try to find where that was.

EDIT 2: Ok it was this, I believe: https://github.com/PoomSmart/IDAObjcTypes. I am not sure if this affects anything. Again, it's only happening with this one binary.

nmggithub avatar Aug 05 '24 18:08 nmggithub

Right. Well, the base data type that this typedef is pointing to, in conjunction with this binary's arch, is probably the cause.

Some of Ghidra's built-in data types are specific to the arch/compiler that was assigned to the binary during import. If the uint32_t typedef is pointing to one of these (e.g. the base built-in data type called int) instead of a statically sized base type (e.g. dword), and then you transport that data type from the original context to another binary, and your new binary's arch/compiler spec defines int as 8 bytes instead of 4, you can run into this situation.

From a quick look, I'm guessing your binary is Swift, which defines int as 8 bytes, but the source of the typedef was created using a 4-byte int compiler spec.
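
The mechanism described above can be sketched with a toy model in plain Python. This is not Ghidra's API: the names `DATA_ORGS`, `FIXED_SIZES`, and `resolve_size` are hypothetical, and the only facts taken from the thread are that int is compiler-specific (4 bytes in most specs, 8 under the Swift spec) while dword is fixed at 4 bytes.

```python
# Toy model (not Ghidra's API): a typedef resolves through its base type,
# and a compiler-specific base type takes its size from the active compiler spec.

# Hypothetical data organizations, keyed by compiler spec name.
DATA_ORGS = {
    "default": {"int": 4},
    "swift": {"int": 8},
}

# Fixed-width built-ins have the same size under every compiler spec.
FIXED_SIZES = {"dword": 4, "qword": 8}


def resolve_size(type_name, typedefs, cspec):
    """Follow typedef links until a base type is reached, then size it."""
    while type_name in typedefs:
        type_name = typedefs[type_name]
    if type_name in FIXED_SIZES:
        return FIXED_SIZES[type_name]
    return DATA_ORGS[cspec][type_name]


# uint32_t defined on top of the compiler-specific "int" (the problematic case).
via_int = {"uint32_t": "int"}
# uint32_t defined on top of the fixed-width "dword".
via_dword = {"uint32_t": "dword"}

for cspec in ("default", "swift"):
    print(cspec,
          resolve_size("uint32_t", via_int, cspec),
          resolve_size("uint32_t", via_dword, cspec))
```

In this model, only the typedef that bottoms out in int changes size when the compiler spec changes; the one that bottoms out in dword stays at 4 bytes.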

dev747368 avatar Aug 05 '24 19:08 dev747368

Honestly, that Swift theory sounds like it could be it. However, I've reversed this binary before (a previous version) and didn't have this issue. I've also reversed other Swift binaries without issue. Granted all this was also on previous versions of Ghidra.

Regardless, is there a way to tell Ghidra that int is actually 4 bytes (and also potentially fix any other base types)?

nmggithub avatar Aug 05 '24 19:08 nmggithub

Regardless, is there a way to tell Ghidra that int is actually 4 bytes (and also potentially fix any other base types)?

You can't modify the behavior of the data type called int, but you can modify the typedef to point to something else, like dword (via a complicated series of steps using the right-click Replace... action to pick a second typedef that you previously created and set up the correct way).

If you hover over a data type, the tooltip that pops up should state whether it has a compiler-specific size; if that isn't mentioned, it is a statically sized type.

dev747368 avatar Aug 05 '24 19:08 dev747368

If you do Help -> About <program>, what is the value of Compiler ID? Indeed, I made int 8 bytes for Swift programs. I wouldn't have expected that to make uint32 8 bytes as well though.

ryanmkurtz avatar Aug 05 '24 19:08 ryanmkurtz

Compiler ID is indeed: swift. I am still caught up, though, on my ability to reverse previous versions of this binary (and also other Swift binaries) just fine. The more I think about it though, I wasn't really using structs that much in the others.

Another confusing part now is that I was trying to use the structs to define parts of memory, but the sizing was messing with it. Or, in short: there's memory in the binary that's laid out according to a typedef where int is 4 bytes. I'm honestly not sure how that's possible, but it's probably some deep compiler/linker magic.

nmggithub avatar Aug 05 '24 19:08 nmggithub

I think support for swift binaries was added fairly recently to Ghidra, so this data type size mismatch may be a new issue for those binaries vs. the same binary imported using a generic AARCH64 definition.

dev747368 avatar Aug 05 '24 20:08 dev747368

I think support for swift binaries was added fairly recently to Ghidra, so this data type size mismatch may be a new issue for those binaries vs. the same binary imported using a generic AARCH64 definition.

Ok yeah, this makes sense and is probably what's happening. Given that I was able to reverse these binaries just fine before under the generic AARCH64 definition, is it possible to force Ghidra to revert to that? Also, what, if anything, does the new Swift support add? This is the first time I have actually noticed it and it's causing me issues.

nmggithub avatar Aug 05 '24 20:08 nmggithub

If you are okay with re-importing the binary, you can just change the "Language" field before clicking ok. It should pop up a table of arch/compiler combos (and there is a check box at the bottom to let you force something non-recommended).

dev747368 avatar Aug 05 '24 20:08 dev747368

Nice, thank you! You mention "non-recommended", though. Is there any recommended way to use typedefs and structs from another compiler spec in binaries like this? Or are cases like this (where a binary has memory laid out based on such a typedef) rare?

nmggithub avatar Aug 05 '24 20:08 nmggithub

The "non-recommended" was a reference to the ability to choose an arbitrary CPU arch/compiler during import, even if it's incorrect.

As far as recommended ways of reusing type info across arch/compiler specs, dunno.

Ghidra's existing bundled data type archives have this typedef issue, probably because they were generated via parsing .h files. How often these types are used in other type declarations will be up to the source of the imported type info.

You can easily add your own types, even with the same name, but you need to be careful about picking the correct one when using them. You can also overwrite those existing bad types with your own correct definition. (see my previous comment about using the right click, Replace... feature).

If you end up putting some effort into creating type info for your binary, you also may want to save your types into their own data type file so you can reuse it later.

dev747368 avatar Aug 05 '24 20:08 dev747368

I'm only seeing 2 cspec's that have integer_size=8: swift and golang (on 64 bit archs).

Everything else is 4 bytes, except for the obvious 16 bit platform cspecs that have a 2 byte int.
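
A survey like the one above could be approximated with a short script. The sketch below parses an inline, illustrative fragment rather than a real Ghidra file, and it assumes the size is stored as a `value` attribute on an `integer_size` element inside `data_organization`; the helper name `integer_size` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; not the contents of any real Ghidra .cspec file.
SAMPLE_CSPEC = """
<compiler_spec>
  <data_organization>
    <integer_size value="8" />
    <long_size value="8" />
    <pointer_size value="8" />
  </data_organization>
</compiler_spec>
"""


def integer_size(cspec_xml):
    """Return integer_size from a cspec's data_organization, or None if absent."""
    root = ET.fromstring(cspec_xml)
    node = root.find("./data_organization/integer_size")
    return int(node.get("value")) if node is not None else None


print(integer_size(SAMPLE_CSPEC))
```

To check an actual installation, the same function could be applied to the text of each .cspec file under the install's processor directories, though real files may need extra handling that this sketch does not attempt.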

dev747368 avatar Aug 05 '24 21:08 dev747368

Ok so, I just want to clarify my situation as it stands:

Previously, before this Swift support was added to Ghidra, I could load this binary into Ghidra. I could then type a region of memory to a well-known struct which relied on int being 4 bytes long. It worked.

Now, this breaks down because Ghidra assumes int is 8 bytes due to the inferred compiler of the binary. However, the memory region of the binary has not changed. There is still memory that is laid out according to the struct as if int were 4 bytes instead of 8.
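
The mismatch can be illustrated with Python's `struct` module. This is only a sketch with made-up field values: it assumes a little-endian struct of two 32-bit fields, and it stands in for what happens when the same bytes are typed with 8-byte fields instead.

```python
import struct

# Eight bytes of example data, as C code would write two 32-bit fields.
raw = struct.pack("<II", 1, 2)

# Layout with 4-byte ints: two fields, 8 bytes total.
four_byte_layout = struct.Struct("<II")
# Layout with 8-byte ints: the same two fields now span 16 bytes.
eight_byte_layout = struct.Struct("<QQ")

print(four_byte_layout.size, eight_byte_layout.size)
print(four_byte_layout.unpack(raw))

# Reading the first field as 8 bytes swallows both 32-bit values.
(merged,) = struct.unpack("<Q", raw)
print(hex(merged))
```

With 8-byte fields the struct no longer fits the 8 bytes that are actually there, and the first field absorbs what should have been the second.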

Is Ghidra then wrong for inferring that int is 8 bytes long? If this binary was indeed compiled with a compiler in which int is 8 bytes, why is that memory laid out in the binary according to the 4-byte-length layout? Was that potentially some compiler and/or linking magic?

And what would be your recommended solution here? Right now I see two options:

  1. Import as a generic AARCH64 binary, or
  2. Recreate the struct

Option 2 seems like the least invasive option, but the work is non-trivial, as the struct itself actually references several other structs, ones I would likely have to recreate as well. That makes me want to go for Option 1, but I'm not sure if that would break anything else in the program. However, given that I've apparently reversed this binary before as a generic AARCH64, I may be fine.

And, I guess as a final question (repeated from earlier): what did/does this Swift support actually do? Because right now I've only seen it mess up my setup. What are the benefits?

nmggithub avatar Aug 05 '24 21:08 nmggithub

Is Ghidra then wrong for inferring that int is 8 bytes long? If this binary was indeed compiled with a compiler in which int is 8 bytes, why is that memory laid out in the binary according to the 4-byte-length layout? Was that potentially some compiler and/or linking magic?

I'm not knowledgeable about swift, but I am about golang and it may be a good analog.

By default, a RE'd golang binary typically won't benefit from type info imported from a C .h file that might include ints and uint32_t, etc.

However, that could change if the golang binary statically links in a C library. All of a sudden, in one binary you've got non-homogeneous definitions (at a source-code level) of what an int is, and Ghidra only allows you to specify a single compiler spec for the entire binary.

re: option 1 or 2... or leave the struct alone and just change the problematic types it references (if they are all typedefs).

dev747368 avatar Aug 05 '24 21:08 dev747368

Thank you for the note. I'll keep this in mind. I think I'm gonna go with option 1 as it just brings me back to what I was doing before. Given that I think we have narrowed down the root cause and a mitigation, I'm going to close this issue. Thank you so much for your help!

nmggithub avatar Aug 05 '24 21:08 nmggithub

@ryanmkurtz Just wanted to ping you here so you can look through the past conversation. It does, indeed, appear to be an issue with the Swift compiler inference. Importing as a default AARCH64 binary works fine, as it appears to just do what it did before.

Indeed, I made int 8 bytes for Swift programs. I wouldn't have expected that to make uint32 8 bytes as well though.

That does seem to be what it's doing. If that's unintentional, you may want to take a look at it. Anyway, I'm still keeping this closed as I have found my mitigation, but it seems that there might be work to be done with regard to Swift support (but I'll leave that to you). Let me know if you have any questions for me and I can try to provide answers.

nmggithub avatar Aug 06 '24 09:08 nmggithub

Yes indeed, I will take a look. Swift support adds a couple of things. I'll paste this in from the "What's New" we released with Ghidra 11.1:

Initial support for binaries written in the Swift Programming Language has been added. The new support relies on the native Swift demangler being present on the user's system. Swift is automatically bundled with XCode on macOS, and can be optionally installed on Windows and Linux. See the "Demangler Swift" analyzer options for more information. Type information gathered from the demangled Swift symbol names is used to create corresponding Ghidra data types. This currently works for Swift primitives and structures, but more work needs to be done to include classes and other advanced data types. Swift-specific calling conventions are also applied to demangled Swift functions.

The primitive sizes and calling conventions are defined in the x86-64-swift.cspec and AARCH64_swift.cspec text files. At the top of these files is the <data_organization> section, where integer_size is defined: https://github.com/NationalSecurityAgency/ghidra/blob/1baf101d43379336d6a9dc0f6da803f946939a40/Ghidra/Processors/AARCH64/data/languages/AARCH64_swift.cspec#L3-L14

Why did I define integer to be 8 bytes? For no other reason than that an int in Swift is 8 bytes. I admittedly did not know about this uint32_t problem at the time, though.

What memory are you laying down structures on? Are they things generated by Swift, or some other Objective C or Mach-O thing? In those contexts, I could see an 8 byte integer as being undesirable.

You have the power to modify that .cspec file and make integer 4 bytes. When you restart Ghidra, it should take effect. However, some changes to the Java code would need to take place so 8 byte longs could be used to represent Swift.Int: https://github.com/NationalSecurityAgency/ghidra/blob/1baf101d43379336d6a9dc0f6da803f946939a40/Ghidra/Features/SwiftDemangler/src/main/java/ghidra/app/util/demangler/swift/nodes/SwiftStructureNode.java#L61-L79

This is really the first kind of feedback I've received on the Swift stuff (positive or negative) since its release, so I was expecting to have to adjust things as more test cases rolled in.

As for why Ghidra shows you 4 bytes on hover instead of 8, that seems like a bug to me, personally.

ryanmkurtz avatar Aug 06 '24 10:08 ryanmkurtz

What memory are you laying down structures on? Are they things generated by Swift, or some other Objective C or Mach-O thing? In those contexts, I could see an 8 byte integer as being undesirable.

I'm not 100% sure what generates it, but I know it's a data structure from an old legacy feature in macOS. Probably a C file (not even Objective-C) that's just linked in with the Swift files.

As for why Ghidra shows you 4 bytes on hover instead of 8, that seems like a bug to me, personally.

Sorry, to clarify, hover works fine. It's just that, in any data type dropdown (such as in the Struct Editor, or the code editor) it shows the three copies of uint32_t: program archive, generic_clib_64 archive, mac_osx archive; and only the program one shows the true size of 8 bytes (but if either of the other two are selected it still seems to choose the 8-byte one from the program archive).

Why did I define integer to be 8 bytes? For no other reason than that an int in Swift is 8 bytes. I admittedly did not know about this uint32_t problem at the time, though.

I would agree that an Int is 8 bytes on 64-bit platforms, but Swift, to my knowledge, is still designed with 32-bit systems in mind as well:

On 32-bit platforms, Int is the same size as Int32, and on 64-bit platforms, Int is the same size as Int64. Source: https://developer.apple.com/documentation/swift/int

I'm not sure how that affects things.

nmggithub avatar Aug 06 '24 18:08 nmggithub

Hello! I see this is still open. Are there any updates on this? Thanks.

nmggithub avatar Nov 30 '24 05:11 nmggithub

I will inquire about it with the team.

ryanmkurtz avatar Dec 05 '24 12:12 ryanmkurtz

I am going to directly address your Swift issue by making ints 4 bytes instead of 8. When I demangle a Swift.Int, I will just use __int32 or __int64 so there is no conflict. In a perfect world I could define int to be 4 or 8 bytes based on the architecture, but it seems like there are things in the program that just always like ints to be 4 bytes, so best not to mess with that data type. Sound good?
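
The proposed mapping might look roughly like the sketch below. It is not the actual Ghidra change: `swift_int_type_name` is a hypothetical helper, and it assumes the choice between __int32 and __int64 is driven by the program's pointer size, in line with the Apple documentation quoted earlier in the thread.

```python
def swift_int_type_name(pointer_size_bytes):
    """Pick a fixed-width type name for Swift.Int from the pointer size (hypothetical)."""
    if pointer_size_bytes == 4:
        return "__int32"
    if pointer_size_bytes == 8:
        return "__int64"
    raise ValueError(f"unsupported pointer size: {pointer_size_bytes}")


print(swift_int_type_name(8))
```

Because the returned names are fixed-width types, the size of a demangled Swift.Int would no longer depend on how the compiler spec defines int.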

ryanmkurtz avatar Dec 06 '24 13:12 ryanmkurtz

I think that could work, yeah. I'm afraid of potential unintended consequences, but I'm not knowledgeable enough to know what those might be.

nmggithub avatar Dec 06 '24 16:12 nmggithub

The main thing that would get affected is opening an already-analyzed Swift program in the new version of Ghidra that has this change. All the ints would change from 8 to 4 bytes. How would that affect you? Have you moved on from the Swift binaries you've been looking at, or do you have plans to continue to work on them with new versions of Ghidra?

ryanmkurtz avatar Dec 06 '24 18:12 ryanmkurtz

Honestly, I've been working with a lot of binaries, some even with Swift, but only a rare few have actually exhibited this size discrepancy. I haven't taken a look at the compiler on all of them, so it's possible that they all were parsed with the standard compiler. But it's also possible there was something unique about this one binary? I'm not exactly sure. I just know that switching to the non-Swift compiler helped with this one binary (maybe another too, I can't recall).

nmggithub avatar Dec 06 '24 18:12 nmggithub

Can you share the problematic binary? It's a tough call. I have a simple fix, but it will mess up already-imported Swift programs from Ghidra 11.1 and later.

ryanmkurtz avatar Dec 09 '24 20:12 ryanmkurtz

Looking at some more Swift samples, it seems when Objective-C comes into play, or calls into external C library functions, we might apply an int to those function definitions, which are getting applied as 8 bytes. This almost certainly seems wrong to me. There is a Swift type called CInt which maps to Int32. I am going to move forward with the changes I mentioned above.

ryanmkurtz avatar Dec 09 '24 21:12 ryanmkurtz