
Templatized types (in some form) in BN's type system

Open alexrp opened this issue 3 years ago • 1 comments

When reverse engineering large C++ binaries, I end up defining a lot of structures like:

struct TArray_FString __packed
{
    struct FString* elements;
    int32_t count;
    uint32_t capacity;
};

struct TArray_HANDLE __packed
{
    HANDLE* elements;
    int32_t count;
    uint32_t capacity;
};

struct TArray_TArray_S1StringDBEntry __packed
{
    struct TArray_S1StringDBEntry* elements;
    int32_t count;
    uint32_t capacity;
};

struct TArray_S1StringDBEntry __packed
{
    struct S1StringDBEntry* elements;
    int32_t count;
    uint32_t capacity;
};

It gets even worse when more complex data structures are in play, such as hash tables. I end up manually doing all the template expansion that the C++ compiler did. And if I make a mistake in my understanding of the data structure, I then have to go back and fix every single instance.

It would be nice if there was a way to do templatized types in BN's type system. Due to the frankly insane complexity of implementing C++ templates (or even just representing them while using Clang for parsing/analysis), I really don't think it has to take the shape of actual C++ templates, but something that achieves the same end result without all the repetition would certainly make life a lot easier. Even just a significantly reduced version of templates that only allows passing complete types as template arguments (so no constants or any other such complexity) would go a very long way.
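As a rough illustration of the reduced form described above (one template, complete types only as arguments), here is a minimal Python sketch that generates the `TArray_*` declarations from a single definition. It is a standalone text generator, not a Binary Ninja feature; `TARRAY` and `instantiate_tarray` are hypothetical names, and the `TArray_<type>` naming simply mirrors the hand-written structs in this issue.

```python
from string import Template

# Hypothetical template text: $T is the single type parameter,
# $name is the flattened struct name (e.g. TArray_FString).
TARRAY = Template("""\
struct $name __packed
{
    $T* elements;
    int32_t count;
    uint32_t capacity;
};
""")


def instantiate_tarray(element_type: str) -> str:
    """Expand the TArray template for one complete element type."""
    # Drop a leading "struct " so the generated name matches the
    # hand-written convention (TArray_FString, not TArray_struct FString).
    bare = element_type.removeprefix("struct ")
    return TARRAY.substitute(name=f"TArray_{bare}", T=element_type)


if __name__ == "__main__":
    for t in ("struct FString", "HANDLE", "struct S1StringDBEntry",
              "struct TArray_S1StringDBEntry"):
        print(instantiate_tarray(t))
```

With something like this, fixing a mistake in the layout means editing the template once and regenerating, rather than touching every instance by hand.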

alexrp, Jul 20 '22 18:07

This is a cool idea and certainly something I considered when writing the clang parser, but it was not feasible at the time. We'll discuss it internally as we're aiming to improve type support for C++, but this is certainly a ways out.

CouleeApps, Jul 20 '22 19:07

Have you had a chance to consider this one further? I understand it's probably a fairly complex ask, and there's plenty of other things needing attention... just wondering how likely it is for this one to happen at some point.

I've recently run into more and more code in this binary using generic hash maps, and, well, 'manual' template instantiation as mentioned above is no longer really viable for this, as there's like a dozen structures that have to be duplicated for each combination of type arguments.

alexrp, Aug 08 '23 05:08

Encountered the same issue, and yeah - once you get to the point where the binary uses hash maps, or, heaven forbid, some more complex structures, then stuff gets out of hand really quickly.

Hash maps and other semi-complex std types are especially a problem imho: they are complex enough that expanding them by hand is painful, yet used often enough that you have to do it over and over.

And yes, fully replicating C++ templates is ... probably not a good idea, as that is likely too complex, but there are also other, wackier options that could be explored. For example, mostly dumb text templates could go a long way toward solving this issue, as long as they parse their arguments properly.
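To make "parse their arguments properly" concrete, here is a minimal Python sketch of such a dumb text template expander. It splits arguments only on top-level commas, so nested uses like `TArray<TArray<struct S1StringDBEntry>>` work, and it emits dependencies before the types that use them. Everything here is an assumption for illustration: the `TEMPLATES` registry, the `TPair` layout, and the `Name_Arg` naming scheme are hypothetical, and none of it is an existing Binary Ninja API.

```python
from string import Template

# Hypothetical registry: template name -> (parameter names, field text).
TEMPLATES = {
    "TArray": (["T"], "    $T* elements;\n    int32_t count;\n    uint32_t capacity;\n"),
    "TPair": (["K", "V"], "    $K key;\n    $V value;\n"),
}


def split_args(text: str) -> list[str]:
    """Split an argument list on commas that are not nested inside <...>."""
    args, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(text[start:i].strip())
            start = i + 1
    args.append(text[start:].strip())
    return args


def expand(type_expr: str, out: dict[str, str]) -> str:
    """Return a reference to type_expr, adding generated structs to out."""
    type_expr = type_expr.strip()
    if "<" not in type_expr:
        return type_expr  # plain type: nothing to instantiate
    name, _, rest = type_expr.partition("<")
    name = name.strip()
    if not rest.endswith(">"):
        raise ValueError(f"unbalanced template arguments in {type_expr!r}")
    params, body = TEMPLATES[name]
    # Expand arguments first so dependencies land in `out` before their users.
    refs = [expand(arg, out) for arg in split_args(rest[:-1])]
    if len(refs) != len(params):
        raise ValueError(f"{name} expects {len(params)} argument(s)")
    flat = "_".join([name] + [r.removeprefix("struct ") for r in refs])
    if flat not in out:
        fields = Template(body).substitute(dict(zip(params, refs)))
        out[flat] = f"struct {flat} __packed\n{{\n{fields}}};\n"
    return f"struct {flat}"


if __name__ == "__main__":
    generated: dict[str, str] = {}
    expand("TArray<TArray<struct S1StringDBEntry>>", generated)
    expand("TPair<HANDLE, TArray<struct FString>>", generated)
    print("\n".join(generated.values()))
```

Because each instantiation is keyed by its flattened name, a type that is requested several times is only emitted once, and the dictionary's insertion order gives a declaration order in which every struct appears after the structs it refers to.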

Roukanken42, Mar 06 '24 06:03