dissect.cstruct

Eagerly resolve types and set as attributes on the cstruct object

Open • Schamper opened this issue 11 months ago • 1 comment

Currently, every type access (cs.uint8) goes through __getattr__ and, if the name is not a constant, through cstruct.resolve(). This is very slow. We should look into whether there is any reason we can't resolve types eagerly and set them as instance attributes on the cstruct object. This would make type accesses a lot faster.

Random thoughts:

  • There may be a specific reason we resolve dynamically with .resolve(), but how big is that use case? For example, I suppose it allows changing the typedef of a field and having that resolve dynamically at read time, but this is already not possible with compiled structures (where we resolve types at compile time). To be fair, that use case is currently still possible if you opt your structure out of compilation.
  • Maybe we should only do it for specific kinds of definitions, for example enum, flag, struct and constant definitions, and catch typedefs with the existing __getattr__. That way we still allow dynamic type changes, but not for things that are supposed to be static. Since dynamic typedef'ing would be an advanced topic anyway, the intended "workaround" would be to define a typedef and use the typedef'd name instead of the struct name. "Performance oriented code" could then use the raw struct and enum names for faster access (properly written code utilising cstruct already does this to keep the loop count in .resolve() as low as possible).
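To make the second idea concrete, here is a minimal sketch of the mechanism it relies on: Python only calls __getattr__ when normal attribute lookup fails, so anything stored in the instance __dict__ skips the slow path entirely. The classes and names below (LazyRegistry, EagerRegistry, add, the static flag) are hypothetical stand-ins for illustration and are not the dissect.cstruct implementation.

```python
class LazyRegistry:
    """Toy stand-in: every type lookup falls through to __getattr__."""

    def __init__(self):
        self.typedefs = {}

    def add(self, name, value):
        self.typedefs[name] = value

    def resolve(self, name):
        # Follow typedef chains (a name that maps to another name) to the final type.
        seen = set()
        while isinstance(name, str):
            if name in seen or name not in self.typedefs:
                raise AttributeError(f"cannot resolve {name!r}")
            seen.add(name)
            name = self.typedefs[name]
        return name

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        return self.resolve(name)


class EagerRegistry(LazyRegistry):
    """Static definitions become instance attributes; typedefs stay dynamic."""

    def add(self, name, value, static=False):
        super().add(name, value)
        if static:
            # Stored in the instance __dict__, so lookups never reach __getattr__.
            setattr(self, name, self.resolve(name))


class Test:  # placeholder for a struct type
    pass


class Other:  # placeholder for a replacement type
    pass


cs = EagerRegistry()
cs.add("test", Test, static=True)  # struct-like definition: resolved once
cs.add("alias", "test")            # typedef: resolved on every access

print("test" in vars(cs), "alias" in vars(cs))
print(cs.alias is Test)

cs.add("alias", Other)             # retargeting the typedef still takes effect
print(cs.alias is Other)
```

In this sketch the trade-off from the bullets above is visible directly: the static name is fast but fixed at definition time, while the typedef'd name keeps going through resolve() and can still be changed.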

Schamper, Jan 28 '25 10:01

Some micro benchmarks:

from dissect.cstruct import cstruct

cdef = """
#define X 512

enum MyEnum {
    A,
    B,
    C
};

struct test {
    uint32 a;
};
"""
cs = cstruct()
cs.load(cdef)

Before:

In: %timeit getattr(t.cs, "X")
58.1 ns ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In: %timeit getattr(t.cs, "MyEnum")
227 ns ± 0.912 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In: %timeit getattr(t.cs, "test")
219 ns ± 1.47 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

After:

In: %timeit getattr(t.cs, "X")
31.2 ns ± 0.0331 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In: %timeit getattr(t.cs, "MyEnum")
33.1 ns ± 0.0884 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In: %timeit getattr(t.cs, "test")
31.3 ns ± 0.0571 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
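For anyone who wants to reproduce this kind of comparison outside IPython, a rough equivalent can be written with the standard library's timeit module. The sketch below measures two hypothetical toy classes (Lazy and Eager, not part of dissect.cstruct), so its absolute numbers are not comparable to the ones above; it only illustrates the measurement approach.

```python
import timeit


class Lazy:
    """Toy class: the attribute is only reachable through __getattr__."""

    def __init__(self):
        self._types = {"test": object}

    def __getattr__(self, name):
        try:
            return self._types[name]
        except KeyError:
            raise AttributeError(name) from None


class Eager:
    """Toy class: the attribute lives in the instance __dict__."""

    def __init__(self):
        self.test = object


def bench(obj, name, number=100_000, repeat=3):
    """Return the best per-call time in nanoseconds for getattr(obj, name)."""
    timer = timeit.Timer("getattr(obj, name)", globals={"obj": obj, "name": name})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


for label, obj in (("lazy", Lazy()), ("eager", Eager())):
    print(f"{label}: {bench(obj, 'test'):.1f} ns per lookup")
```

Taking the minimum over several repeats follows the usual timeit advice for reducing noise from other processes; %timeit reports a mean and standard deviation instead, so the two are not directly interchangeable.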

Schamper, Jan 28 '25 20:01