struct_layout

A GCC plugin to dump the final layout of a struct and all types it references.

I started this project to support my `Linux Kernel MicroPython port <https://github.com/Jongy/micropython/blob/linux-kernel/ports/linux-kernel/README.rst>`_: I wanted an easy, Pythonic way to access kernel structures. That's why even the dump format is Python :)

This project consists of the GCC plugin itself, under gcc_plugin, and of Python helpers under python that act as "data accessors" using the plugin output.

You can easily use the plugin without the accessors, though. They served my specific purpose, but the plugin is quite useful by itself.

Build

Just hit make.

You can build in debug mode with make DEBUG=1. You'll get debugging information printed to stderr (basically, the internal GCC tree object of every field processed).

Quick example

There's a test_struct struct in tests/test_struct.c that exploits many of the peculiarities allowed in struct definitions. Check it out, then hit make run to dump that weird struct and see how its fields ended up in the generated dump.

Using the plugin

To run the plugin on a specific struct my_struct from a specific file myfile.c:

.. code-block:: bash

    $ gcc -fplugin=./struct_layout.so -fplugin-arg-struct_layout-output=layout.txt -fplugin-arg-struct_layout-struct=my_struct myfile.c -c

You'll have your results in layout.txt.

You can omit -fplugin-arg-struct_layout-struct to dump all defined structs instead (all structs defined in your C file and in every header it includes).
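If you drive GCC from a script, the two invocations above can be assembled programmatically. This is only a sketch: the helper name plugin_command and its parameters are made up for illustration, and only the flags themselves come from this README.

```python
import shlex


def plugin_command(source, output, struct=None, plugin="./struct_layout.so"):
    """Build the gcc argument list for one run of the plugin (hypothetical helper)."""
    cmd = [
        "gcc",
        f"-fplugin={plugin}",
        f"-fplugin-arg-struct_layout-output={output}",
    ]
    if struct is not None:
        # Leaving this flag out makes the plugin dump every defined struct.
        cmd.append(f"-fplugin-arg-struct_layout-struct={struct}")
    cmd += [source, "-c"]
    return cmd


one = plugin_command("myfile.c", "layout.txt", struct="my_struct")
everything = plugin_command("myfile.c", "all.txt")

# Print shell-ready command lines; pass the lists to subprocess.run() to execute them.
print(shlex.join(one))
print(shlex.join(everything))
```

Keeping the command as a list (rather than one string) avoids quoting problems when paths contain spaces.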

The output

Output is printed as Python objects, for easier handling later.

A dictionary is printed, with Struct objects created for each struct / union. There's no distinction between structs and unions in this respect - unions will simply have different offsets for their fields.

The object holds the name and size of the struct/union, plus a dictionary of the fields. The dictionary maps field names to tuples of (offset, field type). For unions, the offset is always 0.

The objects & field types are defined in python/fields.py.

All types have a total_size attribute, with their total size in bits. Other attributes vary between field types:

  • Scalar - for scalars. These also have their basic type (like int, char or unsigned long int) and a boolean sign field (True for signed, False for unsigned).
  • Bitfield - for bitfields. These have the number of bits they occupy and a sign field.
  • StructField - for struct/union fields. These have the name of the struct they reference. If the field is based on an anonymous struct, its Struct object itself is given instead.
  • Pointer - for all types of pointers. These have their "pointee" type, which may be e.g. a Scalar or another Pointer.
  • Void - the void type, for example in void *. This has size 0.
  • Function - the pointee type in the case of function pointers. This has size 0.
  • Array - for arrays. These have the number of elements and the type of each element (similar to the pointee type of Pointer).

For example, the struct struct s { int x; unsigned char y; void *p; }; on my x86-64 evaluates to:

.. code-block:: python

    structs = {
        's': Struct('s', 128, {
            'x': (0, Scalar(32, 'int', True)),
            'y': (32, Scalar(8, 'unsigned char', False)),
            'p': (64, Pointer(64, Void())),
        }),
    }
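Since the dump is plain Python, it can be loaded with exec() once names like Struct and Scalar are in scope. The sketch below uses small stand-in classes instead of the real ones from python/fields.py; their attribute names (other than total_size) are assumptions that just mirror the positional arguments seen in the dump, and the example suggests field offsets are in bits like the sizes.

```python
from collections import namedtuple

# Stand-ins for the classes in python/fields.py (assumed shapes, not the real ones).
Struct = namedtuple("Struct", "name total_size fields")
Scalar = namedtuple("Scalar", "total_size type signed")
Pointer = namedtuple("Pointer", "total_size pointee")


class Void:
    total_size = 0


DUMP = """
structs = {
    's': Struct('s', 128, {
        'x': (0, Scalar(32, 'int', True)),
        'y': (32, Scalar(8, 'unsigned char', False)),
        'p': (64, Pointer(64, Void())),
    }),
}
"""

# The dump is Python source, so exec() it with the field classes available.
namespace = {"Struct": Struct, "Scalar": Scalar, "Pointer": Pointer, "Void": Void}
exec(DUMP, namespace)
structs = namespace["structs"]


def describe(struct):
    """Return (field name, byte offset, byte size) for every field."""
    return [
        (name, offset // 8, field.total_size // 8)
        for name, (offset, field) in struct.fields.items()
    ]


s = structs["s"]
rows = describe(s)
for name, offset, size in rows:
    print(f"{name}: byte offset {offset}, {size} bytes")
```

Only exec() dumps you generated yourself, since this runs whatever code the file contains.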

For a Linux kernel struct

As I said, I originally intended this for Linux, so it must be easy to generate the structs here :)

To generate for a specific struct:

.. code-block:: bash

    $ python linux/dump_structs.py layout.txt --struct task_struct --header linux/sched.h

You can set the KDIR environment variable to run against a specific kernel tree (by default, it runs against your local kernel).

.. code-block:: bash

    $ KDIR=/path/to/kernel python linux/dump_structs.py ...

To dump all structs (based on a set of headers I've collected in include_all.c) you can run:

.. code-block:: bash

    $ python linux/dump_structs.py all.txt

Structs missing in output

When including headers to dump their defined types, you may see some structs missing from the output, although they are fully defined in the headers. Apparently, GCC doesn't complete the processing of structs that have only a typedef name (structs of the form typedef struct { ... } ..;) until they are used at least once. I didn't verify this in GCC's code, though. As a result, the event emitted for finished types is never generated for them, and the plugin doesn't know about them.

A quick workaround for this problem: define a dummy, named struct referencing the types you want in the dummy .c file you're handing to GCC.
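One way to apply this workaround from a script is to generate the dummy source text. The sketch below is an illustration only: dummy_source, the struct name struct_layout_dummy, the header mytypes.h and the typedef names point_t and config_t are all hypothetical.

```python
def dummy_source(headers, type_names, struct_name="struct_layout_dummy"):
    """Return C source with a named struct that uses each typedef'd type once."""
    lines = [f"#include <{header}>" for header in headers]
    lines.append(f"struct {struct_name} {{")
    for index, type_name in enumerate(type_names):
        # A by-value member is enough to count as a use of the type.
        lines.append(f"    {type_name} field_{index};")
    lines.append("};")
    return "\n".join(lines) + "\n"


# Hypothetical header and typedef-only struct names.
src = dummy_source(["mytypes.h"], ["point_t", "config_t"])
print(src)
```

Write the returned text to the .c file you hand to GCC, then run the plugin on it as usual.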

Using the accessors

Paired with the structs generated by the plugin, the accessors allow very convenient handling of structured data in Python code.

Basically, you need to provide the base memory accessors (functions that read/write a u8/u16/u32/u64 through a pointer), and the accessors handle the rest (fields, pointers, arrays, bitfields, signedness, ...).

You can see how test_accessor.py does it.
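To give a feel for the idea without the real accessor classes, here is a rough sketch of base memory accessors over a bytearray, plus the offset, signedness and bitfield handling built on top. None of these function names come from this project, and little-endian byte and bit order is assumed.

```python
import struct

memory = bytearray(16)  # pretend this backs one `struct s` (128 bits)

# Base accessors: read/write an unsigned integer of a given width at an address.
_FORMATS = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}


def read_uint(addr, bits):
    return struct.unpack_from(_FORMATS[bits], memory, addr)[0]


def write_uint(addr, bits, value):
    struct.pack_into(_FORMATS[bits], memory, addr, value)


def to_signed(value, bits):
    """Reinterpret an unsigned value of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def read_scalar(base, bit_offset, bits, signed):
    """Read a byte-aligned scalar located bit_offset bits after base."""
    value = read_uint(base + bit_offset // 8, bits)
    return to_signed(value, bits) if signed else value


def read_bitfield(base, bit_offset, nbits, signed):
    """Extract nbits starting at bit_offset (little-endian bit order assumed)."""
    byte, shift = divmod(bit_offset, 8)
    nbytes = (shift + nbits + 7) // 8  # bytes needed to cover the field
    start = base + byte
    raw = int.from_bytes(memory[start:start + nbytes], "little")
    value = (raw >> shift) & ((1 << nbits) - 1)
    return to_signed(value, nbits) if signed else value


# Offsets and sizes of `struct s { int x; unsigned char y; void *p; }` from the dump.
write_uint(0, 32, 0xFFFFFFFE)  # raw bytes of x
write_uint(4, 8, 200)          # raw byte of y
x = read_scalar(0, 0, 32, True)
y = read_scalar(0, 32, 8, False)
print(x, y)
```

Swapping the two base functions for ones that touch real memory is all it takes to reuse the rest unchanged, which is the same split the accessors rely on.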

Tests

This was tested on GCC 7.4.0, GCC 9.2.0, GCC 10.2.0. Oh, and Python 3, of course.

.. code-block:: bash

    $ make test