Clarification on the Purpose of Red-Black Trees in Box64
I have noticed that in custommem.c, memory ranges mapped by mmap are simultaneously recorded and removed in multiple red-black trees.
Each node encapsulates a memory range along with detailed information about that range.
My question pertains to the necessity of using three distinct red-black tree nodes to record the same memory range.
Is this approach taken solely to maintain three separate pieces of information related to each memory range?
If that's the case, could a bitwise operation be implemented to consolidate the information, potentially reducing the number of trees required?
While mapmem and mapallmem could certainly be fused without too much impact, I think memprot should stay separate, because it changes often and at a fine granularity, so it will be highly fragmented.
On the other hand, having everything in the same map could allow gathering all the needed information about an address range in fewer calls, but that would require some changes in the code.
For now, everything is separate because it's easier to manipulate that way.
I've noticed that the functions rb_get and rb_get_end are frequently used in Box64, for example by SDL2 functions, as demonstrated in projects like mado:
| Window Event | rb_set | rb_unset | rb_get | rb_get_end | rb_get_righter |
|---|---|---|---|---|---|
| No Action (Only Close Window) | 6261 | 332 | 37695 | 14404 | 6 |
| Drag Window (Around the Border Three Times) | 6568 | 342 | 45753 | 15068 | 6 |
| Mouse Click (Ten Times) | 6569 | 342 | 40242 | 15067 | 6 |
| Mouse Move (Around the Border Three Times) | 6420 | 338 | 42496 | 14735 | 6 |
| Calculator Operations (Three Times) | 6627 | 343 | 43870 | 15185 | 6 |
However, I've observed that an RB-tree node is so large that no more than one node fits in a cache line, which makes the tree cache-unfriendly. To enhance performance, I am considering modifying the node structure.
Before proceeding, I would like to clarify the purpose of using Red-Black Trees in Box64.
From my observations, it seems that Red-Black Trees are used primarily to manage the mapped memory ranges. When a range is mapped, it is either added as a new node or extends an existing node in the red-black trees (mapmem, mapallmem, and memprot). Upon node creation, the function rbtreeMalloc maps a range of memory for the node and allocates additional memory for two blockmark_t structures, which define relationships between different memory blocks. This mapped range is also added to the red-black tree.
Could you please confirm if my understanding is correct and provide further details if there are additional considerations or functionalities of Red-Black Trees in Box64 that I should be aware of?
Thank you so much for your assistance.
The main functionality of the red-black trees is to provide a fast, cheap way to map ranges of addresses to a number, accessed by providing an address in the range. The only functionalities required are:
- Data structure creation/destruction;
- Query an address, get a `uint32_t` (`rb_get`);
- Query the end of the range containing a given address (`rb_get_end`);
- Set an address range (`rb_set`);
- Unset an address range (`rb_unset`);
- Query the first address after all set addresses (`rb_get_righter`).

There is also `print_rbtree`, which allows for easy-ish debugging.
The mental model is:
```
Address:        0         123       456       789       1240 (end)
                +---------+---------+---------+---------+---------+
                | #unset# |    1    | #unset# |    2    | #unset# |
                +---------+---------+---------+---------+---------+
get 200:                      ^ 1
get_end 200:                            ^ 456
get_end 100:                  ^ 123
rb_get_righter:                                         ^ 1240
```
The exact data structure itself is not really important. In fact, before the current implementation of red-black trees, it used to be (nested) arrays, but these were too memory expensive. (However, note that rb_(un)set may intersect with range boundaries in a lot of ways.)
EDIT: Also, `rbtreeMalloc` only allocates space for a new node. `rbtreeMalloc` is in fact just `#define`d to `customMalloc`, since these functions may be called from signal handlers and so must be signal-safe; and because an exception may be raised in e.g. `rb_set`, they should also be reentrant (though the current implementation is not and seems to work well enough).
Thanks for your explanation.
Yes, the size of an rb-tree node is indeed 48 bytes, which does not include the additional memory required for two blockmark_t structures.
But I don't think the exact data structure is unimportant. Since common cache line sizes are 64 bytes, reducing the size of the node structure can increase the number of nodes that fit in a cache line, making it more cache-friendly.
To achieve this, I want to make the following modifications:
- Fuse `mapmem` and `mapallmem` to reduce the total number of nodes (or explore other ways to reduce the number of red-black trees).
- Remove the parent pointer, similar to the red-black tree in jemalloc, which can save 8 bytes in the node structure.
- Use the LSB of the `left` pointer to store `meta` (the actual pointer is `left & ~1UL`), which can save 1 byte.
However, despite the aforementioned methods, the size of the node remains too large for the cache line. Could you please advise me on any strategies I can employ to reduce the size of the node?
Additionally, what is the difference between a dynablock (native instruction block) and a block? Finally, I want to confirm whether the data for mapallmem and mapmem refers to the same thing (allocated). Thank you!
@devarajabc are you still looking for clarification on this? I know you've had a few different GitHub Issues with similar topics.