NativeJIT
Magic buffer sizes
All the sample code contains the same lines with magic numbers:
```cpp
ExecutionBuffer codeAllocator(8192);
Allocator allocator(8192);
FunctionBuffer code(codeAllocator, 8192);
```
I have been running into cases with more complex expressions where this is too small. But there is no documentation on how big this should be and how the different buffers fit together. Of course it would be even better if this allocation was dynamic, so I don't have to know...
As a side note: `FunctionBuffer` takes an `unsigned` as capacity, while `ExecutionBuffer` and `Allocator` take a `size_t`, which is inconsistent.
Thanks for raising this issue. Magic numbers are indeed problematic.
Historically speaking, eliminating the need for the magic numbers wasn’t a priority in Bing because their use case employed fixed-sized buffers in order to maximize performance. Using fixed-sized buffers allowed them to eliminate all heap allocations on the query processing path. Eliminating heap allocations mainly had an impact on reducing lock contention in a multi-threaded environment. The Bing use case went to extreme lengths to boost performance. As an example, they also used the virtual memory system to allocate guard pages beyond the end of buffers to catch overflows so that they could eliminate bounds checking on each byte write operation.
Most users of NativeJIT don't need these extreme optimizations and could benefit from buffers that grow automatically as needed. I did a brief investigation and feel it would be feasible to add this support to the codebase without hurting performance for those who want to use fixed-sized buffers.
Before I get into the details, I'll suggest that in many use cases you can just pick magic numbers that are ridiculously large and move on. For example, give each buffer 1 MB. This is likely to work in the vast majority of cases and is still inexpensive on modern hardware.
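Concretely, the setup from the samples might become something like the following. This is just a sketch: the include paths and the `Function<float, float>` example mirror the project's sample code, but may need adjusting to your version of the tree.

```cpp
#include "NativeJIT/CodeGen/ExecutionBuffer.h"
#include "NativeJIT/CodeGen/FunctionBuffer.h"
#include "NativeJIT/Function.h"
#include "Temporary/Allocator.h"

using namespace NativeJIT;

int main()
{
    // Generously sized buffers: 1 MB each instead of 8192 bytes.
    ExecutionBuffer codeAllocator(1024 * 1024);
    Allocator allocator(1024 * 1024);
    FunctionBuffer code(codeAllocator, 1024 * 1024);

    // Build and compile expressions as usual, e.g. f(x) = x * x.
    Function<float, float> expression(allocator, code);
    auto & node = expression.Mul(expression.GetP1(), expression.GetP1());
    auto function = expression.Compile(node);

    return function(3.0f) == 9.0f ? 0 : 1;
}
```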
If you are interested in eliminating magic numbers from the API, the details follow. If you are considering putting together a PR, please check with us first to increase the likelihood that we will be able to take your contribution.
- **Allocator**: This class is just a naïve implementation of `IAllocator`. One could easily implement an `ExpandableAllocator` class that would maintain an `std::vector<void*>` of buffers. As each buffer was depleted, a new one would be added. The simplest version of this class would only free buffers in its destructor, so its memory usage would grow over time to some high-water mark. A more nuanced implementation might release excess buffers after calls to `IAllocator::Reset()`. (A sketch of this idea follows the list.)
- **ExecutionBuffer**: This class is also an `IAllocator`, but it has the additional requirement that the memory it allocates must have Executable Space Protection disabled. This involves configuring the virtual memory tables, which in some use cases may be too expensive to do on a per-compilation basis. Those who aren't sensitive to the reconfiguration cost might consider an execution buffer that maps additional virtual memory pages as needed. One potential issue with this approach is that the `JumpTable` class associates jump targets with absolute memory addresses, so if the buffer were to grow, one would have to ensure that the `JumpTable` entries remained valid. One approach would be to control the virtual memory mapping in such a way that the buffer can grow while retaining its original start address (see the second sketch after the list). Another would be to convert the `JumpTable` to use relative addresses. Another, less desirable, approach would be to relocate the `JumpTable` after resizing the buffer.
- **FunctionBuffer constructor**: This constructor takes a `capacity` parameter which is passed up the chain of base class constructors to `CodeBuffer`, which uses the value to initialize `m_bufferEnd`, the field inspected by `VerifyNoBufferOverflow()`. One would need to modify the `EmitBytes()` method to trigger buffer growth as needed, instead of allowing `VerifyNoBufferOverflow()` to throw.
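To make the `Allocator` idea above concrete, here is a rough sketch of an `ExpandableAllocator`. It is not taken from the codebase: the `Allocate`/`Deallocate`/`Reset` names are assumptions modeled on `IAllocator`, and a real implementation would derive from that interface and match its exact signatures.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical replacement for NativeJIT's Allocator. Alignment handling and
// thread safety are elided for brevity.
class ExpandableAllocator
{
public:
    explicit ExpandableAllocator(size_t bufferSize)
      : m_bufferSize(bufferSize),
        m_currentBuffer(0),
        m_used(0)
    {
        AddBuffer();
    }

    ~ExpandableAllocator()
    {
        for (void* buffer : m_buffers)
        {
            std::free(buffer);
        }
    }

    void* Allocate(size_t size)
    {
        if (size > m_bufferSize)
        {
            throw std::runtime_error("Allocation exceeds buffer size.");
        }
        if (m_used + size > m_bufferSize)
        {
            // Current buffer is depleted: advance to the next one, adding a
            // fresh buffer if we have never been this deep before.
            ++m_currentBuffer;
            if (m_currentBuffer == m_buffers.size())
            {
                AddBuffer();
            }
            m_used = 0;
        }
        void* result = static_cast<char*>(m_buffers[m_currentBuffer]) + m_used;
        m_used += size;
        return result;
    }

    void Deallocate(void* /* block */)
    {
        // Arena-style allocator: individual deallocations are no-ops.
    }

    void Reset()
    {
        // Simplest policy: keep every buffer (high-water mark) and reuse it.
        // A more nuanced policy could free all but the first buffer here.
        m_currentBuffer = 0;
        m_used = 0;
    }

private:
    void AddBuffer()
    {
        void* buffer = std::malloc(m_bufferSize);
        if (buffer == nullptr)
        {
            throw std::bad_alloc();
        }
        m_buffers.push_back(buffer);
    }

    size_t m_bufferSize;
    size_t m_currentBuffer;
    size_t m_used;
    std::vector<void*> m_buffers;
};
```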
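For the `ExecutionBuffer` direction, here is a minimal POSIX sketch of the "reserve address space up front, commit executable pages on demand" technique, which lets the buffer grow without moving its start address (so absolute `JumpTable` entries stay valid). `GrowableExecutionBuffer` is a hypothetical name, not part of NativeJIT; a real implementation would also need to plug into the `IAllocator`/`CodeBuffer` machinery, respect whatever W^X policy the platform enforces, and provide a Windows (`VirtualAlloc` reserve/commit) equivalent.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>

class GrowableExecutionBuffer
{
public:
    explicit GrowableExecutionBuffer(size_t maxCapacity)
      : m_capacity(maxCapacity),
        m_committed(0)
    {
        // Reserve address space without backing pages. The start address never
        // changes, so absolute jump targets recorded in JumpTable stay valid.
        m_base = mmap(nullptr, m_capacity, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m_base == MAP_FAILED)
        {
            throw std::runtime_error("mmap failed.");
        }
    }

    ~GrowableExecutionBuffer()
    {
        munmap(m_base, m_capacity);
    }

    // Ensure at least `required` bytes are committed as read/write/execute.
    void EnsureCommitted(size_t required)
    {
        if (required <= m_committed)
        {
            return;
        }
        if (required > m_capacity)
        {
            throw std::runtime_error("Exceeded reserved capacity.");
        }
        // Round up to a page boundary before changing protections.
        size_t const pageSize = static_cast<size_t>(sysconf(_SC_PAGESIZE));
        size_t const newCommitted =
            ((required + pageSize - 1) / pageSize) * pageSize;
        if (mprotect(m_base, newCommitted,
                     PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        {
            throw std::runtime_error("mprotect failed.");
        }
        m_committed = newCommitted;
    }

    void* Base() const { return m_base; }

private:
    void* m_base;
    size_t m_capacity;
    size_t m_committed;
};
```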
With regard to @joto's side note, we're transitioning the code base to use `size_t` any time we want a default-sized unsigned value. Since our compiler targets x64, we feel that there is no performance penalty to using `size_t` in our API (vs. `unsigned`). Once we've finished transitioning to `size_t`, the only smaller types will be in places where we require a smaller size (e.g., a field of a size-sensitive struct). In general, we're trying to avoid signed types as well, unless the use case requires negative numbers. We will transition the code over time, but the priority is low - we're mainly fixing things as we encounter them while implementing higher-priority changes. Please feel free to submit PRs on this topic.