CRoaring
Use TLS for big temporary structures
We are already not reentrant (malloc can't be used in signal handlers, and we use it extensively), I can't think of a reasonable scenario where we would want reentrancy, and we avoid recursion as much as we can (not an explicit goal, it just happens to be the case). Given all that, we could avoid many allocations by keeping these temporaries in thread-local storage (TLS) globals, or in statics if we want cleaner code. I think it's worth benchmarking to see whether this makes sense. I'm thinking mostly of the places where we store a temporary bitmap for mass operations.
Can you elaborate? We can definitely use thread-local globals, but how do you propose coding this up?
I'd need to check the code again to find the places where it would be useful, but I was thinking mostly of lazy operations, or those working on multiple bitmaps at a time, which may use a container as an accumulator and then convert it to whatever container type is appropriate for its cardinality. The temporary bitmap container could be recycled by sharing a TLS global. I just thought I'd ask first rather than come up with code that wouldn't be accepted in any form (as opposed to the regular back and forth to improve a patch).
BTW, sorry for the long delay in answering. I was on vacation and wrote the issue just so I wouldn't forget about it.
There is no doubt that we are sometimes limited by memory allocation. I think it is worth pursuing.