roaring-node icon indicating copy to clipboard operation
roaring-node copied to clipboard

Should inform V8's GC about externally allocated memory

Open papandreou opened this issue 4 years ago • 8 comments

Hi!

An application I'm working on is a heavy user of this library and big bitmaps -- thanks a lot! I'm experiencing some strange GC behavior that might be explained by roaring-node not letting V8 know about the memory that has been malloced outside the JavaScript heap using isolate->AdjustAmountOfExternalAllocatedMemory(delta). In my case this seems to cause the garbage collector to kick in too late when the process is out of memory, causing very long GC pauses.

I learned about this method from reading some of node.js' own C++ code, eg. the zlib wrapper

Just to test it, I've tried scattering in some super naive and wrong calls:

diff --git a/src/cpp/RoaringBitmap32.cpp b/src/cpp/RoaringBitmap32.cpp
index 3005806..f2948aa 100644
--- a/src/cpp/RoaringBitmap32.cpp
+++ b/src/cpp/RoaringBitmap32.cpp
@@ -169,6 +169,8 @@ RoaringBitmap32::~RoaringBitmap32() {
 
 void RoaringBitmap32::WeakCallback(v8::WeakCallbackInfo<RoaringBitmap32> const & info) {
   RoaringBitmap32 * p = info.GetParameter();
+  v8::Isolate * isolate = info.GetIsolate();
+  isolate->AdjustAmountOfExternalAllocatedMemory(-10000000);
   delete p;
 }
 
@@ -200,6 +202,7 @@ void RoaringBitmap32::New(const v8::FunctionCallbackInfo<v8::Value> & info) {
   if (!instance || !instance->roaring) {
     return v8utils::throwError(isolate, "RoaringBitmap32::ctor - failed to create native RoaringBitmap32 instance");
   }
+  isolate->AdjustAmountOfExternalAllocatedMemory(10000000);
 
   holder->SetAlignedPointerInInternalField(0, instance);
   instance->persistent.Reset(isolate, holder);

When I run a small script that keeps creating garbage RoaringBitmap32 instances (with node --trace-gc), the above makes V8 run mark-sweep garbage collections way more often. I know this doesn't really prove anything, but at least hints that it's needed here.

What do you think?

papandreou avatar Oct 08 '21 23:10 papandreou

I am not aware at all on how this works, would need to do some research. Are you sure this is the issue and not a memory leak somewhere? Could also be the case obviously

SalvatorePreviti avatar Oct 13 '21 20:10 SalvatorePreviti

Hi Salvatore, thanks for replying :)

I can't prove that I don't have a memory leak, but the symptoms fit pretty well with this theory. V8 doesn't know what you're doing outside of the JavaScript heap, so it doesn't realize that a seemingly small RoaringBitmap32 instance is keeping a bunch of "external" memory alive. Thus it'll underestimate the memory pressure and postpone GC.

There's a short explanation here: https://stackoverflow.com/questions/29448918/why-call-adjustamountofexternalallocatedmemory

papandreou avatar Oct 13 '21 20:10 papandreou

Yeah I thought about it and the reasoning makes sense. However is going to be hard to understand how much a roaring bitmap consumes. An empty roaring bitmap consumes few bytes, and the operations on a bitmap might change its size dramatically, in both directions. One option is of course run your node application with --gc and manually call collect in your application when you know you are dealing with a big bitmap until I don't think better about this, and run some tests to be sure there is not a leak in the library somewhere. One option is to check the bitmap size at every operation and change that value the nearest power of two elements in the bitmap, have to check if there are better options

SalvatorePreviti avatar Oct 14 '21 07:10 SalvatorePreviti

Thanks!

For the record I haven't seen any indications of an actual memory leak in this library, it looks like the destructors and everything are wired up correctly to the lifecycle of the JavaScript objects.

As a workaround we deployed a change last week that calls .clear() on all RoaringBitmap32 instances once we're done with them -- as that looked like it was wired up to free all the internal containers. We haven't really seen any dramatic GC pauses since. Although TBH it's not 100% conclusive evidence at this point, as we were also making other changes to the deployment in our effort to mitigate the problem.

papandreou avatar Oct 14 '21 08:10 papandreou

Thanks for reporting this, is very useful, will try to look into that when I have some time. If you could help me verify that it works as expected after would be a good thing

SalvatorePreviti avatar Oct 14 '21 08:10 SalvatorePreviti

Thanks -- and absolutely! Just say the word.

papandreou avatar Oct 14 '21 08:10 papandreou

I just released a new version of the package, https://github.com/SalvatorePreviti/roaring-node/releases/tag/v1.2.0 Computing exactly how much a bitmap consumes is expensive and reduces speed if run for every possible operation. For this reason, I did decide to add 8192 bytes to AdjustAmountOfExternalAllocatedMemory at every RoaringBitmap32 creation and execute a better estimation every time runOptimize or shrinkToFit are called or after serialization and deserialization. I don't feel this is the final solution but is at least an improvement. I verified that objects get reclaimed running the benchmarks.

SalvatorePreviti avatar Jan 07 '22 19:01 SalvatorePreviti

@SalvatorePreviti, thanks a lot for doing that, it sounds like a good compromise!

I will take it for a spin 🏎️

papandreou avatar Jan 07 '22 22:01 papandreou