
[FEA] Make SpillableColumnarBatch inform Spill code of actual usage of the batch

Open revans2 opened this issue 3 years ago • 1 comments

Is your feature request related to a problem? Please describe.

Currently SpillableColumnarBatch does some things that are far from ideal for the spill code. When you get a batch, it locks the underlying spill id, creates the ColumnarBatch, and then releases the spill id. After that, regular reference counting keeps the buffers that make up the ColumnarBatch alive until they are no longer needed.

The problem with this is that the RapidsBufferCatalog thinks that all of the buffers are spillable, even when reference counts prevent them from actually being freed. Ideally, as long as someone still holds a reference to the underlying buffer, we would not release the spill id. I think we can do this, but we would need to add a layer of indirection at the DeviceMemoryBuffer layer. We could create a new SpillableMemoryBuffer that would hold both a DeviceMemoryBuffer and the buffer/spill id. It would have its own reference count, separate from the one on the DeviceMemoryBuffer. When the SpillableMemoryBuffer reaches a reference count of 0, it would release the spill id. The spill code would then be able to actually free the underlying DeviceMemoryBuffer when spilling.
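To make the proposed indirection concrete, here is a minimal sketch of the idea in plain Java. It is not the plugin's code: `SketchCatalog` and `SpillableMemoryBufferSketch` are hypothetical names, the device buffer is a placeholder `Object` rather than a real DeviceMemoryBuffer, and the lock/release methods are assumptions standing in for whatever the catalog actually exposes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the spill catalog: it only tracks which spill ids are held.
final class SketchCatalog {
    private final Set<Long> held = new HashSet<>();

    synchronized void lock(long spillId) { held.add(spillId); }

    synchronized void release(long spillId) { held.remove(spillId); }

    // A buffer is treated as spillable only while nobody holds its spill id.
    synchronized boolean isSpillable(long spillId) { return !held.contains(spillId); }
}

// Hypothetical wrapper that pairs a device buffer with its spill id
// and keeps a reference count of its own, separate from the buffer's.
final class SpillableMemoryBufferSketch implements AutoCloseable {
    private final Object deviceBuffer; // placeholder for a DeviceMemoryBuffer
    private final long spillId;
    private final SketchCatalog catalog;
    private int refCount = 1;

    SpillableMemoryBufferSketch(Object deviceBuffer, long spillId, SketchCatalog catalog) {
        this.deviceBuffer = deviceBuffer;
        this.spillId = spillId;
        this.catalog = catalog;
        // Hold the spill id for as long as any reference to this wrapper is alive.
        catalog.lock(spillId);
    }

    Object getDeviceBuffer() { return deviceBuffer; }

    synchronized void incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already closed");
        }
        refCount++;
    }

    @Override
    public synchronized void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
        if (refCount == 0) {
            // Last user is gone: only now may the spill code free the device memory.
            catalog.release(spillId);
        }
    }
}
```

In this sketch the catalog reports the buffer as spillable only after the last `close()`, which is the behavior the description above asks for.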

revans2 avatar Sep 14 '22 20:09 revans2

Discussed this with @revans2 and @jlowe today. @revans2 proposed adding callbacks, likely in DeviceMemoryBuffer, that the spill framework could use to register a function that would actually mark the buffer as spillable (e.g. release the ref count in the spillable framework).
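A rough sketch of the callback idea, again in plain Java with hypothetical names. `CallbackBufferSketch` is not DeviceMemoryBuffer, and `setOnFullyReleased` is an invented hook. The sketch also assumes the count tracks only references held outside the spill framework.

```java
// Hypothetical buffer with a hook that fires when the last outside reference is closed.
final class CallbackBufferSketch implements AutoCloseable {
    private int refCount = 1;
    private Runnable onFullyReleased; // hypothetical callback slot

    // The spill framework would register its "mark as spillable" function here.
    synchronized void setOnFullyReleased(Runnable callback) {
        this.onFullyReleased = callback;
    }

    synchronized void incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already closed");
        }
        refCount++;
    }

    @Override
    public synchronized void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
        if (refCount == 0 && onFullyReleased != null) {
            // Nobody outside the spill framework is using the buffer any more.
            onFullyReleased.run();
        }
    }
}
```

Compared with a wrapper type, a hook like this would leave existing code that passes buffers around unchanged, since only the spill framework needs to know about the callback.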

Another topic that came up is the potential for collisions, where the same buffer (the same contig split buffer) is registered twice with the spill framework. Say an upstream exec makes a buffer spillable, and then the same buffer is returned as part of next(), only to be added again to the spill framework. Since these buffers have IDs, perhaps some smarts could be built into the catalog to detect and de-duplicate these repeated registrations.
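One way the de-duplication could look, as a sketch only. `DedupCatalogSketch` is a hypothetical class, and it keys on object identity as a stand-in for the buffer IDs mentioned above. A repeated registration returns the existing id and bumps a registration count instead of creating a second entry.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical catalog that de-duplicates repeated registrations of the same buffer.
final class DedupCatalogSketch {
    private static final class Entry {
        final long id;
        int registrations;

        Entry(long id) { this.id = id; }
    }

    // Keyed by object identity; a real catalog would more likely key on the buffer id.
    private final Map<Object, Entry> byBuffer = new IdentityHashMap<>();
    private long nextId = 0;

    // Returns the id for this buffer, reusing the existing one if already registered.
    synchronized long register(Object buffer) {
        Entry entry = byBuffer.get(buffer);
        if (entry == null) {
            entry = new Entry(nextId++);
            byBuffer.put(buffer, entry);
        }
        entry.registrations++;
        return entry.id;
    }

    // Returns true when the last registration was removed and the entry was dropped.
    synchronized boolean unregister(Object buffer) {
        Entry entry = byBuffer.get(buffer);
        if (entry == null) {
            throw new IllegalArgumentException("buffer was never registered");
        }
        entry.registrations--;
        if (entry.registrations == 0) {
            byBuffer.remove(buffer);
            return true;
        }
        return false;
    }

    synchronized int trackedBuffers() { return byBuffer.size(); }
}
```

With this shape, an upstream exec and a downstream consumer can both register the same buffer, and the catalog still tracks a single entry until both have unregistered.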

abellina avatar Oct 17 '22 15:10 abellina

I am interested in picking this up after my current tasks, as it is related to the "maximum live memory" question we are trying to answer with changes to cuDF and the plugin.

abellina avatar Nov 02 '22 21:11 abellina