[FEA] cudf::column needs a set_stream() function.
cudf::column uses rmm::device_buffer for internal storage. rmm::device_buffer internally stores the stream it was created on. This can create issues when creating columns on temporary worker-style streams and then returning them to a primary stream. When the column gets destroyed, rmm throws exceptions about invalid device contexts.
A set_stream(rmm::cuda_stream_view) function that recurses through all children would make it easier to hand off columns between streams.
We initially encountered this issue with cudf::io::column_buffer which has a similar problem but it will also be an issue for columns themselves.
The rmm::device_buffer has a set_stream method. This proposed API would call that method of the underlying buffer.
This seems reasonable from what I can tell, but its utility (transferring ownership between streams) will be limited until we expose streams publicly, aside from the internal I/O methods that @hyperbolic2346 is working on. Perhaps we should wait to add this to the public API until we have a broader discussion of adding streams to libcudf APIs?
I think that's totally fine. The stuff Mike is working on is just experimental stuff as is, and doesn't actually require this from column itself.
I added a PR showing what we are talking about here. I don't know if this is the best way to expose this, but it hopefully clears it up enough to have a solid conversation about it.
@nvdbaranec @hyperbolic2346 now that #13744 is closed, do you want to revisit this issue?
https://github.com/rapidsai/cudf/issues/19118 was opened as a duplicate of this issue. @tgujar Feel free to share more about your use case here.
Our use case is also for handing off columns between streams.
task t1 {...};
task t2 {...};
rmm::cuda_stream s1, s2;
{
std::shared_ptr<cudf::column> c {...};
t1.add_dependency(c);
t2.add_dependency(c);
}
t1.execute(s1); t2.execute(s2);
But I find that for this we cannot guarantee correct lifetime management without the use of events since column c on GPU should be freed only after kernels in both t1 and t2 are done utilizing it. I think a good solution for us is to just have a wrapper around cudf::column and wait on all dependent events before freeing.
If we know of a better solution for this, I would be happy to learn more!