[FEA] Make `column_view` _actually_ be a view
Is your feature request related to a problem? Please describe.
column_view is so named because it is supposed to be a non-owning, "view" type. However, this is currently a half-truth because column_view is an owning class!
column_view stores a vector<column_view> internally for it's children. Since vector is an owning type, that makes column_view an owning type.
This has several ramifications.
First of all column_view is not trivially copyable (which view types should be!).
The larger problem is more insidious. The fact that column_view contains a vector object is the ultimate cause of the many compiler errors that libcudf developers have seen about calling a __host__ function from a __host__ __device__ context. The full causal chain is complicated.
As @harrism described in https://github.com/rapidsai/rmm/pull/312:
Unfortunately, NVCC implicitly adds host device specifiers to explicitly defaulted functions that are called from both host and device (or host device) functions. In libcudf, the type_dispatcher uses a host device function, so the above changed resulted in compiler errors since the default compiler-generated constructor necessarily invokes a host-only function.
In short, column_view has defaulted ctors. Since these ctors can be used within the context of the type_dispatcher (which is __host__ __device__), nvcc will implicitly add __host__ __device__ to these ctors. These defaulted column_view ctors will invoke the member vector ctors (which are intrinsically __host__ only). Thus, we end up with trying to call a __host__ function (vector ctor/dtor) inside a __host__ __device__ function (column_view ctor/dtor).
Describe the solution you'd like
Remove the vector<column_view> from column_view and instead replace it with column_view* children and size_type num_children:
class column_view{
column_view(...., column_view * children, size_type num_children);
private:
column_view * children; // pointer to array of child `column_view` objects
size_type num_children;
};
This will make column_view a true "view" type by being trivially copyable and eliminate any possibility for the host/device errors we've encountered numerous times in the past.
But what about the children!?
The catch is, someone has to own the children column_view objects, i.e., the children pointer in column_view has to point to column_view objects that are constructed/owned by someone else. There's two scenarios we have to consider:
column_views that viewcudf::columnobjectscolumn_views that do not viewcudf::columnobject (e.g., viewing a Python owned column)
In 1. the fix is pretty easy. We can just add a vector<column_view> to cudf::column:
class column{
private:
vector<unique_ptr<column>> children; // here's the owning child objects
vector<column_view> child_views; // non-owning views to the same children
public:
operator column_view(){
// pass pointer to the `child_view`s data when converting to a `column_view`
return column_view{..., child_views.data(), child_views.size()};
}
};
Situation 2. shouldn't be too onerous either. The caller was already required to construct a std::vector<column_view> to pass the children to the column_view ctor. However, instead of copying that vector, we just take a pointer to it's contents. This does add the added responsibility of the user to ensure that the column_view does not outlive the vector<column_view> for the children. @shwina correct me if I'm wrong, but I don't anticipate this should be a problem from the Python/Cython side nor require significant changes?
@shwina correct me if I'm wrong, but I don't anticipate this should be a problem from the Python/Cython side nor require significant changes?
We currently release the buffers from a Column and those control the memory lifetime directly via smart pointers. For creating column_view instances, we currently have this function: https://github.com/rapidsai/cudf/blob/branch-0.13/python/cudf/cudf/_libxx/column.pyx#L335-L369
It sounds like we'll need to change the vector[column_view] to use a column_view* which we then no longer know the lifetime of. This is a bit of a pain for us on the Cython side, but we have a couple of options from my perspective:
- We add an attribute to our
Columncython class that keeps a reference to avector[column_view]to keep it alive. Note: this doesn't work when children columns can have children unless we recursively add all of the childrencolumn_viewto the vector. - We create a
ColumnViewcython class that keeps a reference to avector[column_view]and a corresponding list ofColumnViewto handle keeping children of children of children of etc. alive.
Number 2 is probably the better long term solution but is a bit of annoying work as of now.
You'd just need to make this vector stay alive for the lifetime of the Python column object.
Instead of creating a column_view on the fly for a Python column every time you need to pass into libcudf, why not just construct and store the column_view once? I think this is what you're suggesting in 2.
You'd just need to make this
vectorstay alive for the lifetime of the Python column object.
What about the column_view children of those column_view? Nothing would be controlling their lifetime since they'd now be column_view*.
Instead of creating a column_view on the fly for a Python column every time you need to pass into libcudf, why not just construct and store the column_view once? I think this is what you're suggesting in #2.
We'd then need to somehow invalidate that column_view whenever we do something potentially mutable, which in the currently state of using legacy things as well is still somewhat all over the place. My initial thought is to basically replace where we call .view() currently with something like view().column_view() where .view() returns a ColumnView and .column_view() returns the C++ column_view, but this can lead to situations where we accidentally create an ephemeral instance of ColumnView and then end up with a dangling pointer in the column_view.
Understood that it will require a bit of work.
It sounds like the pros still outweigh the cons. Eliminating this pervasive compiler error will be a huge win. So unless it's totally impossible to make this change without significant effort, it still seems like something we should do.
And to be clear, this wouldn't be tackled until post-GTC, 0.14 timeframe.
I think this is a nice bit if tech debt to tackle in 0.14.
@jrhemstad can you document here what you ran up against that lead to reprioritizing this?
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This is still relevant. @jrhemstad instead of storing a pointer and size, I guess column_view could just store a host_span<column_view>?
This is still relevant. @jrhemstad instead of storing a pointer and size, I guess
column_viewcould just store ahost_span<column_view>?
It could, yeah. I remember I went down the path of making this change (not the host_span specifically) and running into what felt like a showstopper. I can't remember what it was right now. I'd have to revisit and think about it.