DataFrame - add support for VBuffer
The DataFrame API does not appear to support VBuffer yet, so if an IDataView contains a VBuffer column, ToDataFrame() fails.
For example, given the following dataset:
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
and the following IDataView input schema:
type ModelInput = {
[<LoadColumn(0,3)>] Features: float32 array
[<LoadColumn(4)>] Label: string
}
calling ToDataFrame() throws the following error:
System.NotSupportedException: VBuffer`1 is not a supported column type.
@ericstj @eerhardt @michaelgsharp
What would it take to add support for VBuffer, or more generally for arbitrary objects, in the DataFrame API? Is there a roadmap for this?
This will need further investigation to see what it would take. It's on our roadmap, but we will look at it after we get TorchSharp resolved.
@eerhardt have you thought about this before, or are you aware of any discussion with Prasanth about it? If we have an idea of how it would work, we could write it up here in case someone else is interested in helping fix this.
I really haven't given it deep thought. I know it is a problem, but I'm not sure exactly how to structure a DataFrameColumn that contains VBuffer instances. The two are a little at odds: VBuffer is meant to be a "buffer" that changes as you "cursor" over the rows of an IDataView, whereas DataFrame wants everything loaded in memory at once. But maybe we could have a column derived from DataFrameColumn that holds a distinct VBuffer for every row in the DataFrame.
That's about as far as I've gone thinking of this.
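The tension described above (a VBuffer reused as the cursor advances, versus a DataFrame that keeps every row in memory) can be illustrated without ML.NET. The F# sketch below is a simulation under stated assumptions: `ReusingCursor`, `collectByReference`, and `collectByCopy` are hypothetical names, not ML.NET or Microsoft.Data.Analysis APIs, and a plain `float32[]` stands in for a VBuffer's backing storage. It shows why a column holding one VBuffer per row would have to copy each row's values out of the cursor's buffer instead of keeping the reference the cursor hands back.

```fsharp
// Hypothetical stand-in for an IDataView cursor (not an ML.NET type): it owns
// ONE buffer and overwrites it in place on every MoveNext, mimicking how a
// VBuffer getter can reuse the same storage from row to row.
type ReusingCursor(rows: float32[][]) =
    let buffer : float32[] =
        Array.zeroCreate (if rows.Length = 0 then 0 else rows.[0].Length)
    let mutable position = -1

    member this.MoveNext() =
        position <- position + 1
        if position < rows.Length then
            // Copy the next row into the shared buffer.
            Array.blit rows.[position] 0 buffer 0 buffer.Length
            true
        else
            false

    // Returns the shared buffer itself, not a fresh array.
    member this.Current = buffer

// Naive materialization: store whatever the cursor hands back.
// Every stored "row" aliases the same array.
let collectByReference (cursor: ReusingCursor) =
    let result = ResizeArray<float32[]>()
    while cursor.MoveNext() do
        result.Add(cursor.Current)
    result.ToArray()

// Per-row copy: each stored row gets its own array, which is what a
// column holding "a distinct VBuffer for every row" would need.
let collectByCopy (cursor: ReusingCursor) =
    let result = ResizeArray<float32[]>()
    while cursor.MoveNext() do
        result.Add(Array.copy cursor.Current)
    result.ToArray()

// Feature columns from the example dataset above.
let data =
    [| [| 5.1f; 3.5f; 1.4f; 0.2f |]
       [| 4.9f; 3.0f; 1.4f; 0.2f |]
       [| 4.7f; 3.2f; 1.3f; 0.2f |] |]

let aliased = collectByReference (ReusingCursor data)
let copied = collectByCopy (ReusingCursor data)
```

With these rows, every element of `aliased` is the same array and ends up holding the last row (4.7, 3.2, 1.3, 0.2), while `copied` keeps all three rows intact. In real ML.NET code the per-row copy would presumably be a VBuffer copy (for example `VBuffer<T>.CopyTo`), at the cost of one allocation per row.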
See also https://github.com/dotnet/machinelearning/issues/5721
@michaelgsharp or @JakeRadMSFT -- Was this completed with #6409, or is there more to do here still?