machinelearning DataFrame - add support for VBuffer

It seems that the DataFrame API still doesn't support VBuffer, so if an IDataView contains a VBuffer-typed column, ToDataFrame() will fail.

Jul 07 '21 19:07 LittleLittleCloud

To give an example, take the following dataset:


5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa

Using the following IDataView schema:

type ModelInput = {
    [<LoadColumn(0,3)>] Features: float32 array
    [<LoadColumn(4)>] Label: string
}

ToDataFrame() throws the following error when called:

System.NotSupportedException: VBuffer`1 is not a supported column type.

Aug 11 '21 23:08 lqdev

@ericstj @eerhardt @michaelgsharp What would it take to add support for VBuffer, or more generically Object, in the DataFrame API? Is there any roadmap for that?

Feb 28 '22 22:02 LittleLittleCloud

This is going to need some further investigation to see what it would take. It's on our roadmap, but we will be taking a look at it after we get TorchSharp resolved.

Mar 02 '22 00:03 michaelgsharp

@eerhardt have you thought of this before or are you aware of any discussion with Prasanth about it? If we have an idea of how it would work we could write that up here in case someone else might be interested in helping fix this.

Mar 02 '22 01:03 ericstj

I really haven't given it deep thought. I know it is a problem, but I'm not sure exactly how to structure a DataFrameColumn that contains VBuffer instances. The two are a little bit at odds: VBuffer is supposed to be a "buffer" that changes as you "cursor" over the rows of an IDataView, whereas DataFrame wants everything loaded in memory at once. But maybe we can have a column derived from DataFrameColumn that contains a distinct VBuffer for every row in the DataFrame.
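As a rough illustration of the "distinct buffer for every row" idea, here is a minimal, library-independent sketch. It does not use ML.NET or DataFrame types: `ReusingCursor`, `VectorColumn` and `materialize` are hypothetical names made up for this example, and a plain `float32[]` stands in for VBuffer. It only shows why each row needs its own copy when the cursor reuses one buffer.

```fsharp
// Hypothetical stand-in for a row cursor (NOT ML.NET's API): it refills ONE
// shared buffer on every MoveNext(), mimicking how a VBuffer is reused.
type ReusingCursor(rows: float32[][]) =
    let width = if rows.Length = 0 then 0 else rows.[0].Length
    let buffer : float32[] = Array.zeroCreate width
    let mutable position = -1

    member _.MoveNext() =
        position <- position + 1
        if position < rows.Length then
            // Overwrite the shared buffer with the current row's values.
            Array.blit rows.[position] 0 buffer 0 width
            true
        else
            false

    // Returns the shared buffer; its contents change on the next MoveNext().
    member _.Current = buffer

// Hypothetical column type (NOT a real DataFrameColumn): stores a distinct
// copy of the vector for every row so the data is fully loaded in memory.
type VectorColumn(name: string) =
    let values = ResizeArray<float32[]>()
    member _.Name = name
    member _.Length = values.Count
    member _.Item with get (i: int) = values.[i]
    // Copy on append; storing the cursor's buffer directly would alias all rows.
    member _.Append(source: float32[]) = values.Add(Array.copy source)

// Walk the cursor once and copy each row's buffer into the column.
let materialize (name: string) (cursor: ReusingCursor) =
    let column = VectorColumn(name)
    while cursor.MoveNext() do
        column.Append(cursor.Current)
    column

// The feature values from the dataset in this issue.
let rows =
    [| [| 5.1f; 3.5f; 1.4f; 0.2f |]
       [| 4.9f; 3.0f; 1.4f; 0.2f |]
       [| 4.7f; 3.2f; 1.3f; 0.2f |] |]

let features = materialize "Features" (ReusingCursor(rows))
printfn "%s: %d rows, first row = %A" features.Name features.Length features.[0]
```

If `Append` stored `source` without copying, every row of the column would end up pointing at the same array holding the last row's values. That is the mismatch between a reused buffer and a fully materialized column described above.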

That's about as far as I've gone thinking of this.

See also https://github.com/dotnet/machinelearning/issues/5721

Mar 02 '22 16:03 eerhardt

@michaelgsharp or @JakeRadMSFT -- Was this completed with #6409, or is there more to do here still?

Nov 27 '23 17:11 jeffhandley