MLJBase.jl

Possibility of TableOperations.jl for column access

Open OkonSamuel opened this issue 2 years ago • 3 comments

We should be able to use methods from TableOperations.jl to reimplement the selectrows and selectcols methods for generic tables. This should help us avoid the issue we are currently experiencing with materializing NamedTuples.

OkonSamuel, Jun 14 '22 23:06
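
To make the column-access proposal concrete, here is a minimal sketch of lazy column selection with TableOperations.select. The helper name select_columns and the sample table are hypothetical, chosen only for illustration; this is not MLJBase's actual selectcols implementation, just one way the idea could look.

```julia
using Tables, TableOperations

# Illustrative data: a column table (NamedTuple of vectors).
tbl = (a = [1, 2, 3], b = [4.0, 5.0, 6.0], c = ["x", "y", "z"])

# Hypothetical helper (not part of MLJBase or MLJModelInterface):
# wrap any Tables.jl-compatible source in a lazy column selection.
select_columns(X, cols::Symbol...) = TableOperations.select(X, cols...)

# `sel` is a lazy wrapper around `tbl`; it is itself a Tables.jl source.
sel = select_columns(tbl, :a, :c)

# Materialize only at the point where a concrete table is required.
out = Tables.columntable(sel)
```

Because the selection stays lazy until a sink such as Tables.columntable is called, the caller decides if and when a NamedTuple is materialized.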

For selecting rows, I'm hoping https://github.com/JuliaData/Tables.jl/pull/278 will be more helpful, as it allows individual table formats to expose more efficient selection methods to the generic Tables.jl API, which TableOperations.jl cannot do. Another idea is to subsume table-row access under the more generic getobs API in MLUtils.jl.

But happy to go with your suggestion for TableOperations.jl for column access.

ablaom, Jun 16 '22 00:06

Worth noting here that, as far as I know, selectcols is not used within MLJBase at all, only selectrows. However, I believe it is exposed in MLJModelInterface, and the transformers in MLJModels do use it. I'd be inclined to remove it from MLJModelInterface in the future. A model provider who really wants generic table column access could just import Tables.jl or TableOperations.jl, I guess.

ablaom, Jun 16 '22 00:06