ScientificTypes.jl

tuple/table ambiguity

Open ablaom opened this issue 3 years ago • 3 comments

The scitype of a tuple is intended to be the Tuple of the element scitypes. For example:

julia> scitype((1.0, 4))
Tuple{Continuous, Count}

By this logic, if I create a 1-tuple with a table t as its single element, then this tuple should have scitype Tuple{scitype(t)}. But this isn't always the case:

t = (x=[1, 2], y=["a", "b"])

julia> scitype(t)
Table{Union{AbstractVector{Count}, AbstractVector{Textual}}}

julia> scitype((t,))
Table{Union{AbstractVector{AbstractVector{Count}}, AbstractVector{AbstractVector{Textual}}}}

The problem is that (t,) is itself also a table (with one row):

julia> schema((t,))
┌───────┬─────────────────────────┬────────────────┐
│ names │ scitypes                │ types          │
├───────┼─────────────────────────┼────────────────┤
│ x     │ AbstractVector{Count}   │ Vector{Int64}  │
│ y     │ AbstractVector{Textual} │ Vector{String} │
└───────┴─────────────────────────┴────────────────┘
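The table check behind this can be probed directly. The following is a minimal sketch, assuming ScientificTypes decides "is this a table?" via `Tables.istable` from Tables.jl (an assumption on my part, not something stated in this thread):

```julia
using Tables

t = (x=[1, 2], y=["a", "b"])

# A NamedTuple of equal-length vectors is a column table.
t_is_table = Tables.istable(t)

# A tuple whose elements are all NamedTuples iterates NamedTuple "rows",
# so the generic iterable-table fallback should accept it as well.
wrapped_is_table = Tables.istable((t,))

# A tuple of plain numbers has no NamedTuple element type, so it should
# not qualify as a table.
plain_is_table = Tables.istable((1.0, 4))

@show t_is_table wrapped_is_table plain_is_table
```

If that assumption holds, the table interpretation takes precedence for (t,), while a tuple such as (1.0, 4) falls through to the tuple rule.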

This is pretty awful 😢. For example, it makes it tricky in MLJBase to use the fit_data_scitype of models to check compatibility of a model with data, as in https://github.com/JuliaAI/MLJBase.jl/pull/731. That is, the test scitype(data) <: fit_data_scitype(model), where data is the tuple of data arguments, is not reliable.
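One possible way to sidestep the ambiguity in such a check is to build the tuple scitype element by element, so that the tuple itself is never inspected for "tableness". This is only a sketch: `tuple_scitype` is a hypothetical helper, not part of ScientificTypes or MLJBase, and whether it matches what the linked PR does is not confirmed here.

```julia
using ScientificTypes

# Hypothetical helper (not a ScientificTypes or MLJBase function): apply
# `scitype` to each element and wrap the results in a Tuple type.
tuple_scitype(data::Tuple) = Tuple{map(scitype, data)...}

t = (x=[1, 2], y=["a", "b"])

S = tuple_scitype((t,))

# The 1-tuple now gets the scitype the tuple rule promises.
@show S == Tuple{scitype(t)}

# A check against a signature that expects a single table argument.
@show S <: Tuple{Table}

# Ordinary tuples are unaffected.
@show tuple_scitype((1.0, 4)) == Tuple{Continuous, Count}
```

The trade-off is that the caller must already know that data is a tuple of arguments rather than a table, which is exactly the information that scitype((t,)) cannot recover on its own.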

ablaom • Jan 26 '22 02:01

cc @pazzo83

ablaom • Jan 26 '22 02:01

Ah, so this is why my tests were failing?

pazzo83 • Jan 26 '22 04:01

No, I now think that the MLJBase PR is (by accident?) actually avoiding this issue. See https://github.com/JuliaAI/MLJBase.jl/pull/731#issuecomment-1021891466.

Still, this issue could turn up unexpectedly elsewhere.

ablaom • Jan 26 '22 06:01