tuple/table ambiguity
The scitype of a tuple is intended to be the Tuple of the element scitypes. For example:
julia> scitype((1.0, 4))
Tuple{Continuous, Count}
By this logic, if I create a 1-tuple with a table t as its single element, then this tuple should have scitype Tuple{scitype(t)}. But this isn't always the case:
julia> t = (x=[1, 2], y=["a", "b"]);
julia> scitype(t)
Table{Union{AbstractVector{Count}, AbstractVector{Textual}}}
julia> scitype((t,))
Table{Union{AbstractVector{AbstractVector{Count}}, AbstractVector{AbstractVector{Textual}}}}
The problem is that (t,) is itself also a table (with one row, whose entries are the vectors t.x and t.y):
julia> schema((t,))
┌───────┬─────────────────────────┬────────────────┐
│ names │ scitypes                │ types          │
├───────┼─────────────────────────┼────────────────┤
│ x     │ AbstractVector{Count}   │ Vector{Int64}  │
│ y     │ AbstractVector{Textual} │ Vector{String} │
└───────┴─────────────────────────┴────────────────┘
This is pretty awful 😢. For example, it makes it tricky in MLJBase to use the fit_data_scitype of a model to check the model's compatibility with data, as in https://github.com/JuliaAI/MLJBase.jl/pull/731. That is, the test scitype(data) <: fit_data_scitype(model), where data is the tuple of data arguments, is not reliable.
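One possible way to sidestep the ambiguity in a check like the one above is to build the tuple scitype from the elements directly, so that the tuple itself is never inspected as a table. This is only a minimal sketch: tuple_scitype is a hypothetical helper name, not something provided by ScientificTypes or MLJBase, and it is not necessarily what the PR does.

```julia
using ScientificTypes

# Hypothetical helper (not part of ScientificTypes or MLJBase): compute the
# scitype of each element and wrap the results in a Tuple type, so the tuple
# as a whole is never interpreted as a table.
tuple_scitype(data::Tuple) = Tuple{map(scitype, data)...}

t = (x=[1, 2], y=["a", "b"])

tuple_scitype((1.0, 4))  # Tuple{Continuous, Count}
tuple_scitype((t,))      # Tuple{scitype(t)}, by construction
```

Under that assumption, the compatibility test would read tuple_scitype(data) <: fit_data_scitype(model) instead of calling scitype on the tuple.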
cc @pazzo83
Ah so this is why my tests were failing?
No, I now think that the MLJBase PR is (by accident?) actually avoiding this issue. See https://github.com/JuliaAI/MLJBase.jl/pull/731#issuecomment-1021891466.
Still, this issue could turn up unexpectedly elsewhere.