XGBoost.jl
confusing error messages when passed tables with invalid types
Received an unusual error message while trying to convert a DataFrame to a DMatrix. I have converted other DataFrame objects without issue, both in the past and currently, and I'm not sure what is different about this one. Any clues from the error message, or suggestions on how to troubleshoot, would be helpful.
Here is the error:
julia> typeof(s)
DataFrame
julia> DMatrix(s)
ERROR: ArgumentError: DMatrix requires either an AbstractMatrix or table satisfying the Tables.jl interface
Stacktrace:
[1] DMatrix(tbl::Matrix{Any}; feature_names::Vector{String}, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ XGBoost ~/.julia/packages/XGBoost/Fyff4/src/dmatrix.jl:249
[2] DMatrix(tbl::DataFrame; feature_names::Vector{String}, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ XGBoost ~/.julia/packages/XGBoost/Fyff4/src/dmatrix.jl:251
[3] DMatrix(tbl::DataFrame)
@ XGBoost ~/.julia/packages/XGBoost/Fyff4/src/dmatrix.jl:244
[4] top-level scope
@ REPL[63]:1
Additional introspection on the DataFrame object:
julia> Tables.istable(s)
true
julia> Tables.columnnames(s)
35-element Vector{Symbol}:
This is expected behavior, but a bad error message. The conversion to a matrix results in something with eltype Any where it expects Real.
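To see why the eltype ends up as Any: turning a table into one dense matrix means merging all column element types into a single common type. The sketch below assumes (rather than quotes from Tables.jl) that this merge is a promotion over the column eltypes; the column types themselves are made up for illustration.

```julia
# Hypothetical column element types: two numeric columns plus one String
# column that was never converted to a number.
coltypes = (Float64, Int64, String)

# Fold the column types into one common element type, as a single dense
# matrix requires. Float64 and Int64 promote to Float64, but Float64 and
# String have no common numeric type, so the result widens to Any.
T = reduce(promote_type, coltypes)
```

Dropping the String column from coltypes would leave a promoted type of Float64, which is the kind of eltype the numeric path can accept.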
We should probably standardize the cases in which Any elements get converted. It's certainly reasonable for the conversion to fail in some cases, but it wouldn't surprise me if it currently fails in some cases where it shouldn't.
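One possible way to standardize this, sketched with a hypothetical helper (to_real_matrix is not part of XGBoost.jl): accept a matrix of any eltype, convert it when every element is Real, and otherwise fail with a message that names the offending types. Converting to Float32 is an assumption made for the sketch.

```julia
# Hypothetical helper, not XGBoost.jl API: convert a possibly Any-typed
# matrix to Float32, or throw an ArgumentError naming the non-Real types.
function to_real_matrix(M::AbstractMatrix)
    bad = Set{Type}()
    for x in M
        # missing is not Real, so it is reported too; a real implementation
        # might prefer to map it to NaN instead.
        x isa Real || push!(bad, typeof(x))
    end
    isempty(bad) || throw(ArgumentError(
        "expected only Real elements, found: " *
        join(sort!(string.(collect(bad))), ", ")))
    # Every element is Real here, so the broadcast yields a Matrix{Float32}.
    return Float32.(M)
end
```

With this, Any[1.0 2; 3 4.5] converts cleanly even though its eltype is Any, while Any[1.0 "x"] raises an ArgumentError that mentions String.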
Could you list all the element types present in your input table? I think in your case that would be:
Set(Iterators.map(typeof, Tables.matrix(s)))
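Listing element types tells you that something non-numeric is present, but not which column it lives in. A minimal per-column check, assuming the table is a NamedTuple of vectors (a column-table layout that Tables.jl accepts); nonreal_columns and the example data are hypothetical:

```julia
# Hypothetical column table: two numeric columns and one String column.
tbl = (age = [34, 51, 29],
       income = [52_000.0, 61_500.0, 48_250.0],
       zip = ["02139", "94103", "10001"])

# Return name => eltype pairs for every column whose element type is not
# a subtype of Real.
nonreal_columns(t::NamedTuple) =
    [name => eltype(col) for (name, col) in pairs(t) if !(eltype(col) <: Real)]

offenders = nonreal_columns(tbl)
```

Here offenders should hold a single pair for the zip column. Note that a column containing missing values has an eltype such as Union{Missing, Float64}, which is not a subtype of Real, so it would be flagged as well; whether that is desirable depends on how missing data is meant to be handled.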
On second thought, something else fishy is happening here. This specific error should only happen when !Tables.istable(s).
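One plausible reading of the stack trace: frame [1] shows DMatrix being called again with tbl::Matrix{Any}, and that call raises the table error. That would happen if the matrix method only accepts real element types, so a Matrix{Any} falls through to the generic table method. The sketch below reproduces that dispatch pattern with a hypothetical toy_dmatrix; it is not XGBoost.jl's actual code.

```julia
# Hypothetical fast path for matrices whose element type is Real.
toy_dmatrix(x::AbstractMatrix{<:Real}) = "matrix path, eltype $(eltype(x))"

# Hypothetical catch-all intended for Tables.jl tables. Anything that is
# not an AbstractMatrix{<:Real} lands here, including any Matrix{Any}.
toy_dmatrix(x) = throw(ArgumentError(
    "requires either an AbstractMatrix or table satisfying the Tables.jl interface"))

# Small helper: true if calling f throws an ArgumentError.
function threw_argerror(f)
    try
        f()
        return false
    catch e
        return e isa ArgumentError
    end
end
```

Under this pattern, toy_dmatrix(Any[1.0 2.0]) fails with the table message even though every element is a Float64, which illustrates how the current behavior could fail in cases that are not so reasonable.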
Also, could you please try this on the latest version of XGBoost.jl? From your stack trace, it looks like your installed version is at least a few commits old.
Thank you for the prompt response! I believe you found the problem: a few of the columns in the DataFrame contain Strings. I need to track down the source of this on my end (the object is the end result of multiple string-to-number conversions, and these columns seem to have been missed).
I will follow up if this does not resolve the issue.
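A minimal sketch of the cleanup step for columns that were missed by the string-to-number conversion, again assuming a NamedTuple-of-vectors table; parse_string_columns and the data are hypothetical, and applying the same idea to a DataFrame column by column should work similarly.

```julia
# Hypothetical cleanup: parse every column whose element type is a string
# type into Float64, and return numeric columns unchanged.
parse_string_columns(t::NamedTuple) =
    map(col -> eltype(col) <: AbstractString ? parse.(Float64, col) : col, t)

tbl = (x = [1.0, 2.5], y = ["3", "4.25"])
clean = parse_string_columns(tbl)
```

Because parse throws on text that is not a number, any genuinely bad values surface immediately at this step instead of later inside DMatrix.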
A few of the columns in DataFrame object contain String type
In that case this was definitely supposed to throw an error, but this error message was pretty terrible and confused even me. So this issue is actionable in that we need a better error message here. I'd be happy if it hit a MethodError, but this is just downright confusing.
The String elements were the issue here. I'll leave it to your discretion whether the error message needs any changes.
Thanks for helping me sort out the issue.
Yes, in my opinion this message is confusing enough to warrant an open issue, so let's keep this open, though fixing it is not a high priority.