automlbenchmark
Scipy sparse matrices not handled correctly by TPOT and autosklearn
Failing datasets: https://openml.org/t/360932 https://openml.org/t/360932
- serialization of sparse matrices was not applied correctly.
- once fixed, the frameworks still fail with the following errors:
# TPOT
File "/Users/seb/repos/ml/automlbenchmark/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 1359, in _check_dataset
self.config_dict
ValueError: Not all operators in None supports sparse matrix. Please use "TPOT sparse" for sparse matrix.
# autosklearn
File "/Users/seb/repos/ml/automlbenchmark/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/utils/multiclass.py", line 288, in type_of_target
if y.ndim > 2 or (y.dtype == object and len(y) and
TypeError: len() of unsized object
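A plausible reading of the autosklearn traceback, offered as a sketch and not a confirmed diagnosis: when a scipy sparse matrix is passed through `np.asarray`, NumPy wraps the whole object in a 0-dimensional array of dtype `object`, and calling `len()` on that array raises the same `TypeError`. The `y_sparse` data below is made up for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical sparse target column, standing in for a sparse y.
y_sparse = sparse.csr_matrix(np.array([[0], [1], [0], [1]]))

# NumPy cannot unpack a scipy sparse matrix, so it stores the matrix
# object itself inside a 0-dimensional object array.
wrapped = np.asarray(y_sparse)

# len() is undefined for 0-dimensional arrays and raises TypeError.
try:
    len(wrapped)
    len_failed = False
except TypeError:
    len_failed = True
```

If this is indeed what happens inside `type_of_target`, it would explain why the error only appears when `y` is sparse.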
We'll improve support for sparse data in a future version: for now, we can simply deserialize the sparse matrices as dense matrices for the frameworks that don't use pandas.
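As a minimal sketch of that workaround (not the benchmark's actual code), a helper could convert sparse inputs to dense arrays and leave everything else untouched. The name `to_dense` and the sample matrix are hypothetical.

```python
import numpy as np
from scipy import sparse


def to_dense(data):
    """Return a dense ndarray for sparse input; pass other inputs through."""
    if sparse.issparse(data):
        return data.toarray()
    return data


# Hypothetical sparse feature matrix used only for illustration.
X_sparse = sparse.random(5, 3, density=0.4, format="csr", random_state=0)
X_dense = to_dense(X_sparse)
```

Note that densifying a large sparse matrix can use far more memory, which is one reason to treat this as a temporary measure.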
Just checking - are these sparse target matrices `y`? We might indeed not have tests for that.
CC @eddiebergman
@mfeurer in this case both `X` and `y` are indeed sparse, not sure this makes sense for `y`.
I currently fixed this by turning both into arrays as I thought the problem was `X`, but it's very possible that for some frameworks, it's only necessary to do this for `y`.
Thanks for the clarification. Auto-sklearn should support sparse `X`, but we'll check, and will also check what the behavior for sparse `y` values is.
@mfeurer for autosklearn, sparse `X` with dense `y` seems to work fine (and faster), meaning that in your case, sparse `y` was the issue.
Thanks for noticing this: ideally we'd like to have frameworks using sparse data whenever possible, so I'll probably just make the `y`s dense by default, and see individually for each framework regarding `X`.
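A rough sketch of that plan, under the assumption that targets arrive as a single sparse column: keep `X` sparse and convert only `y` to a dense 1-D array. The helper name `densify_target` and the sample data are hypothetical, not part of the benchmark.

```python
import numpy as np
from scipy import sparse


def densify_target(y):
    """Convert a (possibly sparse) target into a dense array.

    A single-column 2-D target is flattened to 1-D, the shape most
    estimators expect for y.
    """
    if sparse.issparse(y):
        y = y.toarray()
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()
    return y


# Hypothetical data: X stays sparse, only y is made dense.
X = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 0.0], [3.0, 4.0]]))
y = densify_target(sparse.csr_matrix(np.array([[0], [1], [0], [1]])))
```

Frameworks that cannot handle sparse `X` would still need it converted separately, on a case-by-case basis.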
cc: @PGijsbers
@sebhrusen It's probably in the interest of `autosklearn` to handle sparse `y` correctly in this case, so I'll have a look into it.
@eddiebergman Sure, just mentioning that we have a workaround on our side for now that also seems to work for other frameworks. Thanks for fixing it on your side too.
Hi @sebhrusen,
Just letting you know the fix should be in the next release. I tracked down the problem a little more and wrote a brief synopsis, in case it helps identify the problem for other libraries.