Fix Autoxgboost reader issue
Now merging in the correct branch.
This might fix the CI problems.
What should we do here? autoxgboost was withdrawn from the benchmark. Should we also officially remove its integration from the master branch? If not, I feel we need to keep supporting it for people who want to run it.
I'm trying to run autoxgboost on my data (input is a common CSV) and it fails with:
CalledProcessError: Command 'Rscript --vanilla -e ".libPaths('/bench/frameworks/autoxgboost/lib'); source('/bench/frameworks/autoxgboost/exec.R'); run('/input/test_data/differentiate_cancer_train.csv…
More specifically:
...
Parse with reader=readr : /input/test_data/differentiate_cancer_train.csv
Error in parseHeader(path) :
Invalid column specification line found in ARFF header:
f_1,f_2,f_3,f_4,f_5,f_6,f_7,f_8,f_9,f_10,f_11,f_12,f_13,f_14,f_15,f_16,f_17,f_18,f_19,f_20,f_21,f_22,f_23,f_24,f_25,f_26,f_27,f_28,f_29,f_30,f_31,f_32,f_33,f_34,f_35,f_36,f_37,f_38,f_39,f_40,f_41,f_42,f_43,f_44,f_45,f_46,f_47,f_48,f_49,f_50,...
Searching around, I found this: https://machinelearningmastery.com/load-csv-machine-learning-data-weka/
It's about Weka, but it makes me wonder whether my data needs to be converted anyway, and whether this PR could help me as well.
BTW, the ranger and mlr3automl frameworks failed in the same way.
Could you open a new issue that specifies exactly which versions you are using (OS, Python, AMLB), the command you use to start the experiment, the custom dataset configuration (yaml file) and, if possible, the dataset itself? It seems to try to read the CSV file as an ARFF file, which fails because ARFF requires a header and the first line of your CSV is just a list of column names.
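To illustrate the mismatch: an ARFF file opens with an `@relation` declaration followed by `@attribute` lines and a `@data` marker, while a CSV starts directly with comma-separated column names. Below is a minimal Python sketch of a format check. The function `looks_like_arff` and the sample lines are hypothetical, written only for illustration; they are not part of AMLB or of the R reader that raised the error.

```python
def looks_like_arff(lines):
    """Return True if the first meaningful line is an ARFF @relation declaration.

    Blank lines and ARFF comment lines (starting with '%') are skipped.
    This is an illustrative heuristic, not the check AMLB itself performs.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue
        # ARFF keywords are case-insensitive.
        return stripped.lower().startswith("@relation")
    return False


# Hypothetical sample inputs: a CSV header row and a small ARFF header.
csv_sample = ["f_1,f_2,f_3,class", "0.1,0.2,0.3,a"]
arff_sample = [
    "% a comment",
    "@RELATION cancer",
    "@ATTRIBUTE f_1 NUMERIC",
    "@DATA",
    "0.1",
]

print(looks_like_arff(csv_sample))
print(looks_like_arff(arff_sample))
```

A check like this would reject the CSV above before it reaches an ARFF parser, which is roughly the situation the `parseHeader` error reflects.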
Closing this PR. Let's officially withdraw support for autoxgboost from the benchmark. We can reconsider only if someone steps up to fix the issues and indicates the intention to maintain the integration.