liac-arff
Issue reading timestamp attributes
I am forwarding a limitation reported by @zuliani99 in scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/19944. It is more appropriate to fix this upstream than in the copy vendored in scikit-learn.
Describe the bug
I am trying to fetch a dataset with the fetch_openml API, and I noticed that it cannot handle date-typed features such as timestamps.
Steps/Code to Reproduce
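from sklearn.datasets import fetch_openml

id = 41889  # OpenML dataset that contains a timestamp attribute
X, y = fetch_openml(data_id=id, as_frame=True, return_X_y=True, cache=False)
# Append the target as an extra column of the feature frame
y = y.to_frame()
X[y.columns[0]] = y
df = X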
Expected Results
I expected it to return the usual X and y.
Actual Results
Traceback (most recent call last):
File "start.py", line 29, in <module>
main()
File "start.py", line 25, in main
test()
File "/home/riccardo/Desktop/AutoML-Benchmark/functions/test.py", line 10, in test
X, y = fetch_openml(data_id=id, as_frame=True, return_X_y=True, cache=False)
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 915, in fetch_openml
bunch = _download_data_to_bunch(url, return_sparse, data_home,
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 633, in _download_data_to_bunch
out = _retry_with_clean_cache(url, data_home)(
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 59, in wrapper
return f(*args, **kw)
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 514, in _load_arff_response
arff = _arff.load(stream,
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 1078, in load
return decoder.decode(fp, encode_nominal=encode_nominal,
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 915, in decode
raise e
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 911, in decode
return self._decode(s, encode_nominal=encode_nominal,
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 842, in _decode
attr = self._decode_attribute(row)
File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 784, in _decode_attribute
raise BadAttributeType()
sklearn.externals._arff.BadAttributeType: Bad @ATTRIBUTE type, at line 2.
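The error is raised while decoding an attribute declaration in the ARFF header, which is consistent with the parser rejecting the DATE type. Until that type is supported, one possible workaround is to redeclare DATE attributes as STRING before parsing and convert the values afterwards. The following is only a minimal sketch under that assumption: `dates_to_strings` is a hypothetical helper, not part of liac-arff or scikit-learn, and it discards the date format given in the header.

```python
import re

# Matches an @ATTRIBUTE line whose declared type is DATE (any case), with an
# optional date format after it. Group 1 keeps the keyword and attribute name.
_DATE_ATTRIBUTE = re.compile(
    r"^(\s*@attribute\s+(?:\"[^\"]*\"|'[^']*'|\S+)\s+)date\b.*$",
    re.IGNORECASE,
)


def dates_to_strings(arff_text):
    """Return arff_text with every DATE attribute redeclared as STRING.

    Hypothetical helper: only header lines (before @DATA) are rewritten,
    data rows are passed through unchanged.
    """
    out = []
    in_header = True
    for line in arff_text.splitlines():
        if in_header:
            if line.strip().lower() == "@data":
                in_header = False
            else:
                line = _DATE_ATTRIBUTE.sub(r"\1STRING", line)
        out.append(line)
    return "\n".join(out)


sample = "\n".join([
    "@RELATION t",
    '@ATTRIBUTE ts DATE "yyyy-MM-dd HH:mm:ss"',
    "@ATTRIBUTE x NUMERIC",
    "@DATA",
    '"2021-04-21 10:00:00",1.0',
])
rewritten = dates_to_strings(sample)
print(rewritten)
```

The rewritten text could then be handed to the ARFF parser (for example `arff.loads`), and the string column converted to timestamps afterwards. Whether this is acceptable depends on the dataset, since the format hint from the header is lost.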
Versions
System:
python: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0]
executable: /usr/bin/python3
machine: Linux-5.8.0-50-generic-x86_64-with-glibc2.29

Python dependencies:
pip: 21.0.1
setuptools: 56.0.0
sklearn: 0.24.1
numpy: 1.19.5
scipy: 1.5.4
Cython: 0.29.22
pandas: 1.1.4
matplotlib: 3.4.1
joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True
See also #44
I also found that sometimes, after fetching a dataset from OpenML, y (the target) is None. How can I figure out which column is the target? Is it always the last column of the dataset?
Please open an issue with the library you are using to download the data; liac-arff does not know about OpenML.