openml-python
Support for xarray?
I heard from a growing number of people that it would be good to support xarray: http://xarray.pydata.org/en/stable/
It supports multi-dimensional data (tensors) whereas pandas only supports single tables. Images can for instance be stored more easily.
It seems possible to convert back and forth between xarray and pandas. What I don't know is how extensive this support is. Can it 'flatten' tensors, store them as vectors, and restore them again? If so, maybe this is an easy extension. If not, this may require updates to the backend as well.
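To make the round-trip question concrete, here is a minimal sketch using xarray's documented `DataArray.to_dataframe()` and pandas' `DataFrame.to_xarray()`. The toy data, the dimension names (`sample`, `row`, `col`) and the variable name `pixel` are made up for illustration; nothing here is part of openml-python.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 2 grayscale "images" of 3x4 pixels each.
images = xr.DataArray(
    np.arange(24, dtype=float).reshape(2, 3, 4),
    dims=("sample", "row", "col"),
    coords={"sample": [0, 1], "row": [0, 1, 2], "col": [0, 1, 2, 3]},
    name="pixel",  # to_dataframe() needs a named DataArray
)

# Flatten: one row per (sample, row, col), indexed by a pandas MultiIndex.
flat = images.to_dataframe()

# Pivot to one row per sample and one column per pixel,
# which is the layout a flat table would use.
wide = flat["pixel"].unstack(["row", "col"])

# Round trip: the MultiIndex levels become dimensions again.
restored = flat.to_xarray()["pixel"]
```

Note that the round trip only works because the MultiIndex carries the dimension names and coordinates. Once the data is written out as plain columns (the `wide` layout), that information has to be stored somewhere else.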
I just wanted to start some discussion about this :)
Thanks!
I guess the most important questions right now would be:
- how to convert between this and arff in a useful manner?
- what would be a practical use case that cannot be easily tackled right now?
- do any consuming ML libraries support it?
- who would implement this? Supporting yet another library asks for generalizing the data format a bit more...
For reference, the xarray to/from dataframe methods are documented here. My initial response would be that it provides little to no benefit as long as our datasets are ultimately flat arff tables with no metadata to convert them back to tensors. Very interested to see some of Matthias' questions answered.
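A small sketch of why the missing metadata matters, using only NumPy: the same flat row of values can be reshaped into several different tensors, so the shape has to be recorded somewhere. The `meta` dictionary below is a hypothetical illustration, not an existing OpenML or ARFF field.

```python
import numpy as np

# One flat row of 12 feature values, as it would appear in a flat table.
flat_row = np.arange(12)

# Without extra information, both interpretations are equally valid.
as_image = flat_row.reshape(3, 4)      # a 3x4 image
as_volume = flat_row.reshape(2, 2, 3)  # a 2x2x3 tensor

# Hypothetical per-dataset metadata that would resolve the ambiguity.
meta = {"shape": (3, 4), "dims": ("row", "col")}
restored = flat_row.reshape(meta["shape"])
```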