Support for xarray?

Open joaquinvanschoren opened this issue 4 years ago • 2 comments

I heard from a growing number of people that it would be good to support xarray: http://xarray.pydata.org/en/stable/

It supports multi-dimensional data (tensors) whereas pandas only supports single tables. Images can for instance be stored more easily.

It seems possible to convert back and forth between xarray and pandas. What I don't know is how far this goes. Can it 'flatten' tensors, store them as vectors, and restore them afterwards? If so, maybe this is an easy extension. If not, this may require updates to the backend as well.

I just wanted to start some discussion about this :)

Thanks!

joaquinvanschoren avatar Sep 30 '20 11:09 joaquinvanschoren

I guess the most important questions right now would be:

  1. how to convert this arff in a useful manner?
  2. what would be a practical use case that cannot be easily tackled right now?
  3. do any consuming ML libraries support it?
  4. who would implement this? Supporting yet another library asks for generalizing the data format a bit more...

mfeurer avatar Sep 30 '20 15:09 mfeurer

For reference, the xarray to/from dataframe methods are documented here. My initial response would be that it provides little to no benefit as long as our datasets are ultimately flat ARFF tables with no metadata to convert them back to tensors. Very interested to see some of Matthias' questions answered.
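
To make the round trip concrete, here is a minimal sketch of flattening a small tensor into a pandas table and restoring it. The toy "image" data and the names `images`, `flat`, `wide` and `restored` are made up for illustration; only the xarray/pandas conversion calls are real, and this says nothing about how openml-python would actually wire it in.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: two tiny 3x4 grayscale "images".
images = xr.DataArray(
    np.arange(24).reshape(2, 3, 4),
    dims=("sample", "row", "col"),
    coords={"sample": [0, 1], "row": [0, 1, 2], "col": [0, 1, 2, 3]},
    name="pixel",
)

# Flatten: one row per (sample, row, col) cell, indexed by a MultiIndex.
flat = images.to_dataframe()

# ARFF-like layout: one row per sample, one column per pixel.
wide = flat["pixel"].unstack(["row", "col"])

# Restore the tensor. This relies on the MultiIndex still carrying
# the row/col structure.
restored = flat["pixel"].to_xarray()

print(flat.shape, wide.shape, restored.shape)
```

The restore step only works because the pandas index still records which row and column each value belongs to. Once the table is written out as plain columns, that shape information would have to be stored somewhere else, which is the metadata problem mentioned above.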

PGijsbers avatar Oct 01 '20 17:10 PGijsbers