Improvements to Pandas-to-Iris bridge

Open trexfeathers opened this issue 3 years ago • 0 comments

🚀 Pull Request

Description

Full overhaul of iris.pandas.as_cube(). Closes #3324.

  • More Pandas-esque handling of DataFrames: each column is assumed to represent a specific variable (i.e. a single Series). (The current model converts a DataFrame into a 2-dimensional Cube, treating the columns as the second dimension.)
  • Can specify columns that correspond to any type of _DimensionalMetadata (DimCoord, CellMeasure, etc.).
  • Can create n-dimensional Cubes, using the specified dimension coordinates to determine the dimensions. If no dimension coordinates are specified, falls back on the existing Pandas index, creating a 1D Cube.
  • Can create multiple Cubes - one from each column not specified as dimensional metadata. All Cubes share the same dimensional metadata that is generated from the specified columns.
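
To illustrate the data model described above, here is a minimal sketch using only pandas and NumPy. It does not call Iris and is not the implementation in this PR: the column names, the `dim_cols` list and the `arrays` dictionary are hypothetical, and the table is assumed to cover a complete grid of dimension values.

```python
import numpy as np
import pandas as pd

# Long-format table: one row per (time, latitude) point, one column per variable.
# All column names here are illustrative assumptions.
df = pd.DataFrame(
    {
        "time": [0, 0, 0, 1, 1, 1],
        "latitude": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
        "temperature": [280.0, 285.0, 290.0, 281.0, 286.0, 291.0],
        "pressure": [1000.0, 1005.0, 1010.0, 1001.0, 1006.0, 1011.0],
    }
)

# Columns nominated as dimension coordinates (hypothetical choice).
dim_cols = ["time", "latitude"]

# Index by the dimension columns and sort, so rows are in C (row-major) order.
indexed = df.set_index(dim_cols).sort_index()

# The unique values of each index level give the points along each dimension.
dim_points = [level.to_numpy() for level in indexed.index.levels]
shape = tuple(len(points) for points in dim_points)

# Reshaping is only valid when every combination appears exactly once.
if not indexed.index.is_unique or len(indexed) != int(np.prod(shape)):
    raise ValueError("Dimension columns do not describe a complete grid.")

# One n-dimensional array per remaining column; all share the same dimensions.
arrays = {name: indexed[name].to_numpy().reshape(shape) for name in indexed.columns}

print(arrays["temperature"].shape)
```

Each entry in `arrays` corresponds to what would become one Cube, with `dim_points` supplying the shared dimension coordinates.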

This is a breaking improvement, but it targets Iris v3.3, which is a minor release and so cannot include breaking changes. I am therefore proposing to use iris.FUTURE to allow users to opt in.
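
The opt-in pattern could look roughly like the sketch below. It is a self-contained stand-in, not Iris code: the `Future` class, the flag name `new_pandas_bridge` and the `convert` function are all hypothetical, chosen only to show how a flag can switch between legacy and new behaviour without breaking existing users.

```python
import warnings
from contextlib import contextmanager


class Future:
    """Hypothetical container for opt-in behaviour flags."""

    def __init__(self, new_pandas_bridge=False):
        # Hypothetical flag name; defaults to the legacy behaviour.
        self.new_pandas_bridge = new_pandas_bridge

    @contextmanager
    def context(self, **flags):
        """Temporarily set flags, restoring the previous values on exit."""
        previous = {name: getattr(self, name) for name in flags}
        for name, value in flags.items():
            setattr(self, name, value)
        try:
            yield
        finally:
            for name, value in previous.items():
                setattr(self, name, value)


FUTURE = Future()


def convert(frame):
    """Hypothetical conversion entry point that honours the flag."""
    if FUTURE.new_pandas_bridge:
        return "new behaviour"
    warnings.warn(
        "Legacy behaviour will change in a future release; set the flag to opt in.",
        FutureWarning,
    )
    return "legacy behaviour"
```

Users who do nothing keep the legacy result and see a `FutureWarning`; wrapping a call in `with FUTURE.context(new_pandas_bridge=True):` opts in for that block only.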

To do:

  • [ ] Tests
  • [ ] Alignment with @hsteptoe's work in #4669
  • [x] Calendar handling, as in the original implementation
  • [ ] Docstring updates, including an example of how to reshape a DataFrame to make all data 1-dimensional
  • [ ] What's New entry
  • [x] Series/DataFrame agnosticism
  • [x] Memory efficiency
    • Are we getting NumPy views, or copying the arrays?
    • How does this handle Dask DataFrames?
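
One way to investigate the view-versus-copy question is `numpy.shares_memory`. The sketch below uses only pandas and NumPy, not the code in this PR; the variable names are illustrative. Whether `Series.to_numpy()` returns a view depends on the dtype and the pandas version, so that result is printed rather than assumed.

```python
import numpy as np
import pandas as pd


def shares_buffer(a, b):
    """Return True if the two arrays overlap in memory."""
    return np.shares_memory(a, b)


series = pd.Series(np.arange(6, dtype="float64"))

# May or may not be a view of the Series' data, depending on dtype and version.
values = series.to_numpy()

# Explicitly requesting a copy always allocates a new array.
copied = series.to_numpy(copy=True)

# Reshaping a contiguous array returns a view, so no extra data is allocated.
grid = copied.reshape(2, 3)

print("to_numpy() shares memory with a second call:", shares_buffer(values, series.to_numpy()))
print("copy=True shares memory with to_numpy():", shares_buffer(values, copied))
print("reshape shares memory with its source:", shares_buffer(copied, grid))
```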

Consult Iris pull request check list

trexfeathers, Aug 04 '22 10:08