iris icon indicating copy to clipboard operation
iris copied to clipboard

Improve iris.pandas cube -> data.frame

Open hsteptoe opened this issue 3 years ago • 3 comments

🚀 Pull Request

Description

Improve the conversion of a Cube to a Pandas data.frame. Aims to address #4526

Aims to deal with:

  • [x] An arbitrary number of dimensions -> d8c5b40
  • [x] Missing cube dimensions -> a46c08a
  • [x] Add auxiliary coordinate information to dataframe -> 1df3072
  • [x] Add global attribute data to dataframe -> 2c2ce69
  • [x] Deal with copy issue
  • [ ] Update doc strings
  • [x] Update tests (but not unit & integration testing)
  • [ ] Update Whats New
  • [ ] Check alignment with @trexfeathers #4890

Major proposed (possibly breaking) changes

  • Having functions to convert to a Series and a DataFrame seems superfluous. A Series is arguably a special case of a DataFrame, so I propose having only a function to do the DataFrame conversion, and then let folk further convert to a Series if they want to via standard pandas functions. This is done in 61b801a. -~~I don't think the copy argument is valid any more. Working with 'long' format DataFrames there is no continuity between the cube.data array and the DataFrame in memory. copy=True is the default, but there is not option for copy=False any more. This is done in 071d07f.~~ Copy ability retained.

WORK IN PROGRESS

hsteptoe avatar Mar 29 '22 14:03 hsteptoe

Good discussion with @trexfeathers about our joint work on Pandas-Iris bridging. We agreed that he would focus (#4890) on the DataFrame -> Cube, so this PR will only focus on Cube -> DataFrame

hsteptoe avatar Aug 03 '22 10:08 hsteptoe

Reference @trexfeathers alignment issues https://github.com/SciTools/iris/pull/4890#issuecomment-1217820004

hsteptoe avatar Aug 17 '22 10:08 hsteptoe

@hsteptoe Fancy doing a rebase to resolve these conflicts? :+1:

bjlittle avatar Sep 14 '22 09:09 bjlittle

ci-tests/doctest and ci-test/linkcheck errors aren't especially obvious to me...

hsteptoe avatar Sep 26 '22 12:09 hsteptoe

@trexfeathers Any chance this might make it into v3.4.0 #4433 ?

hsteptoe avatar Oct 24 '22 07:10 hsteptoe

@trexfeathers Any chance this might make it into v3.4.0 #4433 ?

@hsteptoe I'd certainly like for this to happen. Hopefully my review today gives enough warning for you to action the various comments, and therefore not need much more time from the core Iris team (since we are each assigned to other things too!).

trexfeathers avatar Oct 24 '22 14:10 trexfeathers

@trexfeathers Any chance this might make it into v3.4.0 #4433 ?

@hsteptoe I'd certainly like for this to happen. Hopefully my review today gives enough warning for you to action the various comments, and therefore not need much more time from the core Iris team (since we are each assigned to other things too!).

@trexfeathers Thanks for the review! I'll look at actioning it this week...

hsteptoe avatar Oct 24 '22 15:10 hsteptoe

Well I'm perplexed. They all passed for me locally!

trexfeathers avatar Nov 04 '22 17:11 trexfeathers

Well I'm perplexed. They all passed for me locally!

And for me... but it looked like there was some difference in the number of cols being printed in the github tests vs my local tests... now checked the output and matched to github test version...

hsteptoe avatar Nov 04 '22 17:11 hsteptoe

Well I'm perplexed. They all passed for me locally!

And for me... but it looked like there was some difference in the number of cols being printed in the github tests vs my local tests... now checked the output and matched to github test version...

I'm guessing Pandas is sensitive to available horizontal space when running. Rather frustrating as this gives us a situation where local behaves differently to CI. I won't let this get in the way of merging and I'll discuss with the team in future.

trexfeathers avatar Nov 04 '22 17:11 trexfeathers

Well I'm perplexed. They all passed for me locally!

And for me... but it looked like there was some difference in the number of cols being printed in the github tests vs my local tests... now checked the output and matched to github test version...

I'm guessing Pandas is sensitive to available horizontal space when running. Rather frustrating as this gives us a situation where local behaves differently to CI. I won't let this get in the way of merging and I'll discuss with the team in future.

Possibly some pandas options that we could explore: https://stackoverflow.com/a/11711637

hsteptoe avatar Nov 04 '22 17:11 hsteptoe

Super! Thanks @hsteptoe. And we made it in under 100 commits 😂

trexfeathers avatar Nov 04 '22 17:11 trexfeathers

Super! Thanks @hsteptoe. And we made it in under 100 commits 😂

🎉 @trexfeathers Thanks for all your help!

hsteptoe avatar Nov 07 '22 07:11 hsteptoe

Awesome, thanks gents!

kaedonkers avatar Nov 17 '22 13:11 kaedonkers