
display the indexes in the string reprs

Open keewis opened this issue 1 year ago • 3 comments

With the flexible indexes refactor, indexes have become much more important, which means we should include them in the reprs of DataArray and Dataset objects.

This is an initial attempt, covering only the string reprs, with a few unanswered questions:

  • how do we format indexes? Do we delegate to their __repr__ or some other method?
  • should we skip PandasIndex and PandasMultiIndex?
  • how do we present indexes that wrap multiple columns? At the moment, they are duplicated (see also the discussion in #6392)
  • what do we do with the index marker in the coords repr?

(also, how do we best test this?)

  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

keewis avatar Jul 16 '22 19:07 keewis

A few thoughts:

how do we format indexes? Do we delegate to their __repr__ or some other method?

Like for variable data, Xarray indexes could implement _repr_inline_ and __repr__ to have both a summarized and detailed representation. They could also have a _repr_html_ (for fancy representation of complex indexes).
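To make the idea concrete, here is a minimal sketch of an index object exposing all three hooks. `RangeIndexSketch` is a hypothetical stand-in written for illustration (it does not subclass anything from xarray), and the `max_width` argument of `_repr_inline_` is assumed to mirror the convention used for inline reprs of variable data.

```python
from html import escape


class RangeIndexSketch:
    """Hypothetical index, not part of xarray; it only illustrates the three repr hooks."""

    def __init__(self, start, stop, step):
        self.start, self.stop, self.step = start, stop, step

    def __repr__(self):
        # Detailed representation, e.g. for a dedicated "Indexes" section.
        return f"RangeIndexSketch(start={self.start}, stop={self.stop}, step={self.step})"

    def _repr_inline_(self, max_width):
        # One-line summary, truncated to the width the caller allows.
        text = f"range({self.start}, {self.stop}, {self.step})"
        if len(text) > max_width:
            text = text[: max(max_width - 3, 0)] + "..."
        return text

    def _repr_html_(self):
        # Simplest possible HTML fallback: the escaped detailed repr.
        return f"<pre>{escape(repr(self))}</pre>"
```

A Dataset repr could then call `_repr_inline_` for the one-line summary and fall back to `repr()` when an index does not define it.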

should we skip PandasIndex and PandasMultiIndex?

I'd skip it for the plain text DataArray / Dataset reprs (not a good information / verbosity ratio), but I'd keep it for the html repr (the index section could be collapsed by default) as well as for the Indexes repr. We could also provide a display option for more control on this (e.g., a display_default_indexes option set to False by default).
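A rough sketch of how such an option could filter what the text repr shows. Everything here is an assumption for illustration: `FakePandasIndex` and `FakeCustomIndex` are stand-in classes, and `indexes_to_display` is a hypothetical helper, not an existing xarray function.

```python
class FakePandasIndex:
    """Stand-in for a default, pandas-backed index (hypothetical)."""


class FakeCustomIndex:
    """Stand-in for a user-defined index (hypothetical)."""


# Which types count as "default" is itself an open question in this thread.
DEFAULT_INDEX_TYPES = (FakePandasIndex,)


def indexes_to_display(indexes, display_default_indexes=False):
    """Return the subset of a name -> index mapping that the text repr should show."""
    if display_default_indexes:
        return dict(indexes)
    return {
        name: index
        for name, index in indexes.items()
        if not isinstance(index, DEFAULT_INDEX_TYPES)
    }
```

With the option left at False only the custom index survives the filter; the html repr could simply call the helper with `display_default_indexes=True`.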

how do we present indexes that wrap multiple columns? At the moment, they are duplicated

Assuming that all coordinates related to a given index are shown next to each other, we could render the inline repr for the 1st coordinate and then use a short symbol (e.g., --, or a unicode symbol?) below that means "it's the same index".
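Something along these lines, as a sketch only: `format_index_lines` and `NamedIndex` are hypothetical names, the `--` marker is just the example symbol from above, and the helper assumes the mapping already lists coordinates of the same index next to each other.

```python
class NamedIndex:
    """Hypothetical stand-in for an index; it only carries a label for its repr."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return self.label


def format_index_lines(indexes, marker="--"):
    """Format a name -> index mapping, showing each shared index only once.

    The first coordinate of an index gets the full summary; later coordinates
    backed by the same object get `marker`, meaning "same index as above".
    """
    width = max((len(name) for name in indexes), default=0)
    seen = set()
    lines = []
    for name, index in indexes.items():
        if id(index) in seen:
            summary = marker
        else:
            seen.add(id(index))
            summary = repr(index)
        lines.append(f"{name:<{width}}  {summary}")
    return lines
```

Sharing is detected by object identity here, which presumes that every coordinate of a multi-coordinate index maps to the very same index object.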

what do we do with the index marker in the coords repr?

I think we can keep it as-is. It helps to identify at a glance which coordinates are indexed and which aren't. And it's still relevant if we skip PandasIndex and PandasMultiIndex in the plain text DataArray / Dataset reprs.

benbovy avatar Jul 17 '22 00:07 benbovy

Shall we point this at the main branch instead?

dcherian avatar Jul 27 '22 16:07 dcherian

From #6867 (https://github.com/pydata/xarray/issues/6867#issuecomment-1202535745), we might want to update the "Dimensions without coordinates" line too.

benbovy avatar Aug 02 '22 13:08 benbovy

should we skip PandasIndex and PandasMultiIndex?

If we're encouraging people to look at these / subclass them to create custom indexes, then they should have a repr too.

TomNicholas avatar Sep 15 '22 19:09 TomNicholas

I think we should move forward here.

Are there any real blockers?

dcherian avatar Oct 03 '22 20:10 dcherian

Not really, I wanted to wait until set_xindex was in main (the PR was merged last week) and have not looked at it since.

Edit: we don't yet have tests, though

There are two issues left that might need a bit of discussion: in 8f21df3 I skipped displaying default PandasIndex instances because that's basically redundant with the * on coordinates (and it would have required me to update a lot of doctests, which would probably make this a breaking change). If we decide to revert that, should we mark every coordinate that has an index with a *?

Another proposal we had was to replace the "dimensions without coordinates" line with a "coordinates without index" line. @benbovy, this might be a misunderstanding on my part, but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment? If so, I think we might need both lines.
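To illustrate why the two lines are not interchangeable, here is a small sketch that computes both from plain names. `summary_lines` is a hypothetical helper and the label wording is only an assumption based on the proposal above.

```python
def summary_lines(dims, coord_names, indexed_coord_names):
    """Build the two candidate summary lines from plain lists of names."""
    # Dimensions that have no coordinate of the same name.
    dims_without_coords = [d for d in dims if d not in coord_names]
    # Coordinates that exist but are not backed by any index.
    coords_without_index = [c for c in coord_names if c not in indexed_coord_names]

    lines = []
    if dims_without_coords:
        lines.append("Dimensions without coordinates: " + ", ".join(dims_without_coords))
    if coords_without_index:
        lines.append("Coordinates without index: " + ", ".join(coords_without_index))
    return lines
```

For example, with dims `["x", "y"]`, coordinates `["x", "lat"]` and only `"x"` indexed, the first line reports `y` and the second reports `lat`, so neither line can be derived from the other.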

keewis avatar Oct 04 '22 09:10 keewis

Looks good to me @keewis. Thanks for your work on the indexes repr!

Yes, I think we can skip displaying default indexes for now... The question is which indexes are considered default, i.e., all PandasIndex and PandasMultiIndex instances (like in this PR) or just the single pandas indexes automatically created for the dimension coordinates? We can decide this later, though: it's not a problem to add more indexes to the text repr later (we'll probably need it when dropping the multi-index dimension coordinate with tuple elements). For the html repr it's easier: we could display all indexes and collapse the section by default.

but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment?

Yes that's a good point. Let's keep "dimensions without coordinates".

benbovy avatar Oct 12 '22 16:10 benbovy