xarray
xarray copied to clipboard
display the indexes in the string reprs
With the flexible indexes refactor indexes have become much more important, which means we should include them in the reprs of DataArray
and Dataset
objects.
This is a initial attempt, covering only the string reprs, with a few unanswered questions:
- how do we format indexes? Do we delegate to their
__repr__
or some other method? - should we skip
PandasIndex
andPandasMultiIndex
? - how do we present indexes that wrap multiple columns? At the moment, they are duplicated (see also the discussion in #6392)
- what do we do with the index marker in the coords repr?
(also, how do we best test this?)
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in
whats-new.rst
A few thoughts:
how do we format indexes? Do we delegate to their repr or some other method?
Like for variable data, Xarray indexes could implement _repr_inline_
and __repr__
to have both a summarized and detailed representation. They could also have a _repr_html_
(for fancy representation of complex indexes).
should we skip PandasIndex and PandasMultiIndex?
I'd skip it for the plain text DataArray / Dataset reprs (not a good information / verbosity ratio), but I'd keep it for the html repr (the index section could be collapsed by default) as well as for the Indexes
repr. We could also provide a display option for more control on this (e.g., a display_default_indexes
option set to False
by default).
how do we present indexes that wrap multiple columns? At the moment, they are duplicated
Assuming that all coordinates related to a given index are shown next to each other, we could render the inline repr for the 1st coordinate and then use a short symbol (e.g., --
, or a unicode symbol?) below that means "it's the same index".
what do we do with the index marker in the coords repr?
I think we can keep it as-is. It helps to identify at a glance which coordinates are indexed and which aren't. And it's still relevant if we skip PandasIndex and PandasMultiIndex in the plain text DataArray / Dataset reprs.
Shall we point this at the main
branch instead?
From #6867 (https://github.com/pydata/xarray/issues/6867#issuecomment-1202535745), we might want to update the "Dimensions without coordinates" line too.
should we skip PandasIndex and PandasMultiIndex?
If we're encouraging people to look at these / subclass them to create custom indexes, then they should have a repr too.
I think we should move forward here.
Are there any real blockers?
not really, I wanted to wait until set_xindex
was in main (the PR has been merged last week) and have not looked at it since.
Edit: we don't yet have tests, though
There's two issues left that might need a bit of discussion: in 8f21df3 I skipped displaying default PandasIndex
instances because that's basically redundant with the *
on coordinates (and it would have required me to update a lot of doctests, which will probably make this a breaking change). If we decide to revert that, should we mark every coordinate with an index with a *
?
Another proposal we had was to replace the "dimensions without coordinates" line with a "coordinates without index" line. @benbovy, this might be a misunderstanding on my part, but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment? If so, I think we might need both lines.
Looks good to me @keewis. Thanks for your work on the indexes repr!
Yes I think we can skip displaying default indexes for now... The question is which indexes are considered as default, i.e., all PandasIndex
and PandasMultiIndex
instances (like in this PR) or just the single pandas indexes automatically created for the dimension coordinates? We can decide this later, though, it's not a problem adding more indexes in the text repr later (we'll probably need it when dropping the multi-index dimension coordinate with tuple elements). For the html repr it's easier: we could display all indexes and collapse the section by default.
but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment?
Yes that's a good point. Let's keep "dimensions without coordinates".