Draft for AnnData html repr
This is an implementation of an HTML representation for AnnData objects. Most of it is adapted from xarray (https://github.com/pydata/xarray) and from the earlier draft PR #694. It addresses issue #675.
In #694 @ivirshup mentioned some topics. Here is how I addressed them:
Overview
Display Configuration
Currently, there are private util functions to create configuration data objects and a function to create the html_repr from that specific object.
def _create_anndata_repr(ad_obj: "anndata.AnnData"):
    sections_conf = _create_anndata_display_conf(ad_obj)  # special function call for AnnData
    sections = _create_sections_from_conf(ad_obj, sections_conf)  # AnnData agnostic function call
    return _obj_repr(sections)  # Note that sections here are also independent of AnnData


def _create_anndata_display_conf(ad_obj: "anndata.AnnData"):
    """Factory to create the configuration of AnnData repr

    the options are hard-coded here (e.g., max_items_collapse options).
    """
    max_items_collapse = {  # display parameters set special to AnnData
        "X": 1,
        "obs": 1,
    }
    ...
    return section_conf
I might replace this flow of private util functions. I am thinking of defining a configuration class and making the dispatches at the class level, which is closer to the recommendation on the original draft #694. It might be overkill for this task.
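To make the class-based idea more concrete, here is a rough sketch. Every name in it (`SectionConf`, `DisplayConf`, `render_value`, `render`) is hypothetical and exists neither in anndata nor in this PR; it only shows the settings living in one object and the per-type dispatch registered in one place.

```python
from dataclasses import dataclass, field
from functools import singledispatch
from html import escape


@dataclass
class SectionConf:
    """Hypothetical display settings for one section of the repr."""
    name: str
    max_items_collapse: int = 10


@dataclass
class DisplayConf:
    """Hypothetical container for all section settings of one object type."""
    sections: list = field(default_factory=list)

    @classmethod
    def for_anndata(cls):
        # Mirrors the hard-coded AnnData options of the factory function.
        return cls(sections=[SectionConf("X", 1), SectionConf("obs", 1)])


@singledispatch
def render_value(value):
    # Fallback for unregistered types: escaped plain-text repr.
    return f"<pre>{escape(repr(value))}</pre>"


@render_value.register
def _(value: dict):
    # Mappings are rendered as a list of their keys.
    items = "".join(f"<li>{escape(str(key))}</li>" for key in value)
    return f"<ul>{items}</ul>"


def render(obj, conf):
    """Render every configured section of obj, independent of its type."""
    parts = []
    for section in conf.sections:
        value = getattr(obj, section.name, None)
        body = render_value(value)
        parts.append(
            f"<details><summary>{escape(section.name)}</summary>{body}</details>"
        )
    return "".join(parts)
```

With this layout, supporting another container type would mean adding one more factory next to `for_anndata`, while `render` stays untouched.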
Doc View
As far as I know, documentation built from .ipynb files (like this page) should show the AnnData representation by default. But I don't know how to do this in the API documentation. The xarray API documentation doesn't have this feature either.
Update: I compiled anndata-tutorials locally and it works. It also seems like it has a different style on docs:
So whenever the docs are created by an .ipynb file, this representation will show up.
Some Notes
- Note that I use the HTML tables of pandas directly in the representation, which might not be desirable, but it is easy to implement and gives a familiar representation.
- Note that NumPy configurations for representation can be tuned by the global OPTIONS variable.
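As an illustration of the second note, the sketch below applies a settings dictionary through NumPy's `printoptions` context manager so that the global print settings are left alone. The `OPTIONS` contents and the helper name `numpy_repr_with_options` are assumptions made for this example, not the actual values or functions in the PR.

```python
import numpy as np

# Hypothetical stand-in for the module-level OPTIONS variable.
OPTIONS = {"threshold": 10, "edgeitems": 2, "precision": 3}


def numpy_repr_with_options(array):
    """Return a truncated repr without touching NumPy's global print settings."""
    # np.printoptions restores the previous settings when the block exits.
    with np.printoptions(**OPTIONS):
        return repr(array)
```

Because the options are only active inside the `with` block, a user's own `np.set_printoptions` choices are unaffected after the repr is built.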
Examples
Examples are here.
Ongoing Issues
- Setting Configurations: xarray has an options class. Should we use the same approach for setting the display configurations? I think this would be another issue outside the repr_html scope. TLDR: There is no set function for display configs.
- Maintainability: The html_formatting.py file consists only of private utility functions. We might have to define classes as mentioned in Display Configuration. I would suggest we start doing this when we expand html_repr to MuData.
- Testing: I don't know how we will do unit testing of the html_repr. For this reason, I put as many cases as I could in the examples.
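One possible starting point for the testing question is to check structure rather than exact markup: parse the generated string, confirm that the tags are balanced, and confirm that the expected section names appear in the visible text. The helper below is only a sketch that uses the standard library parser; `check_html` and `_TagBalanceChecker` are hypothetical names.

```python
from html.parser import HTMLParser


class _TagBalanceChecker(HTMLParser):
    """Track open tags and collect the visible text of an HTML fragment."""

    VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Void elements never get a closing tag, so they are not tracked.
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if not self.stack or self.stack.pop() != tag:
            self.errors.append(f"unexpected </{tag}>")

    def handle_data(self, data):
        self.text.append(data)


def check_html(fragment):
    """Return (list of structural errors, concatenated visible text)."""
    checker = _TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    errors = checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
    return errors, "".join(checker.text)
```

In a test, the string produced by the object's HTML repr would be passed to `check_html`, followed by assertions that the error list is empty and that names such as "obs" occur in the text.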
TODO
- ~~No pandas.DataFrame display settings. Use the context manager for pandas (pandas.options) to configure.~~
Updates (24.06.22)
Representing backed data
In one of the meetings, we talked about how the code will behave if the data to represent isn't in memory. For example, if the data is not in memory and is large, xarray gives this short representation:
def short_data_repr(array):
    if array._in_memory or array.size < 1e5:
        return short_numpy_repr(array)
    else:
        # internal xarray array type
        return f"[{array.size} values with dtype={array.dtype}]"
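The same idea can be written without relying on xarray internals. The sketch below is an assumption about how such a guard might look for array types that are not native to the library; `safe_array_repr` and `max_size` are hypothetical names. It only reads `shape` and `dtype` for anything that is not a small in-memory NumPy array.

```python
import numpy as np


def safe_array_repr(array, max_size=100_000):
    """Summarize an array-like without forcing lazy or on-disk data into memory."""
    if isinstance(array, np.ndarray) and array.size < max_size:
        # Small in-memory NumPy data: the regular repr is cheap.
        return repr(array)
    # Anything else (large, lazy, or disk-backed): only touch metadata.
    shape = getattr(array, "shape", "unknown")
    dtype = getattr(array, "dtype", "unknown")
    return f"[{type(array).__name__} with shape={shape}, dtype={dtype}]"
```

Whether such a guard is needed at all depends on how much the wrapped libraries' own repr functions can be trusted.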
We decided to use a similar approach to avoid loading data into memory. But I realized xarray does this because it has a native array type and is responsible for returning the representation. We don't have a low-level native array class (ours are all wrapped versions of numpy/zarr/dask arrays), so when we call their repr functions I think it is reasonable to trust that they won't load data from disk or do anything expensive for a repr call. At least this is what I understand so far. For example, dask just gives this str repr:
Dask HTML repr
Unless specified, the dispatches use the str repr. I realized dask has an HTML repr by default. This could be added to the AnnData repr, but it might be too much.
For now, I won't add it, to keep the representation from getting too crowded. But if you have any comments, please let me know.
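If the rich repr is wanted later, one option would be an opt-in switch that falls back to the escaped str repr. This is only a sketch: `value_to_html` and `use_rich_repr` are hypothetical names, and the only thing it assumes about the wrapped object is the standard `_repr_html_` hook that IPython looks for.

```python
from html import escape


def value_to_html(value, use_rich_repr=False):
    """Return an HTML snippet for value.

    use_rich_repr is a hypothetical switch, not an existing anndata option.
    """
    rich = getattr(value, "_repr_html_", None)
    if use_rich_repr and callable(rich):
        html = rich()
        # Some objects return None to signal that no rich repr is available.
        if html is not None:
            return html
    # Default: the plain str repr, escaped so it cannot break the markup.
    return f"<pre>{escape(repr(value))}</pre>"
```

Keeping the switch off by default preserves the compact output described above.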
pandas.DataFrame categories added to attributes
I think in the meeting, someone asked if we could see the data types of the columns and maybe list their categories. So I added these to the attribute section.
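A minimal sketch of how such a per-column summary could be collected from a pandas.DataFrame is shown below. The function name `describe_columns`, the `max_categories` cutoff, and the output format are assumptions made for illustration and may differ from what the PR renders.

```python
import pandas as pd


def describe_columns(df, max_categories=5):
    """Map each column name to a short description of its dtype."""
    summary = {}
    for name, column in df.items():
        if isinstance(column.dtype, pd.CategoricalDtype):
            categories = [str(c) for c in column.cat.categories]
            shown = ", ".join(categories[:max_categories])
            if len(categories) > max_categories:
                # Truncate long category lists to keep the repr compact.
                shown += ", ..."
            summary[name] = f"category ({len(categories)}): {shown}"
        else:
            summary[name] = str(column.dtype)
    return summary
```

For a frame with a categorical `cell_type` column and a numeric `n_genes` column, this yields one line per column that can be placed next to the column name in the attribute section.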
Some
Codecov Report
Merging #784 (940a7e6) into master (4103ebd) will decrease coverage by 1.49%. The diff coverage is 41.10%.
@@            Coverage Diff             @@
##           master     #784      +/-   ##
==========================================
- Coverage   83.12%   81.63%   -1.50%
==========================================
  Files          34       35       +1
  Lines        4416     4579     +163
==========================================
+ Hits         3671     3738      +67
- Misses        745      841      +96
Impacted Files | Coverage Δ | |
---|---|---|
anndata/_core/formatting_html.py | 40.62% <40.62%> (ø) | |
anndata/_core/anndata.py | 83.35% <66.66%> (-0.07%) | :arrow_down: |