Draft for AnnData html repr

Open selmanozleyen opened this issue 2 years ago • 1 comments

This is an implementation of an HTML representation for AnnData objects. I mostly copied from xarray (https://github.com/pydata/xarray) and from the other draft PR #694. It addresses issue #675.

In #694 @ivirshup mentioned some topics. Here is how I addressed them:

Overview

Display Configuration

Currently, there are private util functions to create configuration data objects and a function to create the html_repr from that specific object.

def _create_anndata_repr(ad_obj: "anndata.AnnData"):
    sections_conf = _create_anndata_display_conf(ad_obj)            # special function call for AnnData
    sections = _create_sections_from_conf(ad_obj, sections_conf) # AnnData agnostic function call
    return _obj_repr(sections) # Note that sections here are also independent of AnnData

def _create_anndata_display_conf(ad_obj: "anndata.AnnData"):
    """Factory to create the configuration of AnnData repr
    the options are hard-coded here (e.g., max_items_collapse options).
    """
    max_items_collapse = { # display parameters set special to AnnData
        "X": 1,
        "obs": 1,
    }
    ...
    return section_conf

I might replace this flow of private util functions. I am thinking of defining a configuration class and doing the dispatch at the class level, which is closer to the recommendation on the original draft #694. It might be overkill for this task.
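
To make the class-based idea concrete, here is a minimal sketch, not the code in this PR. `DisplayConf`, `FakeAnnData`, `create_display_conf`, and `render_sections` are hypothetical names, and I am assuming `max_items_collapse` means "collapse a section once it has more items than this".

```python
from dataclasses import dataclass, field
from functools import singledispatch
from html import escape


@dataclass
class DisplayConf:
    """Hypothetical container for per-section display options."""

    # Assumption: a section is collapsed when it holds more items than its limit.
    max_items_collapse: dict = field(default_factory=dict)
    default_max_items: int = 10

    def is_collapsed(self, name: str, n_items: int) -> bool:
        limit = self.max_items_collapse.get(name, self.default_max_items)
        return n_items > limit


class FakeAnnData:
    """Stand-in for anndata.AnnData so the sketch runs without anndata."""

    def __init__(self, sections: dict):
        self.sections = sections


@singledispatch
def create_display_conf(obj) -> DisplayConf:
    # Fallback for unknown types: defaults only.
    return DisplayConf()


@create_display_conf.register
def _(obj: FakeAnnData) -> DisplayConf:
    # Type-specific options live in one place per type.
    return DisplayConf(max_items_collapse={"X": 1, "obs": 1})


def render_sections(sections: dict, conf: DisplayConf) -> str:
    """Type-agnostic rendering: only the configuration knows about the type."""
    parts = []
    for name, items in sections.items():
        open_attr = "" if conf.is_collapsed(name, len(items)) else " open"
        body = "".join(f"<li>{escape(str(item))}</li>" for item in items)
        parts.append(
            f"<details{open_attr}><summary>{escape(name)}</summary>"
            f"<ul>{body}</ul></details>"
        )
    return "".join(parts)
```

With this layout, supporting another container type would only need one more `register` call; the rendering function stays untouched.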

Doc View

As far as I know, documentation built from .ipynb files (like this page) should show the AnnData representation by default. But I don't know how to do this for the API documentation; xarray's API documentation doesn't have this feature either. Update: I compiled anndata-tutorials locally and it works, although the representation seems to be styled differently in the docs. So whenever docs are built from an .ipynb file, this representation will show up.

Some Notes

Note that I use pandas' HTML tables directly in the representation. This might not be desirable, but it is easy to implement and gives a familiar representation.
Note that the NumPy configuration for the representation can be tuned through the global OPTIONS variable.

Examples

Examples are here.

Ongoing Issues

Setting Configurations: xarray has an options class. Should we use the same approach for setting the display configuration? I think this would be a separate issue outside the scope of repr_html. TLDR: There is no set function for display configs.
Maintainability: The html_formatting.py file consists only of private utility functions. We might have to define classes as mentioned in Display Configuration. I would suggest we start doing this when we expand html_repr to MuData.
Testing: I don't know how we will unit test the html_repr. For this reason, I put as many cases as I could in the examples.
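
On the testing question, one possible starting point is a structural check instead of comparing full HTML strings. This is only a sketch using the standard library's `html.parser`; `TagBalanceChecker` and `check_html` are hypothetical helpers, and the fragments would come from the real repr function in an actual test.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Records every start tag and flags mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.seen = []    # every start tag, in document order
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not self.stack or self.stack[-1] != tag:
            self.errors.append(f"unexpected </{tag}>")
        else:
            self.stack.pop()


def check_html(fragment: str):
    """Return (start tags seen, list of structural errors) for a fragment."""
    checker = TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    errors = checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
    return checker.seen, errors
```

A unit test could then assert that the error list is empty and that expected elements such as `details` or `table` appear, which stays stable when styling or whitespace changes.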

TODO

~~No pandas.DataFrame display settings. Use the context manager for pandas (pandas.options) to configure.~~

Updates (24.06.22)

Representing backed data

At one of the meetings, we talked about how the code should behave if the data to represent isn't in memory. For example, when the data is not in memory and is too big, xarray gives a short placeholder instead of the values:

def short_data_repr(array):
    if array._in_memory or array.size < 1e5:
        return short_numpy_repr(array)
    else:
        # internal xarray array type
        return f"[{array.size} values with dtype={array.dtype}]"
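
To see how this threshold logic behaves without xarray, here is a self-contained sketch. `FakeBackedArray` is a hypothetical stand-in for an on-disk array, and the first branch returns a placeholder string where xarray would print the actual values.

```python
from dataclasses import dataclass
from math import prod

SIZE_THRESHOLD = 1e5  # same cutoff as in the xarray snippet above


@dataclass
class FakeBackedArray:
    """Hypothetical on-disk array that only exposes cheap metadata."""

    shape: tuple
    dtype: str
    in_memory: bool

    @property
    def size(self) -> int:
        return prod(self.shape)

    def load(self):
        # Guard: a repr call must never end up here.
        raise RuntimeError("repr should not load data from disk")


def short_data_repr(array) -> str:
    if array.in_memory or array.size < SIZE_THRESHOLD:
        # Stand-in for printing the values of a small or in-memory array.
        return f"<values of shape {array.shape}, dtype={array.dtype}>"
    # Large and on disk: report metadata only, without touching the data.
    return f"[{array.size} values with dtype={array.dtype}]"
```

For example, `short_data_repr(FakeBackedArray((1000, 1000), "float32", in_memory=False))` takes the metadata-only branch and never calls `load`.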

We decided to use a similar approach to avoid loading data into memory. But I realized xarray does this because it has a native array type and is responsible for producing its representation. We don't have a low-level native array class (ours are all wrapped versions of numpy/zarr/dask arrays), so when we call their repr functions I think it is reasonable to trust that they won't load data from disk or do anything expensive for a repr call. At least this is what I understand so far. For example, dask's str repr only reports metadata such as shape, dtype, and chunk size.

Dask HTML repr

Unless specified otherwise, the dispatches use the str repr. I realized dask has an HTML repr (_repr_html_) by default. This could be added to the AnnData repr, but it might be too much. For now, I won't add it, to keep the representation from getting too crowded. But if you have any comments, please let me know.
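
If we do want to opt in later, the choice between the two reprs could look roughly like this. It is a sketch only: `element_repr` and its `use_html` flag are hypothetical names, not part of this PR.

```python
from html import escape


def element_repr(obj, use_html: bool = False) -> str:
    """Return an HTML snippet for one element of the container.

    By default the plain str repr is used, escaped and wrapped in <pre>.
    With use_html=True, an object's own _repr_html_ is preferred if it has one.
    """
    html_repr = getattr(obj, "_repr_html_", None)
    if use_html and callable(html_repr):
        rendered = html_repr()
        # By IPython convention, _repr_html_ may return None to decline.
        if rendered is not None:
            return rendered
    return f"<pre>{escape(str(obj))}</pre>"
```

Keeping the default at `use_html=False` matches the current behaviour, so rich reprs such as dask's would only show up when explicitly requested.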

pandas.DataFrame categories added to attributes

I think someone in the meeting asked whether we can see the data types of the columns and maybe list their categories, so I added these to the attributes section.
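
Roughly, the per-column information can be collected as in the sketch below. `describe_columns` and `max_categories` are hypothetical names for illustration, and the output format is not the one used in the PR.

```python
import pandas as pd


def describe_columns(df: pd.DataFrame, max_categories: int = 5) -> dict:
    """Map each column name to a short dtype description.

    Categorical columns also list their categories, truncated after
    max_categories entries so wide categoricals stay readable.
    """
    summary = {}
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.CategoricalDtype):
            cats = [str(c) for c in dtype.categories]
            shown = cats[:max_categories]
            if len(cats) > max_categories:
                shown.append("...")
            summary[name] = f"category ({', '.join(shown)})"
        else:
            summary[name] = str(dtype)
    return summary
```

Only `df.dtypes` is read here, so the values themselves are not scanned.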

Some

Jun 13 '22 15:06 selmanozleyen

Codecov Report

Merging #784 (940a7e6) into master (4103ebd) will decrease coverage by 1.49%. The diff coverage is 41.10%.

@@            Coverage Diff             @@
##           master     #784      +/-   ##
==========================================
- Coverage   83.12%   81.63%   -1.50%     
==========================================
  Files          34       35       +1     
  Lines        4416     4579     +163     
==========================================
+ Hits         3671     3738      +67     
- Misses        745      841      +96

Impacted Files                      Coverage Δ
anndata/_core/formatting_html.py    40.62% <40.62%> (ø)
anndata/_core/anndata.py            83.35% <66.66%> (-0.07%)

Jun 13 '22 15:06 codecov[bot]
