Add HTML representation

Open katosh opened this issue 4 weeks ago • 15 comments

Rich HTML representation for AnnData

  • [x] Closes #675
  • [x] Tests added
  • [x] Release note added (or unnecessary)

Summary

Implements rich HTML representation (_repr_html_) for AnnData objects in Jupyter notebooks and other HTML-aware environments. This builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.

Screenshot

[Image: screenshot2]

Features

Interactive Display

  • Foldable sections: Each attribute (obs, var, uns, etc.) is collapsible, with auto-folding for sections that exceed a configurable item threshold
  • Search/filter: Real-time filtering across all fields by name, type, or content
  • Copy-to-clipboard: One-click copy for field names
  • Nested AnnData: Expandable nested objects with configurable depth limit
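
As a rough illustration of the depth limit on nested objects, here is a minimal, self-contained sketch. It does not use the PR's code: `render_nested` is a hypothetical helper, plain dicts stand in for nested AnnData objects, and the use of `<details>`/`<summary>` for folding is an assumption about how such folding could work without JavaScript.

```python
from html import escape


def render_nested(name, value, depth=0, max_depth=3):
    """Render a nested dict as collapsible HTML, stopping at max_depth."""
    if not isinstance(value, dict):
        # Leaf: show the key and an escaped preview of the value.
        return f"<div>{escape(name)}: {escape(repr(value))}</div>"
    if depth >= max_depth:
        # Depth limit reached: show a placeholder instead of recursing.
        return f"<div>{escape(name)}: (max depth reached)</div>"
    children = "".join(
        render_nested(key, child, depth + 1, max_depth)
        for key, child in value.items()
    )
    # <details>/<summary> folds natively in the browser, without JavaScript.
    return f"<details><summary>{escape(name)}</summary>{children}</details>"


nested = {"a": {"b": {"c": {"d": 1}}}}
html_out = render_nested("root", nested, max_depth=2)
```

With `max_depth=2`, only `root` and `a` become expandable; `b` is replaced by the placeholder, so the output size stays bounded no matter how deep the nesting goes.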

Visual Indicators

  • Category colors: Displays category values with their color palette from uns (e.g., cell_type_colors)
  • Type badges: Visual indicators for views, backed mode, sparse matrices, Dask arrays
  • Serialization warnings: Highlights fields that won't serialize to H5AD/Zarr
  • Value previews: Inline previews for simple uns values (strings, numbers, small dicts/lists)
  • README support: If uns["README"] contains a string, a small ⓘ icon appears in the header; clicking it opens a modal with the content rendered as markdown (headers, bold/italic, code blocks, ordered/unordered lists, links, blockquotes, tables)
  • Memory info: Shows estimated memory usage in footer
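
To make the category-color lookup concrete, here is a small sketch of how a palette stored under a `<column>_colors` key in uns could be paired with category values. The helper `pair_categories_with_colors` is hypothetical and not part of the PR. The positional alignment of palette and categories follows the usual scanpy convention, and the fallback to uncolored categories is an assumption.

```python
def pair_categories_with_colors(categories, uns, column):
    """Return (category, color) pairs, with None when no palette matches."""
    palette = uns.get(f"{column}_colors")
    if palette is None or len(palette) < len(categories):
        # No usable palette: fall back to uncolored categories.
        return [(category, None) for category in categories]
    # Colors are assumed to be aligned with the categories by position.
    return list(zip(categories, palette))


uns = {"cell_type_colors": ["#1f77b4", "#ff7f0e"]}
pairs = pair_categories_with_colors(["B cell", "T cell"], uns, "cell_type")
```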

Compatibility

  • Dark mode: Auto-detects Jupyter Lab/VS Code themes
  • No-JS fallback: Graceful degradation when JavaScript is disabled
  • JupyterLab safe: CSS scoped to .anndata-repr prevents style conflicts
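
The idea behind CSS scoping can be sketched in a few lines: every selector is prefixed with the container class, so rules only match elements inside the repr. `scope_css` is a hypothetical helper written for this illustration; the PR's css.py may be organized quite differently.

```python
def scope_css(rules, scope=".anndata-repr"):
    """Prefix every selector with the scope class so rules cannot leak out."""
    return "\n".join(
        f"{scope} {selector} {{ {declarations} }}"
        for selector, declarations in rules.items()
    )


css = scope_css(
    {"table": "border-collapse: collapse;", ".badge": "font-size: 11px;"}
)
```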

Configuration

import anndata

anndata.settings.repr_html_enabled = True            # Enable/disable HTML repr (default: True)
anndata.settings.repr_html_fold_threshold = 5        # Auto-fold sections with more items (default: 5)
anndata.settings.repr_html_max_depth = 3             # Nested AnnData depth limit (default: 3)
anndata.settings.repr_html_max_items = 200           # Max items per section (default: 200)
anndata.settings.repr_html_max_categories = 100      # Max categories inline (default: 100)
anndata.settings.repr_html_dataframe_expand = False  # Expandable DataFrame tables (default: False)
anndata.settings.repr_html_max_field_width = 400     # Max width for field name column (default: 400px)
anndata.settings.repr_html_type_width = 220          # Width for type column (default: 220px)
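
One way to read the fold threshold and the item cap together is sketched below. `plan_section` is a hypothetical helper and not the PR's implementation. The strict "more than" comparison follows the comment on repr_html_fold_threshold, while summarizing the items beyond the cap is an assumption.

```python
def plan_section(n_items, fold_threshold=5, max_items=200):
    """Decide whether a section starts folded and how many items to show."""
    folded = n_items > fold_threshold  # fold only with MORE items than the threshold
    shown = min(n_items, max_items)    # cap the number of rendered entries
    hidden = n_items - shown           # remainder would be summarized, not rendered
    return {"folded": folded, "shown": shown, "hidden": hidden}


small = plan_section(3)
large = plan_section(250)
```

Under these assumptions, a section with 3 entries renders fully expanded, while one with 250 entries starts folded and renders only the first 200.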

Extensibility

The repr system provides two extension points: TypeFormatter for custom value visualization and SectionFormatter for adding new sections.

TypeFormatter - Format by Python type (e.g., custom array in obsm/varm):

from anndata._repr import register_formatter, TypeFormatter, FormattedOutput

@register_formatter
class MyArrayFormatter(TypeFormatter):
    sections = ("obsm", "varm")  # Only apply to these sections (None = all)

    def can_format(self, obj):
        return isinstance(obj, MyArrayType)

    def format(self, obj, context):
        return FormattedOutput(
            type_name=f"MyArray {obj.shape}",
            html_content=obj._repr_html_(),  # Custom visualization
            is_expandable=True,
        )

TypeFormatter - Format by embedded type hint (e.g., tagged data in uns):

from anndata._repr import register_formatter, TypeFormatter, FormattedOutput
from anndata._repr import extract_uns_type_hint

@register_formatter
class MyConfigFormatter(TypeFormatter):
    priority = 100  # Check before fallback
    sections = ("uns",)  # Only apply to uns section

    def can_format(self, obj):
        hint, _ = extract_uns_type_hint(obj)
        return hint == "mypackage.config"

    def format(self, obj, context):
        hint, data = extract_uns_type_hint(obj)
        return FormattedOutput(
            type_name="config",
            html_content='<span>Custom preview</span>',
        )

Data with type hints ({"__anndata_repr__": "mypackage.config", ...}) will use the registered formatter when the package is imported. Security: data never triggers code execution; packages must be explicitly imported first.
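
The security property can be illustrated with a stand-in for the hint extraction. `sketch_extract_type_hint` below is hypothetical and only mirrors the (hint, data) return shape used in the example above; the real extract_uns_type_hint may treat edge cases differently. The point is that the hint is read as a plain string and compared, never imported or evaluated.

```python
TYPE_HINT_KEY = "__anndata_repr__"


def sketch_extract_type_hint(obj):
    """Return (hint, remaining_data); hint is None when obj is untagged."""
    if not isinstance(obj, dict):
        return None, obj
    hint = obj.get(TYPE_HINT_KEY)
    if not isinstance(hint, str):
        # A missing or non-string tag is treated as "no hint".
        return None, obj
    # Strip the tag so formatters only see the payload.
    data = {key: value for key, value in obj.items() if key != TYPE_HINT_KEY}
    return hint, data


tagged = {"__anndata_repr__": "mypackage.config", "alpha": 1}
hint, data = sketch_extract_type_hint(tagged)
```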

SectionFormatter - Add new sections (e.g., TreeData's obst/vart):

from anndata._repr import register_formatter, SectionFormatter, FormattedEntry, FormattedOutput

@register_formatter
class ObstSectionFormatter(SectionFormatter):
    section_name = "obst"
    after_section = "obsm"  # Position after obsm

    def should_show(self, obj):
        return hasattr(obj, "obst") and len(obj.obst) > 0

    def get_entries(self, obj, context):
        return [
            FormattedEntry(
                key=key,
                output=FormattedOutput(
                    type_name=f"Tree ({tree.n_nodes} nodes)",
                    html_content=tree._repr_svg_(),  # Custom tree SVG
                    is_expandable=True,
                )
            )
            for key, tree in obj.obst.items()
        ]
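
For readers wondering how priority and sections might interact at dispatch time, here is a self-contained sketch of a priority-ordered registry. `SketchRegistry` and both formatter classes are hypothetical and simplified (for example, format takes no context argument). They illustrate the pattern only and are not the PR's registry.py.

```python
class SketchRegistry:
    """Hold formatter instances and try them from highest priority down."""

    def __init__(self):
        self._formatters = []

    def register(self, formatter_cls):
        # Used as a class decorator: instantiate, store, keep sorted.
        self._formatters.append(formatter_cls())
        self._formatters.sort(key=lambda f: f.priority, reverse=True)
        return formatter_cls

    def format(self, obj, section):
        for formatter in self._formatters:
            sections = formatter.sections
            if sections is not None and section not in sections:
                continue  # formatter is restricted to other sections
            if formatter.can_format(obj):
                return formatter.format(obj)
        return type(obj).__name__  # fallback: plain type name


registry = SketchRegistry()


@registry.register
class ListFormatter:
    priority = 0
    sections = None  # applies everywhere

    def can_format(self, obj):
        return isinstance(obj, list)

    def format(self, obj):
        return f"list ({len(obj)} items)"


@registry.register
class UnsListFormatter:
    priority = 100
    sections = ("uns",)  # only consulted for entries in uns

    def can_format(self, obj):
        return isinstance(obj, list)

    def format(self, obj):
        return f"uns list of {len(obj)}"
```

In this sketch a list in uns is handled by the higher-priority UnsListFormatter, the same list in obsm falls through to ListFormatter, and anything no formatter claims is labeled with its type name.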

Testing

  • 251 unit tests with 90-100% coverage on _repr module
  • 18 visual test cases, generated with python tests/visual_inspect_repr_html.py

Implementation

New module (src/anndata/_repr/):

  • __init__.py - Public API and documentation
  • html.py - Main HTML generator
  • css.py - Scoped CSS with dark mode support
  • javascript.py - Interactive features (fold, search, copy)
  • markdown.py - Minimal JS markdown parser for README rendering (easily replaceable by external module)
  • formatters.py - Type-specific formatters
  • registry.py - Extensible TypeFormatter and SectionFormatter registration
  • utils.py - Helper functions
  • constants.py - Default values for settings (single source of truth)

Modified files:

  • src/anndata/_core/anndata.py - Added _repr_html_() method
  • src/anndata/_settings.py - Added repr_html_* settings
  • src/anndata/_settings.pyi - Added type stubs for repr_html_* settings
  • pyproject.toml - Added vart to codespell ignore list, updated pytest version

Documentation

Documentation for extending the HTML repr is in the module docstrings:

  • src/anndata/_repr/__init__.py - Main documentation with complete examples for:
    • TypeFormatter - Custom visualization for specific types (by Python type or embedded type hints)
    • SectionFormatter - Adding new sections (like TreeData's obst/vart)
    • FormattedOutput and FormattedEntry dataclasses
    • extract_uns_type_hint() for tagged data patterns
  • src/anndata/_repr/registry.py - Detailed API documentation for the registry system
  • src/anndata/_core/anndata.py - _repr_html_() method docstring with settings reference

Demo

Live interactive demo | Gist source

To regenerate locally: python tests/visual_inspect_repr_html.py

Config Changes

pyproject.toml: Added vart to codespell's ignore-words-list. This is needed because vart (variable tree annotations) is a valid section name in the TreeData extension package, analogous to obst (observation tree annotations). The documentation and examples reference this term.

Dependency Changes

pyproject.toml: Updated pytest>=8.2 to pytest>=9.0 in test-min to resolve dependency conflict with hatch-test's built-in pytest~=9.0 requirement.

Related

  • Supersedes #784, #694, #521, #346 (previous drafts)
  • Compatible with #1927 (sparse scipy changes)
  • Fully backward compatible

Acknowledgments

Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for the prior work and discussions that informed this implementation.

katosh, Nov 29 '25 20:11

Hi all!

I believe the failing CI checks are due to pre-existing anndata CI configuration issues rather than this PR:

  • Test failures (hatch-test.min, hatch-test.stable, hatch-test.pre): All fail during dependency installation with a pytest version conflict (pytest>=9.0,<10.dev0 and pytest==8.2 unsatisfiable).
  • Triage checks: Needs a milestone assigned by a maintainer.
  • ReadTheDocs: I built the docs locally with sphinx-build -W (warnings as errors) and the 23 warnings that cause failure are all pre-existing issues - broken links to zarr docs and missing tutorial notebooks. None are related to the _repr module. The new code doesn't introduce any new warnings.

Please let me know if I'm wrong about any of this and there are changes I should make to the PR. Happy to fix anything needed!

katosh, Nov 29 '25 21:11

Codecov Report

:x: Patch coverage is 94.54685% with 71 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 86.21%. Comparing base (4f64868) to head (7c7058b).
:white_check_mark: All tests successful. No failed tests found.

Files with missing lines          Patch %   Lines
src/anndata/_repr/html.py         94.45%    36 Missing :warning:
src/anndata/_repr/formatters.py   95.05%    14 Missing :warning:
src/anndata/_repr/utils.py        93.54%    10 Missing :warning:
src/anndata/_repr/registry.py     94.57%    9 Missing :warning:
src/anndata/_core/anndata.py      75.00%    2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2236      +/-   ##
==========================================
+ Coverage   84.47%   86.21%   +1.74%     
==========================================
  Files          46       55       +9     
  Lines        7202     8504    +1302     
==========================================
+ Hits         6084     7332    +1248     
- Misses       1118     1172      +54     
Files with missing lines          Coverage Δ
src/anndata/_repr/__init__.py     100.00% <100.00%> (ø)
src/anndata/_repr/constants.py    100.00% <100.00%> (ø)
src/anndata/_repr/css.py          100.00% <100.00%> (ø)
src/anndata/_repr/javascript.py   100.00% <100.00%> (ø)
src/anndata/_repr/markdown.py     100.00% <100.00%> (ø)
src/anndata/_settings.py          92.34% <100.00%> (+2.75%) :arrow_up:
src/anndata/_core/anndata.py      83.08% <75.00%> (-0.09%) :arrow_down:
src/anndata/_repr/registry.py     94.57% <94.57%> (ø)
src/anndata/_repr/utils.py        93.54% <93.54%> (ø)
src/anndata/_repr/formatters.py   95.05% <95.05%> (ø)
... and 1 more

... and 4 files with indirect coverage changes

codecov[bot], Nov 30 '25 03:11

Bumped pytest>=8.2 to pytest>=9.0 in test-min to resolve dependency conflict with hatch-test's pinned pytest~=9.0.

Attempting to trigger codecov to refresh its report with actual coverage: @codecov refresh

katosh, Nov 30 '25 05:11

Thank you so much for this massive body of work!

I saw your post on discourse and just wanted to inform you real quick that both of our main anndata maintainers are currently on holidays. But I'm sure that they'll get back to you soon.

Thanks!

Zethson, Dec 01 '25 23:12

Oh, thanks so much for the update @Zethson! Totally understand, and no rush at all. This one is a bit larger, and I figured it might take some time. I just wanted to signal that I’m very open to discussing any aspect of it (functionality, features, aesthetics), and I’m happy to iterate and incorporate any ideas. I’ll keep working on it until it’s in a shape everyone feels good about.

Thanks again!

katosh, Dec 01 '25 23:12

I would be happy if we could come up with a canonical design and components that we could reuse for both MuData and SpatialData for an ideally consistent experience.

This might be beyond the scope of this PR but since you were asking for feedback I thought I'd mention it nonetheless.

Zethson, Dec 02 '25 10:12

Thank you for the feedback, I really appreciate it!

Extensibility via Formatters

To accommodate new modalities like spatial data, I implemented two extension points:

  • SectionFormatter - Define visualization of new sections (beyond .obs, .obsm, etc.)
  • TypeFormatter - Customize the appearance of specific data types within any existing section

These formatters are registered when a package is imported. The live demo illustrates this for the treedata extension, which introduces .obst and .vart sections for tree annotations.

MuData Support

I've been thinking about outer wrappers like MuData. While packages that use AnnData internally would generally need to build their own HTML repr, MuData is a special case: it mimics AnnData's structure with shared sections (.obs, .var, .obsm, .varm, .uns) plus an additional .mod section containing the modality AnnData objects.

Because of this structural similarity, MuData can reuse the existing implementation directly by registering a SectionFormatter for .mod:

# In mudata package
from anndata._repr import (
    FormattedEntry,
    FormattedOutput,
    FormatterContext,
    SectionFormatter,
    register_formatter,
)
from anndata._repr.html import generate_repr_html
from anndata._repr.utils import format_number


@register_formatter
class ModSectionFormatter(SectionFormatter):
    """SectionFormatter for MuData's .mod attribute."""

    section_name = "mod"
    priority = 200  # Show before other custom sections

    @property
    def after_section(self) -> str:
        return "X"  # Show right after X (before obs)

    @property
    def doc_url(self) -> str:
        return "https://mudata.readthedocs.io/en/latest/api/generated/mudata.MuData.html"

    @property
    def tooltip(self) -> str:
        return "Modalities (MuData)"

    def should_show(self, obj) -> bool:
        return hasattr(obj, "mod") and len(obj.mod) > 0

    def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]:
        entries = []
        for mod_name, adata in obj.mod.items():
            shape_str = f"{format_number(adata.n_obs)} × {format_number(adata.n_vars)}"
            # Generate nested HTML for expandable content
            can_expand = context.depth < context.max_depth
            nested_html = None
            if can_expand:
                nested_html = generate_repr_html(
                    adata,
                    depth=context.depth + 1,
                    max_depth=context.max_depth,
                    show_header=True,
                    show_search=False,
                )
            output = FormattedOutput(
                type_name=f"AnnData ({shape_str})",
                css_class="dtype-anndata",
                tooltip=f"Modality: {mod_name}",
                html_content=nested_html,
                is_expandable=can_expand,
                is_serializable=True,
            )
            entries.append(FormattedEntry(key=mod_name, output=output))
        return entries


class MuData:
    def _repr_html_(self):
        # With the SectionFormatter registered, just call generate_repr_html directly!
        # All standard sections (obs, var, obsm, varm, uns) work automatically,
        # and .mod is rendered as expandable nested AnnData objects.
        return generate_repr_html(self)

This approach gives MuData:

  • Full reuse of anndata's CSS, JavaScript, and rendering logic
  • Automatic support for all standard sections
  • The .mod section rendered as expandable nested AnnData (just like nested AnnData in .uns)
  • All interactive features: search, fold/expand, category colors, warnings, etc.

I've added a visual test case demonstrating this in the PR.

Edit: I updated the demo to include test case 19, MuData (Multimodal Data).

Edit: Include doc_url in the example.

katosh, Dec 02 '25 18:12

Wow! Just a clarification on which SpatialData I meant: https://github.com/scverse/spatialdata, not storing spatial data in AnnData, which is possible depending on the assay but is something that we're moving away from.

Zethson, Dec 02 '25 21:12

Thank you for the clarification, that makes perfect sense! As I understand it, SpatialData has a fundamentally different structure that merely contains AnnData objects:

SpatialData object, with associated Zarr store: /Users/macbook/embl/projects/basel/spatialdata-sandbox/mouse_liver/data.zarr
├── Images
│     └── 'raw_image': DataTree[cyx] (1, 6432, 6432), (1, 1608, 1608)
├── Labels
│     └── 'segmentation_mask': DataArray[yx] (6432, 6432)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│     └── 'nucleus_boundaries': GeoDataFrame shape: (3375, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (3375, 99)
with coordinate systems:
    ▸ 'global', with elements:
        raw_image (Images), segmentation_mask (Labels), transcripts (Points), nucleus_boundaries (Shapes)

Supporting arbitrary, composite data structures like this might be beyond the intended scope of the anndata package and could call for a dedicated scverse visualization package. What would be the simplest solution for that?

That said, I've explored what this could look like: PR #2 with a live demo (see test case 20).

This introduces ObjectFormatter, a new extension point that gives packages like SpatialData full control over the HTML representation: customizing the header (no shape display), index preview (coordinate systems instead of obs/var names), skipping the X section, and showing custom sections with their own footer version, all while reusing anndata's CSS, JavaScript, and interactive features like fold/expand, search, and dark mode.

However, this does add complexity to the backend. An alternative would be for SpatialData to implement its own _repr_html_ and use anndata's _repr_html_ to embed the nested AnnData tables.
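
The alternative can be sketched with stand-in classes: the container owns its layout and simply embeds whatever each child returns from _repr_html_ (the rich display hook that Jupyter calls). `Table` and `Container` are hypothetical placeholders for AnnData and SpatialData, and the markup is illustrative only.

```python
from html import escape


class Table:
    """Stand-in for an object that already provides _repr_html_."""

    def __init__(self, n_obs, n_vars):
        self.n_obs, self.n_vars = n_obs, n_vars

    def _repr_html_(self):
        return f"<div class='table'>Table ({self.n_obs} x {self.n_vars})</div>"


class Container:
    """Stand-in for a composite object that owns its HTML representation."""

    def __init__(self, tables):
        self.tables = tables

    def _repr_html_(self):
        parts = ["<div class='container'>"]
        for name, table in self.tables.items():
            # The container decides the layout; each child renders itself.
            parts.append(
                f"<details><summary>{escape(name)}</summary>"
                f"{table._repr_html_()}</details>"
            )
        parts.append("</div>")
        return "".join(parts)


html_out = Container({"table": Table(3375, 99)})._repr_html_()
```

The appeal of this arrangement is that the only contract between the two packages is the _repr_html_ method itself, so neither side needs to know the other's internals.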

katosh, Dec 03 '25 03:12

However, this does add complexity to the backend. An alternative would be for SpatialData to implement its own _repr_html_ and use anndata's _repr_html_ to embed the nested AnnData tables.

Yes, I think that'd be the way to go. Both MuData and SpatialData could reuse the components that might make their way into AnnData.

Zethson, Dec 03 '25 09:12

Following up on the SpatialData discussion, I explored an alternative to ObjectFormatter: exporting building blocks so SpatialData can own its _repr_html_ while reusing anndata's styling.

settylab/anndata#3 exports building blocks (get_css, get_javascript, render_section, FormatterRegistry, etc.) so packages can build custom reprs with consistent styling.

Live demo – implements test case 20, showing SpatialData with a custom header, a coordinate systems preview, and a "transforms" section representing a theoretical customization of SpatialData by a third package.

The trade-off is more code on SpatialData's side (~280 lines vs a formatter class), but less maintenance burden on anndata and maximum flexibility. Happy to discuss!

katosh, Dec 03 '25 20:12

The CI failures appear unrelated to this PR. They're in dask distributed tests on hatch-test.pre which pulls pandas 3.0.0rc0. Pandas 3.0 defaults to ArrowStringArray for strings, but anndata's IO registry doesn't seem to have a writer for this type yet.

Happy to re-trigger CI once there's another fix to bundle, or anyone can trigger it on demand.

katosh, Dec 04 '25 07:12

Thanks for this! We’ll probably get to this next week when Ilan is back from holiday.

Fix for the pandas stuff is in the pipes: https://github.com/scverse/anndata/pull/2133

flying-sheep, Dec 04 '25 12:12

While experimenting with this PR, I noticed that anndata now has several extension mechanisms scattered across different locations (accessors in ad.register_anndata_namespace, HTML formatters in anndata._repr, I/O handlers in anndata._io.specs._REGISTRY). Following the pattern from pandas (pd.api.extensions) and xarray, I prototyped an anndata.extensions module that consolidates these into a single public namespace. This could eventually include the existing IORegistry for serialization registration (#2238), aligning with the roadmap (#448).

Potential modifications

I have a few modifications ready in my fork. Please let me know if any would be useful to include:

  1. ~~ObjectFormatter for arbitrary objects~~ (settylab/anndata#2) - Probably not a good idea
  2. Expose _repr_html_ building blocks (settylab/anndata#3)
    • Exports CSS/JS, rendering helpers, and UI components for external packages (SpatialData, MuData) to build their own _repr_html_
    • See PR for details and SpatialData example
  3. anndata.extensions module (settylab/anndata#4)
    • Consolidates accessors (#1870) and HTML formatters into a single public namespace following pandas/xarray patterns
    • Aligns with roadmap (#448) and could eventually include IORegistry (#2238)
    • See PR for details

Let me know if I should add any of these!

katosh, Dec 12 '25 15:12

Following up on settylab/anndata#3: in addition to exporting building blocks for SpatialData integration, I've added a few more features:

New functionality:

  • Regex search: Case-sensitive and regex toggles for filtering fields (building on the existing search box)
  • .raw section: Expandable row showing unprocessed data (X, var, varm), addressing #349
  • Robust error handling: Failed sections show visible error indicators instead of being silently hidden, so users always know what data exists.
  • Extended serialization warnings (#1923, #567, #636, #1429, #1979): Building on existing value-level warnings, now also covers:
    • datetime64/timedelta64 (red): not serializable to H5AD/Zarr (#455, #2238)
    • Keys/names (red): non-string names like tuples - fails now (#321)
    • Keys/names (yellow): slashes - FutureWarning, will be disallowed (#1447, #2099)
    • All sections: layers, obsm, varm, obsp, varp, uns (not just obs/var columns)
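
The key-level checks listed above can be sketched as a small classifier. `check_key` is a hypothetical helper and not the code in settylab/anndata#3; the three levels mirror the red/yellow distinction described in the list, and the exact rules in the PR may be broader.

```python
def check_key(key):
    """Classify a field name as 'ok', 'warning', or 'error'."""
    if not isinstance(key, str):
        return "error"    # e.g. tuple names: writing fails today
    if "/" in key:
        return "warning"  # slashes: FutureWarning, to be disallowed
    return "ok"


levels = {repr(key): check_key(key) for key in ["X_pca", "a/b", ("a", 1)]}
```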

I've added these features to settylab/anndata#3 rather than here, since that PR is intended to build on this one and adding them here directly could cause conflicts when merging.

Updated demo: https://htmlpreview.github.io/?https://gist.githubusercontent.com/katosh/93bf42b205d8dc68579c9eba7175047b/raw/repr_html_visual_test.html

  • Cases 21a-d: .raw variants (dense, sparse, minimal)
  • Cases 22a-b: Unknown section detection and error handling with real exceptions
  • Case 23: Serialization warnings - comprehensive test of all warning/error cases

Happy to rebase or split these into separate PRs if that helps!

Edit: Added "Extended serialization warnings"

katosh, Dec 15 '25 12:12