Add HTML representation
Rich HTML representation for AnnData
- [x] Closes #675
- [x] Tests added
- [x] Release note added (or unnecessary)
Summary
Implements rich HTML representation (_repr_html_) for AnnData objects in Jupyter notebooks and other HTML-aware environments. This builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.
Screenshot
Features
Interactive Display
- Foldable sections: Each attribute (obs, var, uns, etc.) is collapsible with auto-fold for sections exceeding threshold
- Search/filter: Real-time filtering across all fields by name, type, or content
- Copy-to-clipboard: One-click copy for field names
- Nested AnnData: Expandable nested objects with configurable depth limit
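The search box in this PR is implemented in JavaScript. As a rough illustration of the matching rule only (match on name, type, or content), a minimal Python sketch could look like the following. `Field` and `filter_fields` are hypothetical names introduced for this example and are not part of the PR.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for one row of the repr (not a class from this PR)."""

    name: str
    type_name: str
    preview: str = ""


def filter_fields(fields: list[Field], query: str) -> list[Field]:
    """Keep fields whose name, type, or preview contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return list(fields)  # an empty query shows everything
    return [
        f
        for f in fields
        if needle in f.name.lower()
        or needle in f.type_name.lower()
        or needle in f.preview.lower()
    ]


fields = [
    Field("cell_type", "category (3)", "B, T, NK"),
    Field("n_counts", "float32"),
    Field("X_umap", "ndarray (100, 2)"),
]
matches = filter_fields(fields, "umap")
```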
Visual Indicators
- Category colors: Displays category values with their color palette from uns (e.g., cell_type_colors)
- Type badges: Visual indicators for views, backed mode, sparse matrices, Dask arrays
- Serialization warnings: Highlights fields that won't serialize to H5AD/Zarr
- Value previews: Inline previews for simple uns values (strings, numbers, small dicts/lists)
- README support: If uns["README"] contains a string, a small ⓘ icon appears in the header; clicking it opens a modal with the content rendered as markdown (headers, bold/italic, code blocks, ordered/unordered lists, links, blockquotes, tables)
- Memory info: Shows estimated memory usage in footer
Compatibility
- Dark mode: Auto-detects Jupyter Lab/VS Code themes
- No-JS fallback: Graceful degradation when JavaScript is disabled
- JupyterLab safe: CSS scoped to .anndata-repr prevents style conflicts
Configuration
import anndata
anndata.settings.repr_html_enabled = True # Enable/disable HTML repr (default: True)
anndata.settings.repr_html_fold_threshold = 5 # Auto-fold sections with more items (default: 5)
anndata.settings.repr_html_max_depth = 3 # Nested AnnData depth limit (default: 3)
anndata.settings.repr_html_max_items = 200 # Max items per section (default: 200)
anndata.settings.repr_html_max_categories = 100 # Max categories inline (default: 100)
anndata.settings.repr_html_dataframe_expand = False # Expandable DataFrame tables (default: False)
anndata.settings.repr_html_max_field_width = 400 # Max width for field name column (default: 400px)
anndata.settings.repr_html_type_width = 220 # Width for type column (default: 220px)
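How the fold threshold and the item cap interact is not spelled out above. One plausible reading, sketched with a hypothetical helper (`section_layout` is not a function from this PR, and the "more items than the threshold" rule is an assumption based on the setting comments):

```python
def section_layout(n_items: int, fold_threshold: int = 5, max_items: int = 200) -> dict:
    """Decide how a section with n_items entries might be displayed.

    Assumption: a section is folded when it has MORE items than the
    threshold, and at most max_items entries are rendered.
    """
    shown = min(n_items, max_items)
    return {
        "folded": n_items > fold_threshold,
        "shown": shown,
        "hidden": n_items - shown,  # entries dropped by the max_items cap
    }


small = section_layout(3)
large = section_layout(250)
```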
Extensibility
The repr system provides two extension points: TypeFormatter for custom value visualization and SectionFormatter for adding new sections.
TypeFormatter - Format by Python type (e.g., custom array in obsm/varm):
from anndata._repr import register_formatter, TypeFormatter, FormattedOutput

@register_formatter
class MyArrayFormatter(TypeFormatter):
    sections = ("obsm", "varm")  # Only apply to these sections (None = all)

    def can_format(self, obj):
        return isinstance(obj, MyArrayType)

    def format(self, obj, context):
        return FormattedOutput(
            type_name=f"MyArray {obj.shape}",
            html_content=obj._repr_html_(),  # Custom visualization
            is_expandable=True,
        )
TypeFormatter - Format by embedded type hint (e.g., tagged data in uns):
from anndata._repr import register_formatter, TypeFormatter, FormattedOutput
from anndata._repr import extract_uns_type_hint

@register_formatter
class MyConfigFormatter(TypeFormatter):
    priority = 100  # Check before fallback
    sections = ("uns",)  # Only apply to uns section

    def can_format(self, obj):
        hint, _ = extract_uns_type_hint(obj)
        return hint == "mypackage.config"

    def format(self, obj, context):
        hint, data = extract_uns_type_hint(obj)
        return FormattedOutput(
            type_name="config",
            html_content='<span>Custom preview</span>',
        )
Data with type hints ({"__anndata_repr__": "mypackage.config", ...}) will use the registered formatter when the package is imported. Security: data never triggers code execution; packages must be explicitly imported first.
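To make the security point concrete, here is a stdlib-only sketch of hint-based dispatch. It is not the PR's implementation: `register_preview`, `split_hint`, and `preview` are hypothetical names, and only the `__anndata_repr__` key is taken from the description above. The data merely selects an entry from a registry that already-imported code has filled; nothing is imported or evaluated based on the data.

```python
HINT_KEY = "__anndata_repr__"  # key name taken from the PR description

# Hypothetical registry: hint string -> preview function.
# Entries only exist if some imported package registered them.
_PREVIEWERS = {}


def register_preview(hint):
    """Decorator that registers a preview function for a hint string."""

    def decorator(func):
        _PREVIEWERS[hint] = func
        return func

    return decorator


def split_hint(value):
    """Return (hint, payload) for tagged dicts, else (None, value)."""
    if isinstance(value, dict) and isinstance(value.get(HINT_KEY), str):
        payload = {k: v for k, v in value.items() if k != HINT_KEY}
        return value[HINT_KEY], payload
    return None, value


def preview(value):
    """Use a registered previewer if one matches, else fall back to the type name."""
    hint, payload = split_hint(value)
    func = _PREVIEWERS.get(hint)
    if func is None:
        return type(value).__name__  # unknown or missing hint: plain fallback
    return func(payload)


@register_preview("mypackage.config")
def _preview_config(payload):
    return f"config ({len(payload)} keys)"


tagged = {"__anndata_repr__": "mypackage.config", "alpha": 1, "beta": 2}
```

A hint that no imported package has registered falls through to the plain fallback.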
SectionFormatter - Add new sections (e.g., TreeData's obst/vart):
from anndata._repr import register_formatter, SectionFormatter, FormattedEntry, FormattedOutput

@register_formatter
class ObstSectionFormatter(SectionFormatter):
    section_name = "obst"
    after_section = "obsm"  # Position after obsm

    def should_show(self, obj):
        return hasattr(obj, "obst") and len(obj.obst) > 0

    def get_entries(self, obj, context):
        return [
            FormattedEntry(
                key=key,
                output=FormattedOutput(
                    type_name=f"Tree ({tree.n_nodes} nodes)",
                    html_content=tree._repr_svg_(),  # Custom tree SVG
                    is_expandable=True,
                )
            )
            for key, tree in obj.obst.items()
        ]
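The effect of `after_section` on display order can be pictured with a small stdlib-only sketch. The built-in order used here and the `order_sections` helper are assumptions for illustration; the PR's actual ordering logic may differ.

```python
# Assumed built-in section order, for illustration only.
DEFAULT_ORDER = ["X", "obs", "var", "uns", "obsm", "varm", "layers", "obsp", "varp"]


def order_sections(builtin, custom):
    """Insert custom sections right after their anchor section.

    custom is a list of (section_name, after_section) pairs.
    Sections whose anchor is unknown are appended at the end.
    """
    ordered = list(builtin)
    for name, after in custom:
        if after in ordered:
            ordered.insert(ordered.index(after) + 1, name)
        else:
            ordered.append(name)
    return ordered


order = order_sections(
    DEFAULT_ORDER,
    [("obst", "obsm"), ("vart", "varm"), ("mod", "X")],
)
```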
Testing
- 251 unit tests with 90-100% coverage on _repr module
- 18 visual test cases in python tests/visual_inspect_repr_html.py
Implementation
New module (src/anndata/_repr/):
- __init__.py - Public API and documentation
- html.py - Main HTML generator
- css.py - Scoped CSS with dark mode support
- javascript.py - Interactive features (fold, search, copy)
- markdown.py - Minimal JS markdown parser for README rendering (easily replaceable by external module)
- formatters.py - Type-specific formatters
- registry.py - Extensible TypeFormatter and SectionFormatter registration
- utils.py - Helper functions
- constants.py - Default values for settings (single source of truth)
Modified files:
- src/anndata/_core/anndata.py - Added _repr_html_() method
- src/anndata/_settings.py - Added repr_html_* settings
- src/anndata/_settings.pyi - Added type stubs for repr_html_* settings
- pyproject.toml - Added vart to codespell ignore list, updated pytest version
Documentation
Documentation for extending the HTML repr is in the module docstrings:
- src/anndata/_repr/__init__.py - Main documentation with complete examples for:
  - TypeFormatter - Custom visualization for specific types (by Python type or embedded type hints)
  - SectionFormatter - Adding new sections (like TreeData's obst/vart)
  - FormattedOutput and FormattedEntry dataclasses
  - extract_uns_type_hint() for tagged data patterns
- src/anndata/_repr/registry.py - Detailed API documentation for the registry system
- src/anndata/_core/anndata.py - _repr_html_() method docstring with settings reference
Demo
Live interactive demo | Gist source
To regenerate locally: python tests/visual_inspect_repr_html.py
Config Changes
pyproject.toml: Added vart to codespell's ignore-words-list. This is needed because vart (variable tree annotations) is a valid section name in the TreeData extension package, analogous to obst (observation tree annotations). The documentation and examples reference this term.
Dependency Changes
pyproject.toml: Updated pytest>=8.2 to pytest>=9.0 in test-min to resolve dependency conflict with hatch-test's built-in pytest~=9.0 requirement.
Related
- Supersedes #784, #694, #521, #346 (previous drafts)
- Compatible with #1927 (sparse scipy changes)
- Fully backward compatible
Acknowledgments
Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for the prior work and discussions that informed this implementation.
Hi all!
I believe the failing CI checks are due to pre-existing anndata CI configuration issues rather than this PR:
- Test failures (hatch-test.min, hatch-test.stable, hatch-test.pre): All fail during dependency installation with a pytest version conflict (pytest>=9.0,<10.dev0 and pytest==8.2 unsatisfiable).
- Triage checks: Needs a milestone assigned by a maintainer.
- ReadTheDocs: I built the docs locally with sphinx-build -W (warnings as errors) and the 23 warnings that cause failure are all pre-existing issues - broken links to zarr docs and missing tutorial notebooks. None are related to the _repr module. The new code doesn't introduce any new warnings.
Please let me know if I'm wrong about any of this and there are changes I should make to the PR. Happy to fix anything needed!
Codecov Report
:x: Patch coverage is 94.54685% with 71 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 86.21%. Comparing base (4f64868) to head (7c7058b).
:white_check_mark: All tests successful. No failed tests found.
Additional details and impacted files
@@ Coverage Diff @@
## main #2236 +/- ##
==========================================
+ Coverage 84.47% 86.21% +1.74%
==========================================
Files 46 55 +9
Lines 7202 8504 +1302
==========================================
+ Hits 6084 7332 +1248
- Misses 1118 1172 +54
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/anndata/_repr/__init__.py | 100.00% <100.00%> (ø) | |
| src/anndata/_repr/constants.py | 100.00% <100.00%> (ø) | |
| src/anndata/_repr/css.py | 100.00% <100.00%> (ø) | |
| src/anndata/_repr/javascript.py | 100.00% <100.00%> (ø) | |
| src/anndata/_repr/markdown.py | 100.00% <100.00%> (ø) | |
| src/anndata/_settings.py | 92.34% <100.00%> (+2.75%) | :arrow_up: |
| src/anndata/_core/anndata.py | 83.08% <75.00%> (-0.09%) | :arrow_down: |
| src/anndata/_repr/registry.py | 94.57% <94.57%> (ø) | |
| src/anndata/_repr/utils.py | 93.54% <93.54%> (ø) | |
| src/anndata/_repr/formatters.py | 95.05% <95.05%> (ø) | |
| ... and 1 more | | |
Bumped pytest>=8.2 to pytest>=9.0 in test-min to resolve dependency conflict with hatch-test's pinned pytest~=9.0.
Attempting to trigger codecov to refresh its report with actual coverage: @codecov refresh
Thank you so much for this massive body of work!
I saw your post on discourse and just wanted to inform you real quick that both of our main anndata maintainers are currently on holidays. But I'm sure that they'll get back to you soon.
Thanks!
Oh, thanks so much for the update @Zethson! Totally understand, and no rush at all. This one is a bit larger, and I figured it might take some time. I just wanted to signal that I’m very open to discussing any aspect of it (functionality, features, aesthetics), and I’m happy to iterate and incorporate any ideas. I’ll keep working on it until it’s in a shape everyone feels good about.
Thanks again!
I would be happy if we could come up with a canonical design and components that we could reuse for both MuData and SpatialData for an ideally consistent experience.
This might be beyond the scope of this PR but since you were asking for feedback I thought I'd mention it nonetheless.
Thank you for the feedback, I really appreciate it!
Extensibility via Formatters
To accommodate new modalities like spatial data, I implemented two extension points:
- SectionFormatter - Define visualization of new sections (beyond .obs, .obsm, etc.)
- TypeFormatter - Customize the appearance of specific data types within any existing section
These formatters are registered when a package is imported. The live demo illustrates this for the treedata extension, which introduces .obst and .vart sections for tree annotations.
MuData Support
I've been thinking about outer wrappers like MuData. While packages that use AnnData internally would generally need to build their own HTML repr, MuData is a special case: it mimics AnnData's structure with shared sections (.obs, .var, .obsm, .varm, .uns) plus an additional .mod section containing the modality AnnData objects.
Because of this structural similarity, MuData can reuse the existing implementation directly by registering a SectionFormatter for .mod:
# In mudata package
from anndata._repr import (
    FormattedEntry,
    FormattedOutput,
    FormatterContext,
    SectionFormatter,
    register_formatter,
)
from anndata._repr.html import generate_repr_html
from anndata._repr.utils import format_number

@register_formatter
class ModSectionFormatter(SectionFormatter):
    """SectionFormatter for MuData's .mod attribute."""

    section_name = "mod"
    priority = 200  # Show before other custom sections

    @property
    def after_section(self) -> str:
        return "X"  # Show right after X (before obs)

    @property
    def doc_url(self) -> str:
        return "https://mudata.readthedocs.io/en/latest/api/generated/mudata.MuData.html"

    @property
    def tooltip(self) -> str:
        return "Modalities (MuData)"

    def should_show(self, obj) -> bool:
        return hasattr(obj, "mod") and len(obj.mod) > 0

    def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]:
        entries = []
        for mod_name, adata in obj.mod.items():
            shape_str = f"{format_number(adata.n_obs)} × {format_number(adata.n_vars)}"

            # Generate nested HTML for expandable content
            can_expand = context.depth < context.max_depth
            nested_html = None
            if can_expand:
                nested_html = generate_repr_html(
                    adata,
                    depth=context.depth + 1,
                    max_depth=context.max_depth,
                    show_header=True,
                    show_search=False,
                )

            output = FormattedOutput(
                type_name=f"AnnData ({shape_str})",
                css_class="dtype-anndata",
                tooltip=f"Modality: {mod_name}",
                html_content=nested_html,
                is_expandable=can_expand,
                is_serializable=True,
            )
            entries.append(FormattedEntry(key=mod_name, output=output))
        return entries

class MuData:
    def _repr_html_(self):
        # With the SectionFormatter registered, just call generate_repr_html directly!
        # All standard sections (obs, var, obsm, varm, uns) work automatically,
        # and .mod is rendered as expandable nested AnnData objects.
        return generate_repr_html(self)
This approach gives MuData:
- Full reuse of anndata's CSS, JavaScript, and rendering logic
- Automatic support for all standard sections
- The .mod section rendered as expandable nested AnnData (just like nested AnnData in .uns)
- All interactive features: search, fold/expand, category colors, warnings, etc.
I've added a visual test case demonstrating this in the PR.
Edit: I updated the demo to include 19. MuData (Multimodal Data).
Edit: Include doc_url in the example.
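The `context.depth < context.max_depth` check in the example is what keeps nested rendering bounded. A stdlib-only sketch of the same idea, using plain dicts in place of AnnData objects and text in place of HTML (the `render` helper is hypothetical and not part of the PR):

```python
def render(obj: dict, depth: int = 0, max_depth: int = 3) -> list[str]:
    """Render nested dicts as indented lines, refusing to expand past max_depth."""
    lines = []
    indent = "  " * depth
    for key, value in obj.items():
        if isinstance(value, dict):
            if depth < max_depth:
                # Still within the limit: expand the nested object one level deeper.
                lines.append(f"{indent}{key}:")
                lines.extend(render(value, depth + 1, max_depth))
            else:
                # Limit reached: show a placeholder instead of recursing.
                lines.append(f"{indent}{key}: <nested, not expanded>")
        else:
            lines.append(f"{indent}{key}: {value}")
    return lines


mdata = {"mod": {"rna": {"shape": "100 x 2000"}, "atac": {"shape": "100 x 5000"}}}
shallow = render(mdata, max_depth=1)
deep = render(mdata, max_depth=3)
```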
Wow! Just a clarification: by SpatialData I meant https://github.com/scverse/spatialdata, not storing spatial data in AnnData, which is possible depending on the assay but something that we're moving away from.
Thank you for the clarification, that makes perfect sense! I find SpatialData has a fundamentally different structure that just contains AnnData objects:
SpatialData object, with associated Zarr store: /Users/macbook/embl/projects/basel/spatialdata-sandbox/mouse_liver/data.zarr
├── Images
│ └── 'raw_image': DataTree[cyx] (1, 6432, 6432), (1, 1608, 1608)
├── Labels
│ └── 'segmentation_mask': DataArray[yx] (6432, 6432)
├── Points
│ └── 'transcripts': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│ └── 'nucleus_boundaries': GeoDataFrame shape: (3375, 1) (2D shapes)
└── Tables
└── 'table': AnnData (3375, 99)
with coordinate systems:
▸ 'global', with elements:
raw_image (Images), segmentation_mask (Labels), transcripts (Points), nucleus_boundaries (Shapes)
Supporting arbitrary, composite data structures like this might be beyond the intended scope of the anndata package and could call for a dedicated scverse visualization package. What would be the simplest solution for that?
That said, I've explored what this could look like: PR #2 with a live demo (see test case 20).
This introduces ObjectFormatter, a new extension point that gives packages like SpatialData full control over the HTML representation: customizing the header (no shape display), index preview (coordinate systems instead of obs/var names), skipping the X section, and showing custom sections with their own footer version, all while reusing anndata's CSS, JavaScript, and interactive features like fold/expand, search, and dark mode.
However, this does add complexity to the backend. An alternative would be for SpatialData to implement its own _repr_html_ and use anndata's _repr_html_ to embed the nested AnnData tables.
> However, this does add complexity to the backend. An alternative would be for SpatialData to implement its own _repr_html_ and use anndata's _repr_html_ to embed the nested AnnData tables.
Yes, I think that'd be the way to go. Both MuData and SpatialData could reuse the components that might make their way into AnnData.
Following up on the SpatialData discussion, I explored an alternative to ObjectFormatter: exporting building blocks so SpatialData can own its _repr_html_ while reusing anndata's styling.
settylab/anndata#3 exports building blocks (get_css, get_javascript, render_section, FormatterRegistry, etc.) so packages can build custom reprs with consistent styling.
Live demo – implements a test case 20 showing SpatialData with custom header, coordinate systems preview, and a "transforms" section representing a theoretical customization of SpatialData by a third package.
Trade-off is more code on SpatialData's side (~280 lines vs a formatter class), but less maintenance burden on anndata and maximum flexibility. Happy to discuss!
The CI failures appear unrelated to this PR. They're in dask distributed tests on hatch-test.pre which pulls pandas 3.0.0rc0. Pandas 3.0 defaults to ArrowStringArray for strings, but anndata's IO registry doesn't seem to have a writer for this type yet.
Happy to re-trigger CI once there's another fix to bundle, or anyone can trigger it on demand.
Thanks for this! We’ll probably get to this next week when Ilan is back from holiday.
Fix for the pandas stuff is in the pipes: https://github.com/scverse/anndata/pull/2133
While experimenting with this PR, I noticed that anndata now has several extension mechanisms scattered across different locations (accessors in ad.register_anndata_namespace, HTML formatters in anndata._repr, I/O handlers in anndata._io.specs._REGISTRY). Following the pattern from pandas (pd.api.extensions) and xarray, I prototyped an anndata.extensions module that consolidates these into a single public namespace. This could eventually include the existing IORegistry for serialization registration (#2238), aligning with the roadmap (#448).
Potential modifications
I have a few modifications ready in my fork. Please let me know if any would be useful to include:
- ~~ObjectFormatter for arbitrary objects~~ (settylab/anndata#2) - Probably not a good idea
- Expose _repr_html_ building blocks (settylab/anndata#3)
  - Exports CSS/JS, rendering helpers, and UI components for external packages (SpatialData, MuData) to build their own _repr_html_
  - See PR for details and SpatialData example
- anndata.extensions module (settylab/anndata#4)
  - Consolidates accessors (#1870) and HTML formatters into a single public namespace following pandas/xarray patterns
  - Aligns with roadmap (#448) and could eventually include IORegistry (#2238)
  - See PR for details
Let me know if I should add any of these!
Following up on settylab/anndata#3: in addition to exporting building blocks for SpatialData integration, I've added a few more features.
New functionality:
- Regex search: Case-sensitive and regex toggles for filtering fields (building on the existing search box)
- .raw section: Expandable row showing unprocessed data (X, var, varm), addressing #349
- Robust error handling: Failed sections show visible error indicators instead of being silently hidden. This ensures transparency and users always know what data exists.
- Extended serialization warnings (#1923, #567, #636, #1429, #1979): Building on existing value-level warnings, now also covers:
  - datetime64/timedelta64 (red): not serializable to H5AD/Zarr (#455, #2238)
  - Keys/names (red): non-string names like tuples - fails now (#321)
  - Keys/names (yellow): slashes - FutureWarning, will be disallowed (#1447, #2099)
  - All sections: layers, obsm, varm, obsp, varp, uns (not just obs/var columns)
I've added these features to settylab/anndata#3 rather than here, since that PR is intended to build on this one and adding them here directly could cause conflicts when merging.
Updated demo: https://htmlpreview.github.io/?https://gist.githubusercontent.com/katosh/93bf42b205d8dc68579c9eba7175047b/raw/repr_html_visual_test.html
- Cases 21a-d: .raw variants (dense, sparse, minimal)
- Cases 22a-b: Unknown section detection and error handling with real exceptions
- Case 23: Serialization warnings - comprehensive test of all warning/error cases
Happy to rebase or split these into separate PRs if that helps!
Edit: Added "Extended serialization warnings"