higlass-server
Store tileset metadata separately from tileset data
@sehilyi, @alexanderveit, @ngehlenborg and I had a discussion on Friday about improving support for tileset metadata (initially for the cistrome-higlass-wrapper use case, but now with others in mind, such as vitessce), so I wanted to create this issue to discuss it with the entire development team.
One idea is to specify a corresponding metadata file when a tileset file is ingested.
Some open questions
- What types of fields need to be stored in the metadata file? For cistrome-higlass-wrapper, it will at least be:
  - quantitative fields (bar chart)
  - multiple related quantitative fields (stacked bar chart)
  - categorical/nominal fields (e.g. cell type, tissue type, species)
  - links/text (just a special case of the categorical/nominal?)
  - hierarchy - right now we do the tree-to-matrix and matrix-to-tree thing, but if we are defining a new metadata storage format, we could also define "aggregated" metadata which can store the hierarchy as a tree data structure separately so that no conversion step is required
- For which axis is the metadata? How can this be specified in the metadata file?
- How should metadata be stored when it is associated with:
  - points along a continuous axis
  - categories along a categorical axis (the current cistrome-higlass-wrapper case)
  - 1D intervals along a continuous axis
  - 2D regions along two continuous axes
- Does metadata need to be aggregated? For example, does different metadata need to correspond to different track zoom levels?
  - I could imagine aggregating the multivec data along the y/sample axis as well, such that for instance
    - y-zoom-level 0 corresponds to displaying multivec rows by species: 2 rows are displayed, one for human and one for mouse
    - y-zoom-level 1 corresponds to displaying multivec rows by tissue type for a particular species: more rows are displayed, for all tissue types within the species
    - y-zoom-level 2 corresponds to displaying multivec rows by cell type for a particular species and tissue type: more rows are displayed, for all cell types within the tissue type
- In what file format should metadata be stored (JSON, CSV, etc.)? How flexible does this format need to be, and does a schema need to be defined? There would obviously need to be a schema if there is some aggregation going on, so that the server can parse and return different metadata based on query parameters. If not, a schema may be unnecessary, provided that neither higlass-server nor higlass ever needs to look at the data and only these "wrapper" applications use it.
- From which server API endpoint will the metadata be served? (Right now some metadata is served from the /tileset_info endpoint)
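To make the y-axis aggregation question a bit more concrete, here is a rough Python sketch of what grouping multivec rows by y-zoom level could look like. Everything in it is an assumption for the sake of discussion: the per-row field names (`species`, `tissue_type`, `cell_type`), the `Y_ZOOM_FIELDS` mapping, and the `group_rows` helper are hypothetical and not part of higlass-server or any existing metadata format.

```python
# Hypothetical per-row metadata for a multivec tileset: one dict per row/sample.
ROW_METADATA = [
    {"species": "human", "tissue_type": "blood", "cell_type": "T cell"},
    {"species": "human", "tissue_type": "blood", "cell_type": "B cell"},
    {"species": "human", "tissue_type": "liver", "cell_type": "hepatocyte"},
    {"species": "mouse", "tissue_type": "blood", "cell_type": "T cell"},
]

# Hypothetical mapping: index = y-zoom level, value = fields defining a displayed row.
Y_ZOOM_FIELDS = [
    ("species",),
    ("species", "tissue_type"),
    ("species", "tissue_type", "cell_type"),
]


def group_rows(row_metadata, y_zoom_level, parent=()):
    """Group row indices by the fields for a y-zoom level.

    `parent` optionally restricts the result to one branch of the hierarchy,
    e.g. parent=("human",) keeps only groups within the human species.
    Returns a dict mapping a key tuple to the list of original row indices.
    """
    fields = Y_ZOOM_FIELDS[y_zoom_level]
    parent = tuple(parent)
    groups = {}
    for index, row in enumerate(row_metadata):
        key = tuple(row[field] for field in fields)
        # Skip rows that fall outside the requested branch.
        if key[:len(parent)] != parent:
            continue
        groups.setdefault(key, []).append(index)
    return groups


if __name__ == "__main__":
    # y-zoom-level 0: one displayed row per species.
    print(group_rows(ROW_METADATA, 0))
    # y-zoom-level 1: tissue types within a particular species.
    print(group_rows(ROW_METADATA, 1, parent=("human",)))
```

If something like this ran on the server, the metadata file would need a schema that at least identifies which fields form the hierarchy and in what order. If it only ever ran in the wrapper applications, the server could treat the file as opaque.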
I think I'm missing some context here. What are you trying to accomplish with all this metadata?