brainglobe-atlasapi
BrainGlobe Atlas API Version 2
This is essentially a reply to https://github.com/brainglobe/bg-atlasapi/issues/96, but I'm starting a new issue to track this idea. Sorry for the long post, but I'm interested in your ideas, @brainglobe/maintainers @brainglobe/swc-neuroinformatics @brainglobe/czi_eoss5.
After 2.5 years, and based on conversations with various users and atlas creators, I think it's time for bg-atlasapi version 2. Version 1 works very well for "classical" anatomical atlases (i.e. one reference image, and one annotation), but it doesn't cater well (or at all) for:
- Atlases with multiple reference images (e.g. kim_dev_mouse and mpin_zfish)
- Atlases that exist at multiple resolutions (the atlas API doesn't link these in any way)
- Atlases with the same annotations in different coordinate spaces (e.g. allen_mouse and perens_lsfm_mouse)
- Atlases that aren't represented by brain regions (e.g. cell atlases)
- Atlases (or coordinate spaces) that incorporate other data (e.g. tracing, gene expression)
- Atlases being updated (as discussed above)
The atlas generation process also needs streamlining.
My idea for V2:
Move away from the monolithic atlas structure
The atlas could be defined by a config file, specifying atlas "elements" (an element being a reference image, a set of meshes, etc.):
name: "allen_mouse"
atlas_link: "http://www.brain-map.org"
version: 2.0
...
...
reference_images:
  # could be >1
  STP: some_url
annotation_images:
  # could be >1
  CCFv3: some_url
structures: some_url
meshes: some_url
When an atlas is requested, the atlas API would check which of these files had already been downloaded, and then download only the missing ones. The idea is that there would be a lot of overlap between atlases (same meshes at different resolutions, same reference image for multiple annotations, etc.), and this would reduce download times and save disk space.
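A minimal sketch of that check, assuming the config has already been parsed into a dictionary and that elements are cached under a file name taken from the last URL segment. The URLs, `ATLAS_CONFIG`, and all function names here are hypothetical and not part of the current API.

```python
from pathlib import Path

# Hypothetical parsed config mirroring the layout above; URLs are placeholders.
ATLAS_CONFIG = {
    "name": "allen_mouse",
    "reference_images": {"STP": "https://example.org/stp_reference.tiff"},
    "annotation_images": {"CCFv3": "https://example.org/ccfv3_annotation.tiff"},
    "structures": "https://example.org/structures.json",
    "meshes": "https://example.org/meshes.zip",
}


def element_urls(config):
    """Collect every element URL, flattening the sections that may hold >1 entry."""
    urls = []
    for key in ("reference_images", "annotation_images"):
        urls.extend(config.get(key, {}).values())
    for key in ("structures", "meshes"):
        if key in config:
            urls.append(config[key])
    return urls


def cache_path(url, cache_dir):
    """Map a URL to a local file (assumption: the last path segment is unique)."""
    return Path(cache_dir) / url.rsplit("/", 1)[-1]


def missing_elements(config, cache_dir):
    """Return only the URLs whose files are not yet in the shared cache."""
    return [
        url for url in element_urls(config)
        if not cache_path(url, cache_dir).exists()
    ]
```

Because the cache is keyed by element rather than by atlas, a second atlas that points at the same meshes URL would find the file already present and skip it.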
This would also allow data to be stored somewhere other than GIN. I'm not sure whether we want to do this, but it may be necessary for e.g. larger atlases (see below).
Improve versioning
Essentially as per https://github.com/brainglobe/bg-atlasapi/issues/96. We could version the elements individually, and a versioned atlas could specify these, e.g.:
name: "allen_mouse"
atlas_link: "http://www.brain-map.org"
version: 1.0
...
...
reference_images:
  STP: [email protected]
annotation_images:
  CCFv3: [email protected]
structures: [email protected]
meshes: [email protected]
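One way per-element versions could be handled, sketched under the assumption that each element is written as `name@version` with a dotted numeric version. That spec format, the element names in the usage note, and both function names are illustrative assumptions rather than an agreed design.

```python
def parse_element_spec(spec):
    """Split a hypothetical 'name@version' spec into (name, version tuple)."""
    name, sep, version = spec.rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"expected 'name@version', got {spec!r}")
    # Compare versions numerically, so that 1.10 sorts after 1.9.
    return name, tuple(int(part) for part in version.split("."))


def needs_update(local_spec, remote_spec):
    """True when the remote spec names a newer version of the same element."""
    local_name, local_version = parse_element_spec(local_spec)
    remote_name, remote_version = parse_element_spec(remote_spec)
    if local_name != remote_name:
        raise ValueError("specs refer to different elements")
    return remote_version > local_version
```

For example, `needs_update("ccfv3_annotation@1.0", "ccfv3_annotation@1.1")` would report that only the annotation image has to be fetched again, leaving the other elements of the atlas untouched.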
Improve the atlas generation process
I think the PR to bg-atlasgen has worked ok, but the repo itself needs a lot of refactoring to improve it. Submitting a new atlas could become more complex, though, if the user is expected to select which pre-existing atlas elements can be re-used. We end up spending a lot of time on these pull requests, so maybe we could:
- Provide a form (possibly an issue template) that asks for all the relevant information.
- Develop tooling to go from form to config file.
- Submit the PR ourselves?
- Develop tooling to rigorously check a new atlas against BrainGlobe tools (especially all parts of the API).
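As a rough illustration of the checking step, the sketch below validates a parsed config before it would be turned into a PR. The set of required keys and the function name are assumptions made for this example; the real rules would need to be agreed.

```python
# Hypothetical minimum set of keys; the actual requirements are undecided.
REQUIRED_KEYS = ("name", "atlas_link", "version", "reference_images")


def validate_config(config):
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for section in ("reference_images", "annotation_images"):
        if section in config and not isinstance(config[section], dict):
            problems.append(f"{section} must map labels to locations")
    if config.get("reference_images") == {}:
        problems.append("reference_images must list at least one image")
    return problems
```

Returning all problems at once, rather than raising on the first, would let a form-driven workflow report everything a submitter needs to fix in a single pass.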
Introduce "relationships" between atlases
Lots of atlases are related in some way, e.g.:
- They are the same atlas at different resolutions
- They are the same annotations with different reference images
- They are a version of another atlas in a different coordinate space
It would be useful to introduce two concepts:
- Grouping (easy) - make it obvious in e.g. the CLI and website that different atlases are related
- Transforms (hard) - store transforms from one atlas to another in a standardised form, and provide methods to transform between them. For the different resolutions, this is possible, but not all the other atlases have a transform between them. We would need transforms for various types of data (e.g. images & objects). This may end up being for V3.
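Both ideas can be sketched for the simplest case, the same atlas at different resolutions. This assumes atlas names end in a `_<resolution>um` suffix and that resolutions are isotropic; the function names are hypothetical, and the scaling ignores any voxel-centre offset.

```python
import re
from collections import defaultdict

# Assumption: resolution is encoded as a trailing "_<number>um" in the name.
RESOLUTION_SUFFIX = re.compile(r"_(\d+(?:\.\d+)?)um$")


def group_by_base_name(atlas_names):
    """Group atlas names that differ only by their resolution suffix."""
    groups = defaultdict(list)
    for name in atlas_names:
        groups[RESOLUTION_SUFFIX.sub("", name)].append(name)
    return dict(groups)


def rescale_voxel_coords(coords, source_um, target_um):
    """Convert voxel indices between two isotropic resolutions of one atlas."""
    factor = source_um / target_um
    return tuple(c * factor for c in coords)
```

Transforms between different coordinate spaces would need a stored deformation rather than a single scale factor, which is why that part is the hard one.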
Include additional data
There are different types of atlas other than just brain regions (e.g. cell atlases). There is also a lot of publicly available data that is registered to an atlas (e.g. tracing data, gene expression). These data are as much an "atlas" as the brain region ones. I propose adding additional "elements" to cater to this. These elements could either be added to an existing atlas, or a new atlas could be created (without necessarily any annotation image, just the reference image to define the coordinate space). In some cases, this may include duplicating some functionality of morphapi. There are a lot of questions here about exactly what data to support and how to standardise it.
Questions
Do we want to support data stored elsewhere?
My gut feeling is that in general BrainGlobe should ensure the validity of all atlases. However, for some atlases (e.g. bigger ones) maybe we want to allow hosting of files elsewhere and maybe mark them with a community tag or similar? This could also simplify the support for lab/project-specific atlases that we may not want to become a "proper" BG atlas.
Should we support lazy loading for large atlases?
Some atlases are becoming very large (e.g. EM). We don't want to re-package these ourselves, and we definitely don't want to download them locally. We could support N APIs for lazy loading to support these types of atlases. I assume these atlases will only become more common, but we may not want to support them at all.
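To make the lazy-loading idea concrete, here is a minimal sketch in which a volume fetches chunks only when they are first requested and keeps them in memory afterwards. `LazyVolume` and the `fetch_chunk` callback are hypothetical; in practice the callback would wrap whichever remote API is supported.

```python
class LazyVolume:
    """Fetch chunks of a large volume on demand instead of downloading it all."""

    def __init__(self, fetch_chunk):
        # fetch_chunk: callable taking a chunk index and returning its data.
        self._fetch_chunk = fetch_chunk
        self._cache = {}

    def chunk(self, index):
        """Return the chunk at `index`, fetching it only on first access."""
        if index not in self._cache:
            self._cache[index] = self._fetch_chunk(index)
        return self._cache[index]
```

Keeping the remote access behind a single callback would let the same atlas interface sit on top of several back ends, at the cost of an unbounded in-memory cache in this simple form.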