
BrainGlobe Atlas API Version 2

Open adamltyson opened this issue 1 year ago • 5 comments

This is essentially a reply to https://github.com/brainglobe/bg-atlasapi/issues/96, but I'm starting a new issue to track this idea. Sorry for the long post, but I'm interested in your ideas, @brainglobe/maintainers @brainglobe/swc-neuroinformatics @brainglobe/czi_eoss5.

After 2.5 years, and based on conversations with various users and atlas creators, I think it's time for bg-atlasapi version 2. Version 1 works very well for "classical" anatomical atlases (i.e. one reference image, and one annotation), but it doesn't cater well (or at all) for:

  • Atlases with multiple reference images (e.g. kim_dev_mouse and mpin_zfish)
  • Atlases that exist at multiple resolutions (the atlas API doesn't link these in any way)
  • Atlases with the same annotations in different coordinate spaces (e.g. allen_mouse and perens_lsfm_mouse)
  • Atlases that aren't represented by brain regions (e.g. cell atlases)
  • Atlases (or coordinate spaces) that incorporate other data (e.g. tracing, gene expression)
  • Atlases being updated (as discussed above)

The atlas generation process also needs streamlining.

My idea for V2:

Move away from the monolithic atlas structure

The atlas could be defined by a config file, specifying atlas "elements" (an element being a reference image, a set of meshes, etc.):

name: "allen_mouse"
atlas_link: "http://www.brain-map.org"
version: 2.0
...
...
reference_images:
  # could be >1
  STP: some_url
annotation_images:
  # could be >1
  CCFv3: some_url
structures: some_url
meshes: some_url

When an atlas is downloaded, the atlas API would check which of these files already exist locally, and download only the missing ones. The idea is that there would be a lot of overlap between atlases (same meshes at different resolutions, same reference image for multiple annotations etc.), and this would reduce download times and save disk space.
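A minimal sketch of the "only download what is missing" check, in Python. The URLs, the `config` dictionary layout, and the cache layout (one file per element, keyed by the file name in the URL) are all assumptions made for illustration, not part of any existing BrainGlobe API.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical parsed config: element kind -> {label: url}.
# The example.org URLs are placeholders, not real element locations.
config = {
    "reference_images": {"STP": "https://example.org/elements/stp_reference.tiff"},
    "annotation_images": {"CCFv3": "https://example.org/elements/ccfv3_annotation.tiff"},
    "structures": {"default": "https://example.org/elements/structures.json"},
    "meshes": {"default": "https://example.org/elements/meshes.tar.gz"},
}


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map an element URL to a file in the shared cache (assumed layout)."""
    return cache_dir / Path(urlparse(url).path).name


def missing_elements(config: dict, cache_dir: Path) -> list[str]:
    """Return the URLs of elements not yet in the cache, in config order."""
    urls = [url for group in config.values() for url in group.values()]
    return [url for url in urls if not cache_path(cache_dir, url).exists()]
```

Because the cache is shared between atlases, a second atlas that lists the same element URL would find the file already present and skip it. Keying the cache on the file name alone is a simplification; a real implementation would probably need a collision-free key such as a hash of the URL.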

This would also allow data to be stored somewhere other than GIN. I'm not sure whether we want to do this, but it may be necessary for e.g. larger atlases (see below).

Improve versioning

Essentially as per https://github.com/brainglobe/bg-atlasapi/issues/96. We could version the elements individually, and a versioned atlas could specify these, e.g.:

name: "allen_mouse"
atlas_link: "http://www.brain-map.org"
version: 1.0
...
...
reference_images: 
  STP:  [email protected]
annotation_images: 
  CCFv3:  [email protected]
structures: [email protected]
meshes: [email protected]
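One way to picture per-element versioning is sketched below in Python. It does not assume any particular reference syntax in the config file; each element is modelled as an explicit kind/name/version triple, and the class name `ElementRef`, the `releases` table, and the version numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRef:
    """A pinned atlas element: what it is, and which version of it."""

    kind: str     # e.g. "reference_image", "annotation_image"
    name: str     # hypothetical element identifier
    version: str  # version of the element, independent of the atlas version


# Two hypothetical releases of one atlas that differ only in the annotation.
releases = {
    "1.0": {
        ElementRef("reference_image", "STP", "1.0"),
        ElementRef("annotation_image", "CCFv3", "1.0"),
        ElementRef("structures", "structures", "1.0"),
    },
    "1.1": {
        ElementRef("reference_image", "STP", "1.0"),
        ElementRef("annotation_image", "CCFv3", "1.1"),
        ElementRef("structures", "structures", "1.0"),
    },
}


def elements_to_fetch(old: str, new: str) -> set[ElementRef]:
    """Elements pinned by release `new` that release `old` does not already pin."""
    return releases[new] - releases[old]
```

With this model, updating from release 1.0 to 1.1 would only require fetching the one element whose version changed.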

Improve the atlas generation process

I think the pull request workflow for bg-atlasgen has worked OK, but the repo itself needs a lot of refactoring. Submitting a new atlas could become more complex though, if the user is expected to select which pre-existing atlas elements can be re-used. We end up spending a lot of time on these pull requests, so maybe we could:

  • Provide a form (possibly an issue template) that asks for all the relevant information.
  • Develop tooling to go from form to config file.
  • Submit the PR ourselves?
  • Develop tooling to rigorously check a new atlas against BrainGlobe tools (especially all parts of the API).
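The "form to config file" step could look roughly like the Python sketch below. The field names are taken from the example config earlier in this post; which fields are required versus optional is an assumption, and `form_to_config` is a hypothetical helper, not an existing tool.

```python
# Assumed split: every atlas needs a reference image to define its
# coordinate space; the other elements are optional.
REQUIRED = ("name", "atlas_link", "version", "reference_images")
OPTIONAL = ("annotation_images", "structures", "meshes")


def form_to_config(form: dict) -> dict:
    """Turn submitted form fields into a config dict, rejecting incomplete forms."""
    missing = [key for key in REQUIRED if not form.get(key)]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    # Copy only known fields, so free-text extras never reach the config.
    config = {key: form[key] for key in REQUIRED}
    config.update({key: form[key] for key in OPTIONAL if form.get(key)})
    return config
```

Failing early with a list of missing fields would let an issue-template bot ask the submitter for exactly what is absent, before anyone has to review a pull request.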

Introduce "relationships" between atlases

Lots of atlases are related in some way, e.g.:

  • They are the same atlas at different resolutions
  • They have the same annotations but different reference images
  • They are a version of another atlas in a different coordinate space

It would be useful to introduce two concepts:

  • Grouping (easy) - make it obvious in e.g. the CLI and website that different atlases are related
  • Transforms (hard) - store transforms from one atlas to another in a standardised form, and provide methods to transform between them. For the different resolutions, this is possible, but not all the other atlases have a transform between them. We would need transforms for various types of data (e.g. images & objects). This may end up being for V3.
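For the "same atlas at different resolutions" case, the transform is just a per-axis scaling, as in the Python sketch below. It assumes both versions share the same origin and axis order and ignores any half-voxel offset; `rescale_points` is a hypothetical helper, not an existing API function.

```python
def rescale_points(points, from_resolution_um, to_resolution_um):
    """Map voxel coordinates between two resolutions of the same atlas.

    Resolutions are microns per voxel along each axis. A voxel index at
    the source resolution is converted to microns, then to the target grid.
    """
    factors = [src / dst for src, dst in zip(from_resolution_um, to_resolution_um)]
    return [tuple(c * k for c, k in zip(point, factors)) for point in points]


# A point in a 25 um atlas, expressed in the 50 um version of the same atlas.
coarse = rescale_points([(10, 20, 30)], (25, 25, 25), (50, 50, 50))
```

Transforms between different coordinate spaces would need a stored deformation field or similar rather than a scale factor, which is what makes the general case hard.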

Include additional data

There are different types of atlas other than just brain regions (e.g. cell atlases). There is also a lot of publicly available data that is registered to an atlas (e.g. tracing data, gene expression). These data are as much an "atlas" as the brain region ones. I propose adding additional "elements" to cater to this. These elements could either be added to an existing atlas, or a new atlas could be created (without necessarily any annotation image, just the reference image to define the coordinate space). In some cases, this may include duplicating some functionality of morphapi. There are a lot of questions here about exactly what data to support and how to standardise it.

Questions

Do we want to support data stored elsewhere?

My gut feeling is that in general BrainGlobe should ensure the validity of all atlases. However, for some atlases (e.g. bigger ones) maybe we want to allow hosting of files elsewhere and maybe mark them with a community tag or similar? This could also simplify the support for lab/project-specific atlases that we may not want to become a "proper" BG atlas.

Should we support lazy loading for large atlases?

Some atlases are becoming very large (e.g. EM). We don't want to re-package these ourselves, and we definitely don't want to download them locally. We could support N APIs for lazy loading to support these types of atlases. I assume these atlases will only become more common, but we may not want to support them at all.
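As a rough illustration of lazy loading, the Python sketch below fetches fixed-size chunks only when a value inside them is first requested. It is one-dimensional for brevity and supports non-negative indices only; `LazyVolume` and `fake_remote` are hypothetical stand-ins, with `fake_remote` taking the place of whatever remote service would actually serve the data.

```python
from typing import Callable


class LazyVolume:
    """Fetch fixed-size chunks of a remote volume only when first accessed."""

    def __init__(self, fetch_chunk: Callable[[int], list], chunk_size: int):
        self.chunk_size = chunk_size
        self.fetch_count = 0  # number of remote fetches made so far
        self._fetch_chunk = fetch_chunk
        self._cache: dict[int, list] = {}

    def __getitem__(self, index: int):
        chunk_id, offset = divmod(index, self.chunk_size)
        if chunk_id not in self._cache:
            # First access to this chunk: fetch it once and keep it.
            self._cache[chunk_id] = self._fetch_chunk(chunk_id)
            self.fetch_count += 1
        return self._cache[chunk_id][offset]


def fake_remote(chunk_id: int) -> list:
    """Stand-in for a remote service: chunk n holds the values 4n .. 4n+3."""
    return list(range(chunk_id * 4, chunk_id * 4 + 4))


volume = LazyVolume(fake_remote, chunk_size=4)
```

Reading `volume[5]` and then `volume[6]` triggers a single fetch, since both indices fall in the same chunk. Supporting several remote services would then mean writing one `fetch_chunk` adapter per service.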

adamltyson, Feb 23 '23 16:02