vitessce-python
Demo data processing
This PR adds the demos/ directory, which is the next iteration of the vitessce-data repository.
Each subdirectory of demos contains a snakemake workflow to produce the vitessce-compatible files (e.g., AnnData-Zarr stores, OME-Zarr stores, CSV/JSON files).
The parent snakefile at demos/Snakefile can run all of the sub-snakefiles as snakemake subworkflows.
The parent snakefile can upload all processed files to AWS/GCP when --config upload=true is passed as a parameter to the snakemake command.
Each demo also contains a Vitessce config template whose URLs may contain {{ base_url }}, which the fill_template.py script fills in with either local or remote (AWS/GCP) URLs.
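For illustration, the {{ base_url }} substitution could look roughly like the sketch below. This is not the actual fill_template.py; the function signature, the `cells.json` file name, and the use of a plain regex (rather than a templating library) are assumptions made for the example.

```python
import json
import re

# Matches "{{ base_url }}" with any amount of whitespace inside the braces.
BASE_URL_PATTERN = re.compile(r"\{\{\s*base_url\s*\}\}")


def fill_template(template_text, base_url):
    """Return template_text with every {{ base_url }} placeholder replaced.

    Hypothetical helper: the real script may take different arguments.
    """
    base = base_url.rstrip("/")
    # A function replacement avoids interpreting backslashes in the URL.
    return BASE_URL_PATTERN.sub(lambda _match: base, template_text)


# Hypothetical template fragment; "cells.json" is an assumed file name.
template = '{"url": "{{ base_url }}/codeluppi-2018/cells.json"}'

local_config = json.loads(fill_template(template, "http://localhost:8000/"))
```

The same template can be filled a second time with a remote bucket URL to produce the remote variant of the config.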
The processed files can then be tested in Vitessce using URLs like
http://localhost:3000/?url=http://localhost:8000/codeluppi-2018/vitessce.local.json
http://localhost:3000/?url=http://localhost:8000/codeluppi-2018/vitessce.remote.json
A few minor things about the "raw" files in the subworkflows:
- the raw AnnData h5ad file URLs from the cellxgene portal expire in 7 days, but in the comments above each of those URLs I added the link where a new file URL can be obtained
- to save time, the codeluppi, wang, and eng workflows start from the vitessce-data-processed v0.0.31 files (not from the true "raw" files)
TODO:
- [x] move duplicated utility functions like `to_uint8` and `to_diamond` into a common python script that can be imported in each subworkflow
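As a rough picture of what such a shared module might contain, here is a minimal sketch. The bodies below are assumptions for illustration only (a min-max rescale to 0..255 and a four-point diamond polygon around a center); the actual helpers in this PR may have different signatures and behavior.

```python
# Hypothetical shared utilities module, importable from each subworkflow.


def to_uint8(values):
    """Rescale a sequence of numbers to integers in 0..255 (assumed behavior)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero, map everything to 0.
        return [0 for _ in values]
    return [int(255 * (v - lo) / (hi - lo)) for v in values]


def to_diamond(x, y, r):
    """Return a diamond-shaped polygon of radius r centered at (x, y) (assumed behavior)."""
    return [[x, y + r], [x + r, y], [x, y - r], [x - r, y]]
```

Keeping these in one file means a fix to either helper applies to every demo workflow at once.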
@keller-mark the URLs on the vitessce.io site don't work without this merged.
I am going to merge this so the URLs work for the tutorial. I will work on converting some of the comments into issues.
thanks @keller-mark - sorry, a bit busy and also getting out of the Christmas break slumber!