
User documentation on how to set up jobs or run large analyses

Open • berombau opened this issue 1 year ago • 1 comment

Is your feature request related to a problem? Please describe.

The current documentation only shows small examples. Working with real, large datasets differs in several ways and comes with specific needs:

  • limiting the number of Dask workers to cap memory usage
  • running as batch jobs instead of interactively
  • sometimes requiring a specific HPC setup
  • requesting resources via SLURM or through workflow managers such as Nextflow or Snakemake
  • working with a distributed Dask cluster
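For the first points (limited workers, batch jobs, SLURM), a job script can size a local Dask cluster from the resources SLURM granted, so the workers never ask for more memory than the job has. This is a minimal sketch, assuming `dask[distributed]` is installed and the job was submitted with `--cpus-per-task` and `--mem` (which SLURM exports as `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE`, the latter in megabytes). The function names, the 80% headroom and the random array standing in for an image are hypothetical choices, not part of SpatialData.

```python
from collections.abc import Mapping

import dask.array as da
from dask.distributed import Client, LocalCluster


def cluster_kwargs_from_slurm(
    env: Mapping[str, str], threads_per_worker: int = 2, headroom: float = 0.8
) -> dict:
    """Hypothetical helper: size a LocalCluster from the SLURM job's allocation.

    Falls back to 1 CPU and 4096 MB when the SLURM variables are absent
    (e.g. when testing outside a SLURM job).
    """
    cpus = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    mem_mb = int(env.get("SLURM_MEM_PER_NODE", "4096"))
    threads = min(threads_per_worker, cpus)
    n_workers = max(1, cpus // threads)
    # Keep some memory back for the scheduler and the main process, then split
    # the rest evenly. Dask spills to disk and pauses a worker that approaches
    # its memory_limit instead of letting the whole job hit the SLURM limit.
    per_worker_bytes = int(mem_mb * 1024**2 * headroom) // n_workers
    return {
        "n_workers": n_workers,
        "threads_per_worker": threads,
        "memory_limit": per_worker_bytes,
    }


def run_job(env: Mapping[str, str], processes: bool = True) -> float:
    """Hypothetical job body: one reduction on a cluster sized from `env`."""
    kwargs = cluster_kwargs_from_slurm(env)
    with LocalCluster(processes=processes, **kwargs) as cluster, Client(cluster) as client:
        # Stand-in for a large image: 4096 x 4096 values in 1024 x 1024 chunks.
        image = da.random.random((4096, 4096), chunks=(1024, 1024))
        return float(client.compute(image.mean()).result())
```

In the script passed to `sbatch`, call `run_job(os.environ)` under an `if __name__ == "__main__":` guard: with `processes=True` the workers run in new processes, which by default are started with the spawn method and re-import the main module.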

Describe the solution you'd like

A documentation page should explain these points and link to existing resources. It would also be useful to collect existing documentation on running large jobs with SpatialData.

Some resources:

  • Dask cluster:
    • https://docs.dask.org/en/stable/deploying-python.html?#localcluster
    • https://docs.dask.org/en/latest/scheduling.html
    • https://docs.dask.org/en/latest/deploying-hpc.html
    • https://docs.dask.org/en/latest/deploying.html#advanced-understanding
    • https://jobqueue.dask.org/en/latest/
  • Developing with Python environments on HPC: https://docs.hpc.ugent.be/Linux/setting_up_python_virtual_environments/?h=venv
  • SpatialData workflows on HPC:
    • Hydra: https://harpy.readthedocs.io/en/latest/tutorials/hpc/index.html
    • Nextflow:
      • https://github.com/LucaMarconato/spatialdata-mcmicro
      • https://nf-co.re/configs/vsc_ugent
    • Snakemake: https://gustaveroussy.github.io/sopa/tutorials/snakemake/

berombau, Nov 07 '24 09:11

Some new notebook ideas:

  • [ ] intermediate notebook on limiting workers when segmenting with map apply https://docs.dask.org/en/stable/deploying-python.html?#localcluster
  • [ ] advanced notebook on setting up distributed Dask cluster https://jobqueue.dask.org/en/latest/runners-overview.html
  • [ ] working with the Dask dashboard
    • [ ] using Dask spans for fine-grained performance metrics

berombau, Nov 13 '24 13:11