scvi-tutorials add preprocessing tutorial with multiple examples

Formatting

[x] My tutorial has only one top-level (#) header

Reproducibility

[x] My tutorial works on Google Colab
[x] My tutorial sets scvi.settings.seed = 0 at the beginning of the notebook
[x] My tutorial has been run and includes outputs (e.g. plots, tables)

Other

[x] Counts and normalized data should co-exist in the datasets, see the API overview for an example
[x] For scRNA-seq data, normalization should be counts per median library size and then log1p transformed -- if not, a reason should be given
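
As a rough illustration of the last two checklist items, here is a minimal, dependency-free sketch of counts-per-median-library-size normalization followed by log1p, with the raw counts kept alongside the normalized values. The function name, the toy matrix, and the dictionary keys are hypothetical and are not taken from the tutorial. In a real notebook this would more likely be done with scanpy's `sc.pp.normalize_total` followed by `sc.pp.log1p`, with the raw counts stored in a layer.

```python
import math
import statistics


def normalize_median_log1p(counts):
    """Scale each cell to the median library size, then apply log1p.

    `counts` is a list of cells, each a list of per-gene raw counts.
    """
    library_sizes = [sum(cell) for cell in counts]
    target = statistics.median(library_sizes)
    normalized = []
    for cell, size in zip(counts, library_sizes):
        # Cells with zero total counts are left at zero instead of dividing by zero.
        scale = target / size if size > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Hypothetical toy matrix: 3 cells x 3 genes.
raw_counts = [[1, 2, 1], [4, 0, 4], [10, 5, 5]]

# Keep raw counts and normalized values side by side, so both remain available.
dataset = {
    "counts": raw_counts,
    "log_normalized": normalize_median_log1p(raw_counts),
}
```

Undoing the log transform with `math.expm1` should give every cell the same total, equal to the median library size of the raw data.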

Feb 11 '25 02:02 lordy5

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

Feb 11 '25 02:02 review-notebook-app[bot]

I still need to link to the preprocessing tutorial from the other relevant tutorials and remove the preprocessing sections from those, but first want to see if anything needs to be changed/added to the preprocessing tutorial.

Mar 01 '25 05:03 lordy5

Please run it with the most recent scvi-tools version (which is now v1.3). Also add more TODO comments in the code so I can find the questions more easily.

Regarding the concatenation of 2 datasets: I think you mean the old anndata preprocessing part, where the function pbmcs_10x_cite_seq downloads 2 adata objects, preprocesses them, and concatenates them? I think the preprocessing tutorial should replace that part now, no? Then we would have just one place to download the already prepared mudata from. In that case we would download only 1 file that is already preprocessed. It can have the same batch column as before, so we expect the same results, but it can also have other columns that could be used as the batch key.

Mar 02 '25 09:03 ori-kron-wis

@ori-kron-wis Which tutorials should I remove the preprocessing sections from, now that there is the preprocessing notebook? I was thinking of removing it from the tutorials whose exact datasets I use in the preprocessing notebook, and keeping it for the others, but then still linking the preprocessing tutorial in all of them, so users know how to use their own datasets.

Mar 12 '25 16:03 lordy5