
batch effects

Open wangjiawen2013 opened this issue 6 years ago • 7 comments

Hi, in the Lukassen 2018 data, batch1 and batch2 do not align well using DCA (DCA on Lukassen.ipynb), while scVI seems to align the two mice quite well (scvi on Lukassen.ipynb). Which one should I use?

wangjiawen2013 avatar Mar 10 '19 08:03 wangjiawen2013

Hi Jiawen,

Romain told me that the data probably aligns well in scVI even without batch correction because of the size factor scaling that scVI does.
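
To illustrate the general idea of size factor scaling, here is a simplified sketch. It is not scVI's actual code: scVI handles library size inside the model, whereas this example just divides each cell by its total count. The two batches, the expression profile, and the helper `scale_by_size_factor` are all made up for illustration.

```python
import numpy as np

# Hypothetical expression profile shared by both batches (3 genes).
profile = np.array([10.0, 30.0, 60.0])

# Assumption for illustration: batch 2 is sequenced 3x deeper than batch 1.
batch1 = np.tile(profile, (4, 1))        # 4 cells x 3 genes
batch2 = np.tile(profile * 3.0, (4, 1))  # same composition, more counts

def scale_by_size_factor(counts):
    """Divide each cell by its total count so every row sums to 1."""
    size_factors = counts.sum(axis=1, keepdims=True)
    return counts / size_factors

scaled1 = scale_by_size_factor(batch1)
scaled2 = scale_by_size_factor(batch2)

# Largest per-entry difference between the batches, before and after scaling.
raw_gap = np.abs(batch1 - batch2).max()
scaled_gap = np.abs(scaled1 - scaled2).max()
print("raw gap:", raw_gap, "scaled gap:", scaled_gap)
```

If the main difference between two batches is sequencing depth, this kind of scaling alone can make them overlap, which is one possible reading of the explanation above.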

I haven't used DCA much since the paper came out, but I use scVI almost every day. I don't remember whether DCA has batch correction methods built in, but this is a feature of scVI that I find works very well.

vals avatar Mar 12 '19 20:03 vals

I am a newcomer to scVI. I noticed that your scvi pipeline is different from the one in the scVI basic tutorial (https://github.com/YosefLab/scVI/blob/master/tests/notebooks/basic_tutorial.ipynb). What is the difference? Did you make any custom improvements to obtain better results?

wangjiawen2013 avatar Mar 14 '19 03:03 wangjiawen2013

How do you mean? The only differences I can think of are that I store data in AnnData objects rather than GeneDatasets, and that I use a different library for tSNE visualization.

vals avatar Mar 14 '19 04:03 vals

I mean the pipeline in this link: "https://github.com/vals/Blog/tree/master/180420-scrna-autoencoders". In "https://github.com/vals/Blog/blob/master/181004-integrating-cortex-data/Integrate%20frontal%20cortex%20data.ipynb", the pipeline is the same as the one in the scVI basic tutorial.

wangjiawen2013 avatar Mar 14 '19 05:03 wangjiawen2013

Oh, the post from last April used an old version of scVI that is deprecated.

vals avatar Mar 14 '19 15:03 vals

Do you know when to use gene / gene-batch / gene-label / gene-cell as the "dispersion" parameter in VAE?

:param dispersion: One of the following
    * ``'gene'`` - dispersion parameter of NB is constant per gene across cells
    * ``'gene-batch'`` - dispersion can differ between different batches
    * ``'gene-label'`` - dispersion can differ between different labels
    * ``'gene-cell'`` - dispersion can differ for every gene in every cell
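
One way to read these four options is by how many dispersion values each one implies. The helper below is a hypothetical illustration only: `dispersion_shape` is not part of scVI, and the dataset sizes are made up.

```python
def dispersion_shape(dispersion, n_genes, n_batches, n_labels, n_cells):
    """Return the shape of the NB dispersion array implied by each option."""
    shapes = {
        "gene": (n_genes,),                  # one value per gene
        "gene-batch": (n_genes, n_batches),  # one value per gene per batch
        "gene-label": (n_genes, n_labels),   # one value per gene per label
        "gene-cell": (n_cells, n_genes),     # one value per gene per cell
    }
    if dispersion not in shapes:
        raise ValueError(f"unknown dispersion option: {dispersion!r}")
    return shapes[dispersion]

# Hypothetical dataset sizes, chosen only for illustration.
sizes = dict(n_genes=2000, n_batches=2, n_labels=5, n_cells=10000)
for option in ("gene", "gene-batch", "gene-label", "gene-cell"):
    print(option, dispersion_shape(option, **sizes))
```

The more values an option implies, the more flexible the noise model is, and the more data it needs to estimate them.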

wangjiawen2013 avatar Mar 18 '19 06:03 wangjiawen2013

Hi,

I typically use gene-batch because, when plotting mean vs. variance for genes within each batch, I have noticed that the overdispersion trend tends to differ from batch to batch.
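
A rough way to check this on your own data is to estimate the dispersion separately per batch. The sketch below uses simulated counts and a simple method-of-moments estimate, assuming the negative binomial relation variance = mean + mean² / theta. The batches, the theta values, and the helper names are hypothetical, and this is not how scVI itself fits dispersion.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_nb(mean, theta, n_cells):
    """Draw NB counts with per-gene mean `mean` and dispersion `theta`,
    so that variance = mean + mean**2 / theta."""
    p = theta / (theta + mean)
    return rng.negative_binomial(theta, p, size=(n_cells, mean.size))

def moment_dispersion(counts):
    """Per-gene method-of-moments estimate: theta = mean**2 / (var - mean)."""
    mu = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    excess = np.maximum(var - mu, 1e-8)  # guard against var <= mean
    return mu**2 / excess

# Two simulated batches with the same gene means but different dispersion.
means = np.linspace(5.0, 50.0, 20)
batch_a = simulate_nb(means, theta=2.0, n_cells=5000)
batch_b = simulate_nb(means, theta=10.0, n_cells=5000)

theta_a = np.median(moment_dispersion(batch_a))
theta_b = np.median(moment_dispersion(batch_b))
print(f"median theta, batch A: {theta_a:.2f}, batch B: {theta_b:.2f}")
```

If the per-batch estimates on real data differ clearly, as they do in this simulation, that is an argument for letting dispersion vary by batch.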

I haven't used the supervised mode of scVI much, so can't comment on the effect of gene-label. And the gene-cell option is interesting, but I haven't tried it much.

vals avatar Mar 18 '19 21:03 vals