mnnpy

Reproducibility of batch correction

Open: sophiamaedler opened this issue 6 years ago • 0 comments

I was wondering about the reproducibility of the batch correction, and I haven't found anything about it in the documentation. Ideally, I would like to run a batch correction on a dataset in such a way that running it a second time on the same input gives the same results, so that my analysis can be recreated completely. So far, when I have run the batch correction twice on a small toy dataset and then clustered the result, I get a different number of clusters each time.

Some background on my analysis: I use scanpy to load and process my data, and I have set a random_state for every scanpy step that relies on a stochastic process. If I analyse the dataset several times without batch correction, I get reproducible results (judged by the number of clusters detected, the average gene expression within each cluster, etc.). If, however, I introduce a batch correction, the results are no longer reproducible, which led me to suspect that the mnn_correct step is the cause of the varying results.

Are there any options I missed in the documentation that would let me run the batch correction reproducibly? If not, are there any plans to implement this in the future?
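One way to narrow down which step is non-deterministic is to run each step twice under the same seed and compare the outputs numerically, rather than only comparing cluster counts. The sketch below assumes the step under test draws from Python's or NumPy's global random state; `run_seeded`, `is_reproducible` and the two toy projection functions are hypothetical helpers, not part of mnnpy or scanpy. Code that creates its own unseeded generator, or whose result depends on thread scheduling, is not controlled by global seeding, which the second toy function illustrates.

```python
import random

import numpy as np


def run_seeded(fn, seed, *args, **kwargs):
    """Call fn after seeding Python's and NumPy's global RNGs with `seed`."""
    random.seed(seed)
    np.random.seed(seed)
    return fn(*args, **kwargs)


def is_reproducible(fn, seed, *args, n_runs=2, **kwargs):
    """Return True if repeated seeded calls to fn give numerically equal arrays."""
    results = [np.asarray(run_seeded(fn, seed, *args, **kwargs)) for _ in range(n_runs)]
    return all(np.allclose(results[0], r) for r in results[1:])


def toy_random_projection(X, n_components=2):
    """Hypothetical stochastic step: a random Gaussian projection.

    It draws from NumPy's global RNG, so np.random.seed controls it.
    """
    R = np.random.normal(size=(X.shape[1], n_components))
    return X @ R


def toy_unseeded_projection(X, n_components=2):
    """Hypothetical step that builds a fresh, unseeded Generator.

    Global seeding has no effect here, so repeated runs differ.
    """
    rng = np.random.default_rng()
    return X @ rng.normal(size=(X.shape[1], n_components))


X = np.arange(12, dtype=float).reshape(4, 3)
print(is_reproducible(toy_random_projection, 0, X))    # True
print(is_reproducible(toy_unseeded_projection, 0, X))  # False
```

To test the batch correction itself, `fn` could be a small wrapper around your own mnn_correct call that returns the corrected expression matrix; if that check fails while the downstream scanpy steps pass, the randomness lies in the correction step (or in something it calls) rather than in the clustering.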

sophiamaedler, Dec 19 '18 11:12