adegenet cross validation is slow

Hello,

Forgive the generality of the title. I have a DAPC cross validation running on a dataset of approx 400 samples x 5000 loci. The cross validation is testing values (n.pca.max) between 50 and 300 in increments of 10. This process has been running on 30 cores for 77+ days on a Linux system as of today with no clear end in sight. Is there a way to make this faster? Perhaps there are alternatives I can try? The full call is:

xvalDapc(
    tab(data, NA.method = "mean"),
    grp = pop(data),
    n.da = 3,
    n.pca.max = seq(50,300,10),
    n.rep = 20,
    parallel = "multicore",
    ncpus = 30
)

Feb 03 '22 15:02 pdimens

Unfortunately, this sounds like you’re running into an error rather than a slow process. I wouldn’t expect a single run of xvalDapc on a dataset of your size to take more than ~30 seconds.

The n.pca.max argument only expects a single value. I think this may be the source of your error. It’s designed to run DAPC with values of n.PCs selected from 1 to n.pca.max. So, if you chose n.pca.max=300, it should inherently explore the values in seq(50,300,10).

Could you try running the function with a single value for the n.pca.max argument (e.g., n.pca.max=300), to check if that produces results in a timely fashion?

You shouldn’t normally need to repeat the xvalDapc analysis with different values of n.pca.max. But if you have your own rationale for doing this, you would need to re-run the function in a loop, inputting only one value for n.pca.max in each iteration.
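A minimal sketch of that loop, assuming `data` is the genind object from the original call. The helper name `xval_per_max` is hypothetical (not part of adegenet), and `xval.plot = FALSE` is only there to avoid drawing one plot per iteration:

```r
# Hypothetical helper: run xvalDapc once per candidate upper bound,
# so that n.pca.max only ever receives a single value.
# `gen` is a genind object (the `data` object in the original call).
xval_per_max <- function(gen, max_values, n.da = 3, n.rep = 20, ...) {
  x <- adegenet::tab(gen, NA.method = "mean")  # allele table, NAs replaced by column means
  grp <- adegenet::pop(gen)                    # population factor used as the grouping
  results <- lapply(max_values, function(m) {
    adegenet::xvalDapc(x, grp = grp, n.pca.max = m, n.da = n.da,
                       n.rep = n.rep, xval.plot = FALSE, ...)
  })
  names(results) <- paste0("n.pca.max_", max_values)
  results
}

# Usage (not run here; parallel and ncpus are passed through `...`):
# res <- xval_per_max(data, seq(50, 300, by = 10),
#                     parallel = "multicore", ncpus = 30)
```

Passing a single value such as `xval_per_max(data, 300)` gives the quick check suggested above. As far as I understand the documentation, xvalDapc also has an `n.pca` argument that accepts a vector of PC counts to test, which may be a more direct way to evaluate a specific grid; it is worth checking `?xvalDapc` for your installed version.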

Feb 03 '22 17:02 caitiecollins

Wow, that explains a lot. I can't believe I waited this long to address this. Closing the issue, thank you!

Feb 03 '22 18:02 pdimens

I would recommend a sanity check for length(n.pca.max) == 1 to warn against something like this.
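Something along these lines is what I have in mind. This is only a sketch: `check_n_pca_max` is a hypothetical name, not an existing adegenet function, and it stops with an error rather than warning, which may or may not be the behaviour the maintainers would prefer:

```r
# Hypothetical guard: fail fast when n.pca.max is not a single number,
# instead of letting the cross-validation run on a vector.
check_n_pca_max <- function(n.pca.max) {
  if (!is.numeric(n.pca.max) || length(n.pca.max) != 1L || is.na(n.pca.max)) {
    stop("n.pca.max must be a single number, but an object of length ",
         length(n.pca.max), " was supplied.", call. = FALSE)
  }
  invisible(n.pca.max)
}

check_n_pca_max(300)                    # passes silently
# check_n_pca_max(seq(50, 300, 10))     # would stop with an error
```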

Feb 03 '22 20:02 pdimens
