adegenet
cross validation is slow
Hello,
Forgive the generality of the title. I have a DAPC cross validation running on a dataset of approx. 400 samples x 5000 loci. The cross validation is testing values (n.pca.max) between 50 and 300 in increments of 10. This process has been running on 30 cores for 77+ days on a Linux system as of today with no clear end in sight. Is there a way to make this faster? Perhaps there are alternatives I can try? The full call is:
xvalDapc(
  tab(data, NA.method = "mean"),
  grp = pop(data),
  n.da = 3,
  n.pca.max = seq(50, 300, 10),
  n.rep = 20,
  parallel = "multicore",
  ncpus = 30
)
Unfortunately, this sounds like you’re running into an error rather than a slow process. I wouldn’t expect a single run of xvalDapc on a dataset of your size to take more than ~30 seconds.
The n.pca.max argument expects only a single value. I think this may be the source of your error. It’s designed to run DAPC with values of n.PCs selected from 1 to n.pca.max. So, if you choose n.pca.max=300, it should already cover the range spanned by seq(50,300,10).
Could you try running the function with a single value for the n.pca.max argument (e.g., n.pca.max=300), to check if that produces results in a timely fashion?
You shouldn’t normally need to repeat the xvalDapc analysis with different values of n.pca.max. But if you have your own rationale for doing this, you would need to re-run the function in a loop, inputting only one value for n.pca.max in each iteration.
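A minimal sketch of such a loop, in case it helps. The helper name xval_per_max and its xval_fun argument are hypothetical (they are not part of adegenet); the only point is that each iteration passes exactly one value to n.pca.max:

```r
# Hypothetical helper, not part of adegenet: run the cross validation once
# per candidate value, so n.pca.max is always a single number.
xval_per_max <- function(x, grp, n_pca_max_values,
                         xval_fun = adegenet::xvalDapc, ...) {
  stopifnot(is.numeric(n_pca_max_values), length(n_pca_max_values) >= 1L)
  results <- lapply(n_pca_max_values, function(k) {
    # k is a scalar here; xval.plot = FALSE avoids drawing one plot per run
    xval_fun(x, grp = grp, n.pca.max = k, xval.plot = FALSE, ...)
  })
  # Name each result after the n.pca.max value that produced it
  names(results) <- as.character(n_pca_max_values)
  results
}

# Usage with the objects from the original call (assumes `data` is a genind):
# res <- xval_per_max(tab(data, NA.method = "mean"), pop(data),
#                     seq(50, 300, 10), n.da = 3, n.rep = 20)
```

If the goal is just to test a specific set of PC counts, it may also be worth checking the n.pca argument in ?xvalDapc, which as far as I know accepts a vector (e.g., n.pca = seq(50, 300, 10)) and would avoid the loop entirely.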
Wow, that explains a lot. I can't believe I waited this long to address this. Closing the issue, thank you!
I would recommend a sanity check for length(n.pca.max) == 1 to warn against something like this.
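Something along these lines could work as a rough sketch of that check. The function name check_n_pca_max is made up for illustration and is not how adegenet currently handles it; it uses stop() to fail early, though warning() would be the softer option:

```r
# Hypothetical validation helper (not adegenet code): reject anything that
# is not a single, non-missing number before the cross validation starts.
check_n_pca_max <- function(n.pca.max) {
  if (!is.numeric(n.pca.max) || length(n.pca.max) != 1L || is.na(n.pca.max)) {
    stop("n.pca.max must be a single number, but has length ",
         length(n.pca.max), call. = FALSE)
  }
  invisible(n.pca.max)
}

check_n_pca_max(300)                      # passes silently
# check_n_pca_max(seq(50, 300, 10))       # would stop with an error
```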