DAPC returning 'k' rather than 'k-1' discriminant functions?

Open cizydorczyk opened this issue 6 years ago • 2 comments

Hello,

When I run dapc() on a dataset of ~100k SNPs split into 31 clusters, I am asked how many discriminant functions to retain. If I keep all of them, I end up keeping 31.

Isn't DAPC (by definition) supposed to return k-1 (31-1) discriminant functions? How is it possible that the function returns k (31)?

I run dapc() like so:

dapc.100 <- dapc(snps, hcpc.pca1.clusters.factor, pca.select="percVar", perc.pca=100)

where hcpc.pca1.clusters.factor is a factor containing my isolate groupings (31 clusters).

However, when I run DAPC on one of the sample datasets from the adegenet R package (e.g. dapcIllus$a), I get the expected number of discriminant functions (5, since k=6 in that dataset).

Is it possible that under certain conditions (e.g. with a particular dataset) the dapc() function returns k discriminant functions? I thought that, by definition, there would only be k-1 such functions when using k population clusters.

Any help resolving this issue is much appreciated.

Thank you, Conrad Izydorczyk

cizydorczyk, Dec 15 '17 20:12

Hi Conrad,

You are correct that you should be getting 30 axes instead of 31. However, it's difficult to know why you aren't without a reproducible example (https://stackoverflow.com/a/5963610/2752888), or without knowing which versions of R and adegenet you have.

A few questions that may help get to the root problem:

  • Does this behavior occur if you subset your data to a smaller number of groups?
  • Does this behavior occur if you subset your data to a smaller number of loci?
  • How many samples are in your dataset?
  • Do any of your groups have one sample each?
  • Are there more levels in hcpc.pca1.clusters.factor than are actually represented by the data?
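The last two questions can be checked directly on the grouping factor. Below is a minimal sketch in base R; `grp` is a hypothetical stand-in for hcpc.pca1.clusters.factor, and the toy values are made up for illustration only:

```r
# Hypothetical toy factor standing in for hcpc.pca1.clusters.factor:
# level "d" is declared but has no members, and "b" has a single member.
grp <- factor(c("a", "a", "b", "c", "c"), levels = c("a", "b", "c", "d"))

n_levels   <- nlevels(grp)        # number of declared levels
group_size <- table(grp)          # members per level, empty levels included
n_present  <- sum(group_size > 0) # number of levels that actually occur

# Levels with no members and levels with exactly one member
empty      <- names(group_size)[group_size == 0]
singletons <- names(group_size)[group_size == 1]

print(group_size)
cat("declared levels:", n_levels, " levels with members:", n_present, "\n")
```

If the number of declared levels is larger than the number of levels with members, the factor carries unused levels.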

zkamvar, Dec 15 '17 20:12

To comment on this: there should be k-1 axes, without exception.
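As a short sketch of why this holds (the notation is an assumption introduced here, not taken from adegenet): let $n_i$ be the size of group $i$, $\bar{x}_i$ its mean on the retained principal components, $\bar{x}$ the overall mean, $p$ the number of retained components, and $B$ the between-group scatter matrix, with $k$ counting the groups that have members.

$$
B = \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{\top},
\qquad
\sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x}) = 0
\;\Longrightarrow\;
\operatorname{rank}(B) \le \min(k - 1,\; p)
$$

The $k$ centred group means satisfy one linear constraint, so they span at most $k-1$ dimensions, and the discriminant functions are directions in that span.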

One potential glitch is that the factor contains one more level than expected, e.g. a 'ghost' group with no actual members, left over as a level after subsetting:

> a=factor(c("a", "b", "c"))
> a
[1] a b c
Levels: a b c
> a[1:2]
[1] a b
Levels: a b c
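If that is the cause, dropping the unused levels before calling dapc() should be enough. A minimal sketch using base R's droplevels(), reusing the toy factor from the example above (the names `sub` and `sub_clean` are introduced here for illustration):

```r
a   <- factor(c("a", "b", "c"))
sub <- a[1:2]                 # subsetting keeps all three levels
nlevels(sub)                  # still counts the 'ghost' level "c"

sub_clean <- droplevels(sub)  # drop levels that have no members
nlevels(sub_clean)

# Equivalent: drop unused levels at subsetting time
sub_drop <- a[1:2, drop = TRUE]

# Applied to the call from the original post (not run here; assumes the
# objects snps and hcpc.pca1.clusters.factor exist as described there):
# dapc.100 <- dapc(snps, droplevels(hcpc.pca1.clusters.factor),
#                  pca.select = "percVar", perc.pca = 100)
```

Comparing nlevels() before and after droplevels() shows whether a ghost level was present.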

thibautjombart, Dec 20 '17 17:12