Obtaining diversity estimate

Open · snayfach opened this issue 4 years ago · 3 comments

Thanks for the great documentation and software design. Installation and running was super easy.

One question: how do I obtain the Nonpareil sequence diversity estimate? I may have missed it, but I couldn't find this in the documentation. I assume I get it from the 'diversity' slot in the Nonpareil object after running the Nonpareil.curve function. Is that correct?

snayfach · Aug 28 '19 16:08

Hello @snayfach Thanks! I'm glad to hear the docs/interfaces were clear 😃

Yes, you're correct. The diversity estimate is stored in the diversity slot of the Nonpareil.Curve object returned by Nonpareil.curve. If that object is np, you can access the estimate directly with np$diversity, or see it alongside the rest of the estimates using summary(np).

[I'm gonna leave this issue open until I update the documentation to include this; please feel free to add any comments.]

lmrodriguezr · Aug 28 '19 16:08

Thanks! I'd suggest adding this info to the docs for those who are impatient and using the tool just for this value :)

snayfach · Aug 30 '19 16:08

Thanks to the devs for the great package.

I would second the above: a discussion or mention of the diversity metric in the docs would be extremely helpful, as it's not clear whether it is even available in the current version (in my case, it is the aspect I am most interested in). As noted above, the Nd value is available in R via:

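library(Nonpareil)                # load the Nonpareil R package (name is case-sensitive)
samp <- '/path/to/output.npo'     # .npo output file written by nonpareil
Nonpareil.curve(samp)$diversity   # Nd, the sequence diversity estimate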

My understanding of the theory is, at best, partial. I presume it is not possible to estimate Nd without estimating coverage at all depths, but the NP3.0 paper implies that estimating coverage is not important for estimating diversity:

"Since the shapes of the Nonpareil curves from replicates and subsamples 
closely resemble each other regardless of coverage (3), we propose Nd as 
a coverage-independent measurement of the diversity of the sampled community."

If so, could the k-mer-based diversity estimate then be provided as a separate function (R/C++)?

handibles · Jul 01 '21 11:07