pplacer feature suggestion: weights for reference sequences

Open nhoffman opened this issue 9 years ago • 0 comments

This is a half-baked thought at this point, but it has occurred to me that a lot of information is lost when selecting representative reference sequences to include in a reference package. Consider the case where the observed biological diversity for a species consists of many identical or very closely related reference sequences plus a small number of more divergent sequences. We would likely select only one representative of the most prevalent variant to include in the reference package, and pplacer then has no way to know which of the reference sequences are more "authoritative" when performing classification. I wonder whether the prevalence of each reference sequence among all candidate reference sequences could be represented as a weight, and whether the taxonomic assignment could be informed by these weights. Whether it would matter is of course another question, but I could imagine that it might help mitigate classification artifacts caused by including "outlier" reference sequences in the reference package.
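To make the idea a bit more concrete, here is a rough sketch in Python. It is purely illustrative and does not reflect anything pplacer currently does: the function names, the input format (a mapping from each candidate sequence to the representative chosen for it), and the way weights are combined with placement support are all assumptions made up for this example.

```python
from collections import Counter


def prevalence_weights(representative_of):
    """Hypothetical helper: weight each representative by the fraction of
    candidate sequences it stands in for.

    representative_of maps candidate sequence id -> representative id.
    """
    counts = Counter(representative_of.values())
    total = sum(counts.values())
    return {rep: n / total for rep, n in counts.items()}


def weighted_taxon_scores(support, weights, taxon_of):
    """Hypothetical helper: scale each reference's placement support by its
    prevalence weight, sum per taxon, and normalize to 1.

    support  maps representative id -> placement support (e.g. a likelihood weight)
    weights  maps representative id -> prevalence weight
    taxon_of maps representative id -> taxon label
    """
    scores = {}
    for rep, value in support.items():
        taxon = taxon_of[rep]
        scores[taxon] = scores.get(taxon, 0.0) + value * weights.get(rep, 0.0)
    total = sum(scores.values())
    if total == 0:
        return scores
    return {taxon: score / total for taxon, score in scores.items()}


# Made-up data: nine candidates collapse onto one common variant,
# one candidate is a divergent "outlier" kept as its own reference.
representative_of = {f"cand_{i}": "ref_common" for i in range(9)}
representative_of["cand_9"] = "ref_outlier"

weights = prevalence_weights(representative_of)

# A query supported equally by both references.
support = {"ref_common": 0.5, "ref_outlier": 0.5}
taxon_of = {"ref_common": "species_A", "ref_outlier": "species_B"}
scores = weighted_taxon_scores(support, weights, taxon_of)
```

With these made-up numbers, the common variant ends up with most of the normalized score even though the unweighted support was split evenly, which is the kind of down-weighting of outliers I have in mind. Whether a simple multiplicative weight is the right way to fold this into classification is an open question.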

Aug 24 '15 20:08 nhoffman
