I know what the S means!
Dear Dr. Brislawn, I was curious about your question on the "s" version of Unite's DB and would like to open a small discussion on the subject.
From my understanding, the "NON-S" version includes only those singletons (unique sequences that do not cluster with any other sequence in Unite) that have been manually curated by experts (and are therefore referred to as RefS). This version prioritizes quality over quantity, possibly reducing its ability to detect rare taxa (fewer singletons included) but ensuring a very high standard of classification.
The "S" version, instead, includes global and 97% singletons, meaning all singleton sequences are included, even those that were not manually curated. This boosts the ability to detect rarer taxa (more singletons included), making the analysis more comprehensive but inevitably noisier due to the possible presence of artifacts or misidentified sequences.
Just wanted to know what your thoughts are on the subject, so maybe I can understand this better too. Thank you for all your precious work on the training for QIIME2. I hope to hear from you soon.
Cheers ✌🏻, dD
Thank you! Yes, that is helpful and matches my understanding that "_s" includes more singletons.
First, do you have a source for this? Like, are you secretly one of the Unite Developers? Did you find a PDF that explains this?
Here's my best source: https://unite.ut.ee/repository.php
these sequences are called representative sequences (RepS) when chosen automatically by the computer and reference sequences (RefS) when those choices are overridden (or confirmed) by users with expert knowledge of the taxon at hand
So the _s is RepS automatic curation and NON S is RefS manual curation? Or does it have to do with singleton SHs?
Unfortunately, I'm not a Unite Developer (I wish I were), and I can't remember the exact source. I did read all the information on the Unite website and ventured into the QIIME2 forum to look for other people who had the same question (like this one, where I also found your comments). From my understanding, the "_s" distinguishes the two versions: one with RefS only (manually curated only) and one with RepS + RefS (manually and automatically curated).
Have you ever tried reaching out to the developers from Unite to ask about this? Maybe I could look up their contact info and ask politely for clarifications on the matter directly from the source?
Yes, I contacted the Unite team a few years ago. They know about this repo and link to it on their website.
I've not received clarification about this yet, so for now I'm being patient. When the right person finds this question, I trust they will answer it.