sc2rf

ENH: Differentiate between clade defining mutations and optional mutations

Open corneliusroemer opened this issue 2 years ago • 3 comments

If I understand your script correctly, you treat all mutations whose frequency is above the user-specified threshold identically.

There's room for improvement there.

It would make sense to distinguish two types of mutation for each clade:

  1. Defining mutations, which should be present in (almost) all sequences of a clade, for example those found in more than 95% of the clade's sequences. If one of these is absent, there is a problem either with sequence quality or with something else, so absence should weigh heavily against the clade.
  2. Common mutations, which occur only sometimes and whose absence does not mean much. Instead, their presence increases the probability that a sequence belongs to the clade.

Do you know what I mean? One threshold does not suffice for both concepts.
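To make the idea concrete, here is a minimal Python sketch of the two-threshold split. It is not sc2rf's actual code or API: the function names, the 10% lower cutoff, the penalty and bonus weights, and the placeholder mutation names are all assumptions made for illustration. Only the 95% cutoff comes from the suggestion above.

```python
from typing import Dict, Set, Tuple


def split_mutations(
    frequencies: Dict[str, float],
    defining_min: float = 0.95,  # cutoff suggested above
    common_min: float = 0.10,    # assumed lower cutoff, purely illustrative
) -> Tuple[Set[str], Set[str]]:
    """Split a clade's mutations into defining and common sets by frequency."""
    defining = {m for m, f in frequencies.items() if f >= defining_min}
    common = {m for m, f in frequencies.items() if common_min <= f < defining_min}
    return defining, common


def clade_score(
    observed: Set[str],
    defining: Set[str],
    common: Set[str],
    missing_penalty: float = 5.0,  # assumed weight: a missing defining mutation hurts a lot
    common_bonus: float = 1.0,     # assumed weight: a present common mutation helps a little
) -> float:
    """Score a sequence against a clade; absent common mutations cost nothing."""
    missing = defining - observed
    supporting = common & observed
    return common_bonus * len(supporting) - missing_penalty * len(missing)


# Made-up frequencies with placeholder mutation names (not real data).
freqs = {"mut_a": 0.99, "mut_b": 0.96, "mut_c": 0.40, "mut_d": 0.02}
defining, common = split_mutations(freqs)
print(clade_score({"mut_a", "mut_c"}, defining, common))  # -4.0
```

In this example the sequence gains 1.0 for carrying the common mutation `mut_c` and loses 5.0 for lacking the defining mutation `mut_b`. The asymmetry between the two weights is the point; the specific values would need tuning.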

I'll think a bit more about recombinant detection myself; maybe further improvements are possible. This is an amazing tool already, though!

corneliusroemer, Mar 23 '22 20:03