
confused about some text in output of stage2 `stage2/*postprocess.txt`

Open taylorreiter opened this issue 2 years ago • 0 comments

In the `stage2/*postprocess.txt` text files, there are sections that postprocess mashmap alignments, parsing them to determine the percent identity of each contig against each contaminant genome. For example:

removing 9kb with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029.
   5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%
   (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Phocaeicola;s__Phocaeicola sp900554435)
   ** disagreement at rank 'phylum'; genome p__Firmicutes_A, source p__Bacteroidota

I'm confused because stage2 isn't required for target clean, but this file says that it removes contigs/kb as dirty based on the mashmap alignment results. My understanding is that it would be more accurate to state "identified 9kb with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029. 5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%" or "verified 9kb contaminant with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029. 5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%".

Am I interpreting this file wrong?

Line in the code that produces this message: https://github.com/dib-lab/charcoal/blob/latest/charcoal/postprocess_alignments.py#L133
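
To make the format in question concrete, here is a minimal sketch of how an entry like the one quoted above could be pulled apart in Python. This is not charcoal's code: the regular expressions are guesses based only on that single excerpt, and `parse_entry` is a hypothetical helper name. It only reads the report and says nothing about whether the contig was actually removed.

```python
import re

# Patterns inferred from the one excerpt quoted in this issue (assumption);
# they are not taken from charcoal's source and may miss other message forms.
HEADER = re.compile(
    r"removing (?P<total_kb>\d+)kb with (?P<dirty_kb>\d+)kb dirty, "
    r"contig name (?P<contig>\S+?)\.?$"
)
ALIGN = re.compile(
    r"(?P<aligned_kb>\d+)kb aligns to (?P<target>\S+) at (?P<pident>[\d.]+)%"
)


def parse_entry(lines):
    """Hypothetical helper: collect contig name, sizes, and percent identity."""
    entry = {}
    for line in lines:
        line = line.strip()
        m = HEADER.match(line)
        if m:
            entry["contig"] = m["contig"]
            entry["total_kb"] = int(m["total_kb"])
            entry["dirty_kb"] = int(m["dirty_kb"])
            continue
        m = ALIGN.match(line)
        if m:
            entry["target"] = m["target"]
            entry["aligned_kb"] = int(m["aligned_kb"])
            entry["pident"] = float(m["pident"])
    return entry


example = [
    "removing 9kb with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029.",
    "   5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%",
]
print(parse_entry(example))
```

Lines that match neither pattern (the lineage line and the rank disagreement line) are simply skipped in this sketch.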

taylorreiter, Feb 21 '22 19:02