
Question about bases masked

Open justdx opened this issue 3 years ago • 3 comments

Hi, Yoshinori. First, thanks for developing this nice tool. You explained that the second column of the 'longqc_sdust.txt' table is 'the number of bases masked (MDUST)'. I am wondering how you define a masked base, since I didn't find any lower-case bases in my read.

justdx avatar Sep 24 '21 15:09 justdx

Hi @justdx,

Thank you for your interest in our tool.

Regarding masking, LongQC computes the low-complexity regions of your reads from scratch using the symmetric DUST algorithm. If the given reads contain low-complexity region(s), the program detects them regardless of letter case. LongQC is actually a case-insensitive tool. I hope this answers your question.

Yoshinori

yfukasawa avatar Sep 26 '21 07:09 yfukasawa
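
(Editor's illustration, not from the thread.) To make "regardless of letter case" concrete, here is a minimal sketch of a DUST-style complexity score. It is a simplified stand-in for illustration only, not the symmetric DUST implementation and not LongQC's code; the function name `dust_score` is hypothetical. It assumes the commonly described triplet-based score: count each 3-mer in a window, sum c*(c-1)/2 over the counts, and divide by the number of triplets minus one. The input is upper-cased first, so soft-masked (lower-case) and unmasked input score identically.

```python
from collections import Counter


def dust_score(window: str) -> float:
    """Simplified DUST-style score for one window (hypothetical helper).

    Higher scores mean more repeated triplets, i.e. lower complexity.
    """
    seq = window.upper()  # ignore letter case, as described in the reply above
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if len(triplets) < 2:
        return 0.0  # too short to score
    counts = Counter(triplets)
    repeats = sum(c * (c - 1) / 2 for c in counts.values())
    return repeats / (len(triplets) - 1)


if __name__ == "__main__":
    print(dust_score("AAAAAAAAAA"))  # homopolymer: high score
    print(dust_score("aaaaaaaaaa"))  # same score in lower case
    print(dust_score("ACGTTGCA"))    # all triplets distinct: score 0
```

A real masker compares such scores against a threshold over many candidate windows and reports the resulting intervals; that part is omitted here.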

Hi Yoshinori, thanks for the reply. I think I understand now: for a given read, the masked base count is the number of bases in the low-complexity region(s) identified by the DUST algorithm. Thanks again for the help.

justdx avatar Sep 27 '21 08:09 justdx

Yes, exactly. The tool cannot assume that the given reads were already masked by another tool before LongQC (they may or may not be), hence this design. I hope our tool helps your projects.

yfukasawa avatar Sep 27 '21 09:09 yfukasawa
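
(Editor's illustration, not from the thread.) The conclusion above can be sketched as follows: the masked-base count is the total length of the low-complexity regions found in a read, which is unrelated to how many lower-case letters the read contains. The interval format (half-open `(start, end)` pairs), the example coordinates, and both function names are assumptions made for this sketch; they are not LongQC's internal representation.

```python
def count_masked_bases(intervals):
    """Total bases covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Previous run is finished; add its length and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def count_lowercase(read):
    """Number of soft-masked (lower-case) letters already present in a read."""
    return sum(1 for base in read if base.islower())


if __name__ == "__main__":
    read = "ACGTACGT" + "A" * 20 + "GATTACA"  # upper case throughout
    regions = [(8, 28)]  # hypothetical low-complexity interval (the poly-A run)
    print(count_masked_bases(regions))  # bases inside detected regions
    print(count_lowercase(read))        # lower-case letters in the input
```

In this example the read has no lower-case letters at all, yet 20 of its bases fall inside the hypothetical low-complexity region, which matches the situation described in the original question.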