Does lack of antisense lncRNAs in the reference affect quantification?

Open pasviber opened this issue 1 year ago • 0 comments

Hi @rob-p ,

First of all, I think Salmon is a great and useful tool. This may be a silly question, but I would like to know to what extent the presence or absence of lncRNAs in the reference, in particular natural antisense lncRNAs (NAT-lncRNAs), affects quantification with Salmon.

For example, imagine I have a transcript file with protein-coding genes (PCGs) and lncRNAs, and my libraries are ISR. I expect most fragments to have read 1 on the antisense strand and read 2 on the sense strand, so in theory it should be easy to distinguish a PCG from a NAT-lncRNA that overlaps a large part of it. But if the NAT-lncRNA is missing from the reference, could reads that belong to the NAT-lncRNA map to the PCG as ISF fragments or SF orphan reads? After all, the PCG region that overlaps the NAT-lncRNA is exactly the reverse complement of the corresponding NAT-lncRNA region. I ask because I know that Salmon uses discordant and orphan fragments by default, as you mention in this issue: https://github.com/COMBINE-lab/salmon/issues/67#issuecomment-238090033. Would the best option then be to include as many known transcripts as possible in the Salmon reference, or to add them as decoy sequences?
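To make the scenario concrete, here is a toy Python sketch of the strand argument. It is not Salmon's mapping or library-type logic: the sequences are random, matching is exact substring search, and the helper names (`revcomp`, `isr_pair`, `strand_on`, `observed_type`) are made up for illustration. It only checks the strand of each read, not whether the mates actually point inward.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def isr_pair(transcript, start, frag_len=60, read_len=25):
    """Simulate one ISR fragment from a transcript (toy model).

    Read 1 is the reverse complement of the fragment's 3' end,
    read 2 is the sense sequence of the fragment's 5' end.
    """
    fragment = transcript[start:start + frag_len]
    read1 = revcomp(fragment[-read_len:])
    read2 = fragment[:read_len]
    return read1, read2


def strand_on(reference, read):
    """Return 'F' if the read matches the reference as given,
    'R' if only its reverse complement matches, else None."""
    if read in reference:
        return "F"
    if revcomp(read) in reference:
        return "R"
    return None


def observed_type(reference, read1, read2):
    """Label a pair by the strands of its reads on the reference.
    Assumption: the pair is inward-facing; only strands are checked."""
    strands = (strand_on(reference, read1), strand_on(reference, read2))
    if strands == ("R", "F"):
        return "ISR"
    if strands == ("F", "R"):
        return "ISF"
    return "other"


random.seed(0)
# Hypothetical PCG transcript and a NAT-lncRNA antisense to its middle part.
pcg = "".join(random.choice("ACGT") for _ in range(200))
nat = revcomp(pcg[50:150])

pcg_reads = isr_pair(pcg, start=20)  # fragment that truly comes from the PCG
nat_reads = isr_pair(nat, start=10)  # fragment that truly comes from the NAT

print(observed_type(pcg, *pcg_reads))  # pair from the PCG, on the PCG
print(observed_type(pcg, *nat_reads))  # pair from the NAT, on the PCG
print(strand_on(pcg, nat_reads[0]))    # NAT read 1 alone (orphan case)
```

Under these toy assumptions, the pair that comes from the PCG is labeled ISR, while the pair that comes from the NAT-lncRNA is labeled ISF when the only available reference is the PCG, and its read 1 on its own lands on the forward strand (the SF orphan case). Whether Salmon actually assigns such fragments to the PCG depends on how it treats mappings that are incompatible with the declared library type, which is what I am asking about.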

Because of their low expression, lncRNAs are clearly more affected by being quantified alone than PCGs are by being quantified alone. Figure 1A of the article https://doi.org/10.1093/gigascience/giz145 shows this overestimation of the lncRNAs, which I suppose receive fragments that belong to the PCGs, either because of sequence similarity or because of the antisense overlap I described above.

Thanks in advance

Pascual

pasviber, Dec 18 '24 12:12