Clarification on GC Content Increment for G/C Reference Deletions
Hello,
Firstly, I want to express my appreciation for this project! I’m currently learning Rust for bioinformatics, and your codebase is one of the best practical tutorials I’ve encountered. However, I’ve come across something in the code that I don’t fully understand, and I’m hoping for some clarification.
In the following section of the code: https://github.com/google/best/blob/c1c69bb3341834e71c2f109ee2b69b19b9a51190/src/stats.rs#L470-L474
During the processing of deletions, when a G or C is encountered in the reference, the GC content is incremented. Could you explain the reasoning behind this? My understanding is that a deletion would reduce the GC content since there would be fewer Gs or Cs in the read sample.
@xianyu0623 I believe what is happening here is that the GC content is measured against the reference sequence rather than the read. A deletion still spans bases in the reference, so a G or C at a deleted reference position is counted toward gc_content.
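To make that concrete, here is a minimal sketch of the idea. It is not the code in stats.rs: the `Op` enum and the two helper functions are hypothetical names made up for illustration, and it assumes a simplified alignment with only match, insertion, and deletion operations. It contrasts counting G/C over the reference bases covered by the alignment (where deletions count) with counting over the read bases (where they do not).

```rust
/// Simplified alignment operations (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy)]
enum Op {
    Match(usize),     // consumes both reference and read bases
    Insertion(usize), // consumes read bases only
    Deletion(usize),  // consumes reference bases only
}

fn is_gc(base: u8) -> bool {
    matches!(base.to_ascii_uppercase(), b'G' | b'C')
}

fn count_gc(bases: &[u8]) -> usize {
    bases.iter().filter(|b| is_gc(**b)).count()
}

/// Counts G/C among the reference bases covered by the alignment.
/// Deleted bases exist in the reference, so they are included.
fn reference_gc_count(reference: &[u8], ops: &[Op]) -> usize {
    let mut ref_pos = 0;
    let mut gc = 0;
    for op in ops {
        match *op {
            Op::Match(n) | Op::Deletion(n) => {
                gc += count_gc(&reference[ref_pos..ref_pos + n]);
                ref_pos += n;
            }
            // Inserted bases have no reference position, so nothing to count.
            Op::Insertion(_) => {}
        }
    }
    gc
}

/// Counts G/C among the read bases, for contrast.
/// Deleted bases are absent from the read, so they are skipped.
fn read_gc_count(read: &[u8], ops: &[Op]) -> usize {
    let mut read_pos = 0;
    let mut gc = 0;
    for op in ops {
        match *op {
            Op::Match(n) | Op::Insertion(n) => {
                gc += count_gc(&read[read_pos..read_pos + n]);
                read_pos += n;
            }
            Op::Deletion(_) => {}
        }
    }
    gc
}
```

For a reference `ACGGT` and a read `ACT` aligned as `[Op::Match(2), Op::Deletion(2), Op::Match(1)]`, the reference-based count includes the two deleted Gs, while the read-based count does not. If the statistic is meant to describe the reference context of the alignment, the first behavior is the expected one.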
I'm sorry for the delayed response - I was out for some time last year and had not seen this. Please let me know if you have any further questions.