mafTools icon indicating copy to clipboard operation
mafTools copied to clipboard

mafPairCoverage output bug

Open ens-bwalts opened this issue 9 years ago • 3 comments

I'm seeing some strange output in mafPairCoverage when looking at coverage in regions specified from a .bed file. It looks like In Regions and Out of Regions are reversed (which is causing coverage over 100%):

# BED Regions
#           Sequence      Region Length      In Regions  Out of Regions        Coverage
              A_tha*           59894120        77296918        20069410    1.290559e+00
             A_tha.1           15946509        22110347         5122972    1.386532e+00

ens-bwalts avatar Jun 11 '15 15:06 ens-bwalts

Thanks for the report, this sounds interesting. The code has unit tests — feel free to add your examples of failures and issue a pull request and I'll evaluate. Or send me links to your failure examples so that I can reproduce them.

On Thursday, June 11, 2015, ens-bwalts [email protected] wrote:

I'm seeing some strange output in mafPairCoverage when looking at coverage in regions specified from a .bed file. It looks like In Regions and Out of Regions are reversed (which is causing coverage over 100%):

BED Regions

Sequence Region Length In Regions Out of Regions Coverage

          A_tha*           59894120        77296918        20069410    1.290559e+00
         A_tha.1           15946509        22110347         5122972    1.386532e+00

— Reply to this email directly or view it on GitHub https://github.com/dentearl/mafTools/issues/10.

dentearl avatar Jun 13 '15 23:06 dentearl

It looks like it's summing all aligned positions within the bed region.

If I provide this .maf:

##maf version=1 scoring=N/A

a
s Anc0.Anc0refChr4307          30  33 +     10240 CTCACCTCCTAAAGAAACCAAGTACAAATTAAA
s Anc2.Anc2refChr1726         125  33 -     10260 CTCACCTCCTAAAGGAGCCAAGTACAAATTAAA
s Anc2.Anc2refChr4731       20097  33 -     64532 ATCACCTTCTAAAGAAACCAAGTACAAATTGAA
s A_lyr.2                17885422  33 -  19320864 CTCACCTCCTAAAGGAGCCAAGTACAAATTAAA
s A_lyr.1                 5031915  33 +  33132539 CTCACCTTCTAAAGAAACCAAGTACAAATTGAA
s A_tha.1                 4214339  33 +  30427671 ATCACCTTCTAAAGAAACCAAGTACAAATTGAA
s A_tha.1                23284080  33 +  30427671 CTCACCTCCTAAAGGAGCCAAGTACAAATTGAA

And look for the regions in this .bed:

A_tha.1 4214340 4214343
A_tha.1 5000000 5001000
mafPairCoverage -m test.maf --seq1 A_tha* --seq2 A_lyr* --bed test.bed -v 1

gives this output

# Overall
#           Sequence      Source Length      Obs Length    Aligned Pos.        Coverage    Obs Coverage
              A_tha*           30427671              66             132    4.338157e-06    2.000000e+00
              A_lyr*           52453403              66             132    2.516519e-06    2.000000e+00

# Individual Sequences
#           Sequence      Source Length      Obs Length    Aligned Pos.        Coverage    Obs Coverage
             A_tha.1           30427671              66             132    4.338157e-06    2.000000e+00
             A_lyr.1           33132539              33              66    1.991999e-06    2.000000e+00
             A_lyr.2           19320864              33              66    3.415996e-06    2.000000e+00

# BED Regions
#           Sequence      Region Length      In Regions  Out of Regions        Coverage
              A_tha*               1003               6             126    5.982054e-03
             A_tha.1               1003               6             126    5.982054e-03

Counting 3 bases for the alignment to A_lyr.1 and 3 bases for the alignment to A_lyr.2 as being in regions.

ens-bwalts avatar Jun 19 '15 12:06 ens-bwalts

Thanks for posting an example! Okay, I'll try to eek out some time to investigate this in the next few weeks (my full time job doesn't allow for much hacking around currently).

dentearl avatar Jun 23 '15 16:06 dentearl