maf2synteny

clarification of blocks_coords.txt

Open 0xaf1f opened this issue 3 years ago • 3 comments

My first issue is that blocks_coords.txt is not machine readable, so I've written a script to convert it to BED format for some downstream analyses (it would be great if maf2synteny just output the blocks coordinates in this format directly, by the way). I'm trying to make sure I understand this correctly so that I get the coordinates right.

Block 489
Seq_id  Strand  Start   End     Length
41      -       853742  845462  8280
42      +       3564788 3573068 8280
57      -       844024  835744  8280
71      +       3562990 3571270 8280

Am I correct in that Start is always a zero-based number and End is one-based, even for negative strand entries? BED is that way, but for negative strand entries, the Start position is always the smaller number. So for converting this to BED, I would take the coordinates for the positive strand records as is, but for the minus strand ones, I'd need to add 1 to Start, subtract 1 from End and then switch them? Is that right?

0xaf1f, Nov 19 '21 10:11

The format is similar to Sibelia output; here is a more detailed description: https://github.com/bioinf/Sibelia/blob/master/SIBELIA.md#blocks-coordinates. Both coordinates should be 1-based.

mikolmogorov, Nov 19 '21 17:11

Hmm. I think there's something wrong then. If the coordinates are both 1-based, then the length should be abs(Start - End) + 1. In my example above, the length is the exact difference without having to add 1 (844024 - 835744 = 8280). In the Sibelia example output, it's the way I'd expect given their description of the format:

1	-	595992	590919	5074

595992 - 590919 + 1 = 5074
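The arithmetic above can be checked mechanically. This is a minimal sketch, not part of either tool: `implied_convention` is a hypothetical helper that reports which length formula a row satisfies, using only the numbers quoted in this thread.

```python
def implied_convention(start: int, end: int, length: int) -> str:
    """Report which coordinate convention the Length column is consistent with."""
    span = abs(start - end)
    if length == span + 1:
        # Both ends are counted, as with two 1-based inclusive coordinates.
        return "closed"
    if length == span:
        # One end is not counted, as with a 0-based start and a 1-based end.
        return "half-open"
    return "inconsistent"


# Rows quoted in this issue: (source, Start, End, Length).
rows = [
    ("maf2synteny", 853742, 845462, 8280),
    ("maf2synteny", 3564788, 3573068, 8280),
    ("maf2synteny", 844024, 835744, 8280),
    ("maf2synteny", 3562990, 3571270, 8280),
    ("Sibelia", 595992, 590919, 5074),
]

for source, start, end, length in rows:
    print(source, start, end, length, implied_convention(start, end, length))
```

On these rows the maf2synteny lines come out as half-open and the Sibelia line as closed, which is the discrepancy described above.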

0xaf1f, Nov 19 '21 17:11

What actually appears to be the case in maf2synteny's blocks_coords.txt is that the smaller of the two numbers (Start for + strand blocks, End for - strand blocks) is 0-based and the other is 1-based.

But consider this a feature request for having this file created in BED format :pray:
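If that observation holds (it is inferred from the Length column, not confirmed by the maintainer), the smaller coordinate is already a BED start and the larger one a BED end, so no +1/-1 adjustment would be needed. Below is a rough sketch of such a conversion under that assumption. The function names are hypothetical, the parser only handles the `Block` lines and coordinate rows shown in this issue, and the numeric Seq_id is written as-is in the first BED column, since mapping it to a sequence name is left out here.

```python
def row_to_bed(seq_id, strand, start, end, length, block_id):
    """Convert one coordinate row to a BED6 tuple, assuming the smaller
    coordinate is 0-based and the larger one is 1-based (see above)."""
    bed_start, bed_end = min(start, end), max(start, end)
    if bed_end - bed_start != length:
        # The half-open assumption does not hold for this row.
        raise ValueError(f"length mismatch in block {block_id}, seq {seq_id}")
    return (seq_id, bed_start, bed_end, f"block_{block_id}", 0, strand)


def blocks_to_bed(text):
    """Parse blocks_coords-style text and return a list of BED6 tuples."""
    records = []
    block_id = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Block" and len(fields) >= 2:
            # Accept both "Block 489" and "Block #489".
            block_id = fields[1].lstrip("#")
        elif block_id is not None and len(fields) == 5 and fields[1] in ("+", "-"):
            seq_id, strand = fields[0], fields[1]
            start, end, length = (int(x) for x in fields[2:5])
            records.append(row_to_bed(seq_id, strand, start, end, length, block_id))
    return records


sample = """Block 489
Seq_id  Strand  Start   End     Length
41      -       853742  845462  8280
42      +       3564788 3573068 8280
57      -       844024  835744  8280
71      +       3562990 3571270 8280
"""

for record in blocks_to_bed(sample):
    print("\t".join(str(field) for field in record))
```

The length check is there so that a row which does not fit the half-open reading fails loudly rather than producing an off-by-one BED interval.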

0xaf1f, Nov 19 '21 17:11