
format of input bed file

Open dcopetti opened this issue 6 years ago • 1 comments

Hello,

I am preparing the inputs for MCScanX and found an inconsistency: you ask for a .bed file, but the format given in the readme is different. The readme says: The xyz.gff file holds gene positions, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position", whereas on the page you link to for the BED format (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) the coordinates are in columns 2 and 3, not 3 and 4.

Also, is it OK to have extra columns in the bed file, like this?

lm1     2617    2650    Lmu01_1T0000010.1:three_prime_utr       .       -       maker   three_prime_UTR .       ID=Lmu01_1T0000010.1:three_prime_utr;Parent=Lmu01_1T0000010.1
lm1     2617    2679    Lmu01_1T0000010.1:exon:6        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:6;Parent=Lmu01_1T0000010.1
lm1     2617    6339    Lmu01_1G0000010 .       -       maker   gene    .       ID=Lmu01_1G0000010;Name=Lmu01_1G0000010;Alias=scaffold1-snap-gene-0.16
lm1     2617    6339    Lmu01_1T0000010.1       .       -       maker   mRNA    .       ID=Lmu01_1T0000010.1;Parent=Lmu01_1G0000010;Name=Lmu01_1T0000010.1;Alias=scaffold1-snap-gene-0.16-mRNA-1;_AED=0.32;_eAED=0.32;_QI=0|0.5|0.4|0.6|0.25|0.4|5|33|124
lm1     2650    2679    Lmu01_1T0000010.1:cds   .       -       maker   CDS     2       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     2801    2855    Lmu01_1T0000010.1:cds   .       -       maker   CDS     2       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     2801    2855    Lmu01_1T0000010.1:exon:5        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:5;Parent=Lmu01_1T0000010.1
lm1     3657    3694    Lmu01_1T0000010.1:cds   .       -       maker   CDS     0       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     3657    3694    Lmu01_1T0000010.1:exon:4        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:4;Parent=Lmu01_1T0000010.1
lm1     3799    4049    Lmu01_1T0000010.1:cds   .       -       maker   CDS     1       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     3799    4049    Lmu01_1T0000010.1:exon:3        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:3;Parent=Lmu01_1T0000010.1
lm1     6334    6339    Lmu01_1T0000010.1:cds   .       -       maker   CDS     0       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     6334    6339    Lmu01_1T0000010.1:exon:2        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:2;Parent=Lmu01_1T0000010.1
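
If the four-column layout quoted from the readme (chromosome, gene, start, end) is what MCScanX actually reads, a file like the one above could be reduced to it with a small script. This is only a minimal sketch under stated assumptions: the input is tab-delimited, the feature type sits in column 8 as in the sample lines, and only `gene` rows should be kept. The function name `bed_to_mcscanx_gff` is made up for illustration and is not part of MCScanX. Coordinates are copied unchanged.

```python
def bed_to_mcscanx_gff(bed_text, feature="gene"):
    """Reduce BED-like annotation lines to 'chr<TAB>gene<TAB>start<TAB>end'.

    Assumptions (not confirmed by the MCScanX readme):
    - fields are tab-delimited;
    - column 4 is the feature name and column 8 the feature type,
      as in the sample lines above;
    - only rows whose type equals `feature` are wanted.
    """
    out = []
    for line in bed_text.splitlines():
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows that are too short or are not the requested feature type.
        if len(fields) < 8 or fields[7] != feature:
            continue
        chrom, start, end, name = fields[:4]
        # Reorder to the layout quoted from the readme.
        out.append("\t".join([chrom, name, start, end]))
    return "\n".join(out)


# Two rows taken from the sample above (attributes shortened to the ID).
sample = (
    "lm1\t2617\t2679\tLmu01_1T0000010.1:exon:6\t.\t-\tmaker\texon\t.\tID=Lmu01_1T0000010.1:exon:6\n"
    "lm1\t2617\t6339\tLmu01_1G0000010\t.\t-\tmaker\tgene\t.\tID=Lmu01_1G0000010\n"
)
print(bed_to_mcscanx_gff(sample))
```

Filtering on `gene` rather than `mRNA` or `exon` is a guess at what is wanted here; the gene names in the output would also have to match the identifiers used in the blast file.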

One last thing: it is not clear which is the "reference" and which is the query, in the blast and in general. If I have a draft "cabbage" assembly and annotation to align to arabidopsis, do I keep arabidopsis as the database? (I guess so.) And the .bed file, is it from the cabbage annotation? That is my guess, but it is not clear to me. Is the draft genome scaffold length taken into account anywhere? E.g. if I have genes on only half of a scaffold, how does it get drawn? Thanks,

Dario

dcopetti, Aug 10 '18 14:08

I second these questions. It is not clear what the query and reference should be.

charlesfeigin, Sep 01 '21 01:09