MCScanX
format of input bed file
Hello,
I am preparing the inputs for MCScanX and found an inconsistency. You ask for a .bed file, but the format given in the README is different. It says: The xyz.gff file holds gene positions, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position". On the page you link to for the BED format (http://genome.ucsc.edu/FAQ/FAQformat.html#format1), however, the coordinates are in columns 2 and 3, not 3 and 4.
Also, is it OK to have extra columns in the BED file, as in these rows?
lm1 2617 2650 Lmu01_1T0000010.1:three_prime_utr . - maker three_prime_UTR . ID=Lmu01_1T0000010.1:three_prime_utr;Parent=Lmu01_1T0000010.1
lm1 2617 2679 Lmu01_1T0000010.1:exon:6 . - maker exon . ID=Lmu01_1T0000010.1:exon:6;Parent=Lmu01_1T0000010.1
lm1 2617 6339 Lmu01_1G0000010 . - maker gene . ID=Lmu01_1G0000010;Name=Lmu01_1G0000010;Alias=scaffold1-snap-gene-0.16
lm1 2617 6339 Lmu01_1T0000010.1 . - maker mRNA . ID=Lmu01_1T0000010.1;Parent=Lmu01_1G0000010;Name=Lmu01_1T0000010.1;Alias=scaffold1-snap-gene-0.16-mRNA-1;_AED=0.32;_eAED=0.32;_QI=0|0.5|0.4|0.6|0.25|0.4|5|33|124
lm1 2650 2679 Lmu01_1T0000010.1:cds . - maker CDS 2 ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1 2801 2855 Lmu01_1T0000010.1:cds . - maker CDS 2 ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1 2801 2855 Lmu01_1T0000010.1:exon:5 . - maker exon . ID=Lmu01_1T0000010.1:exon:5;Parent=Lmu01_1T0000010.1
lm1 3657 3694 Lmu01_1T0000010.1:cds . - maker CDS 0 ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1 3657 3694 Lmu01_1T0000010.1:exon:4 . - maker exon . ID=Lmu01_1T0000010.1:exon:4;Parent=Lmu01_1T0000010.1
lm1 3799 4049 Lmu01_1T0000010.1:cds . - maker CDS 1 ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1 3799 4049 Lmu01_1T0000010.1:exon:3 . - maker exon . ID=Lmu01_1T0000010.1:exon:3;Parent=Lmu01_1T0000010.1
lm1 6334 6339 Lmu01_1T0000010.1:cds . - maker CDS 0 ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1 6334 6339 Lmu01_1T0000010.1:exon:2 . - maker exon . ID=Lmu01_1T0000010.1:exon:2;Parent=Lmu01_1T0000010.1
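For concreteness, reordering rows like these into the four-column layout quoted from the README could look like the minimal sketch below. It rests on several assumptions that the README does not confirm: that the eighth column holds the feature type (as in the rows above), that one row per gene model is wanted (here the mRNA rows, chosen only for illustration), and that the coordinates can be passed through unchanged. The function name `bed_to_mcscanx` is hypothetical and not part of MCScanX.

```python
def bed_to_mcscanx(lines, feature_type="mRNA"):
    """Reorder BED-like rows into 'chrom, gene, start, end' rows.

    Assumption: fields are whitespace-separated and the 8th field is the
    feature type, as in the sample rows of this post.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 9)  # at most 10 fields; attributes stay whole
        if len(fields) < 8:
            continue  # row is too short to carry a feature type
        chrom, start, end, name = fields[0], fields[1], fields[2], fields[3]
        if fields[7] != feature_type:
            continue  # keep only the chosen feature type
        # Layout quoted from the README: sp&chr_NO, gene, start, end
        rows.append("\t".join([chrom, name, start, end]))
    return rows


# Three of the rows shown above, copied verbatim.
sample = [
    "lm1 2617 2679 Lmu01_1T0000010.1:exon:6 . - maker exon . "
    "ID=Lmu01_1T0000010.1:exon:6;Parent=Lmu01_1T0000010.1",
    "lm1 2617 6339 Lmu01_1G0000010 . - maker gene . "
    "ID=Lmu01_1G0000010;Name=Lmu01_1G0000010;Alias=scaffold1-snap-gene-0.16",
    "lm1 2617 6339 Lmu01_1T0000010.1 . - maker mRNA . "
    "ID=Lmu01_1T0000010.1;Parent=Lmu01_1G0000010;Name=Lmu01_1T0000010.1",
]

for row in bed_to_mcscanx(sample):
    print(row)
```

Whether the gene or the mRNA identifier is the right one to keep presumably depends on which identifiers appear in the BLAST output, so `feature_type` is left as a parameter.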
One last thing: it is not clear which genome is the "reference" and which is the query, both in the BLAST step and in general. If I have a draft "cabbage" assembly and annotation to align to Arabidopsis, do I keep Arabidopsis as the database? (I guess so.) And is the .bed file taken from the cabbage annotation? That is not clear to me, but it is my guess. Is the draft genome scaffold length taken into account anywhere? E.g. if I have genes on only half of a scaffold, how does it get drawn? Thanks,
Dario
I second these questions. It is not clear what the query and the reference should be.