Can I use this tool for plants?
Hi,
Thank you for your interesting tool.
I work on plants, so can I use it for plants?
Best, Han
Yes.
Here are my command lines:
for i in *.chr.bam; \
do \
i=${i%.chr.bam*}; \
nohup ROSE_main.py --custom *_refseq.ucsc -i ${i}.cut.bed -r ${i}.chr.bam -o ./${i}/ 2>${i}.log & \
done
Here we only have to ensure that:
- The chromosome names in your BAM file and peak file start with 'chr'. If you have contigs, name them 'chrC1', 'chrC2', or 'chrContig1', 'chrContig2', or anything else you like that begins with 'chr'.
- Peak files can be the narrowPeak files produced by macs2, used directly; just remember to change the suffix to .bed.
- Create your own genome annotation file in the same format as the UCSC table track format. If your annotation is a gff3 file, use a program called gff3ToGenePred to convert it. But remember to add an index column and a header row to the converted file, because a plain conversion does not produce exactly the same format as the examples provided in the program's annotation folder. A typical gff3 file:
##gff-version 3
chrC1 EVM gene 118111 135837 . + . ID=Contig1G000001;
chrC1 EVM mRNA 118111 135837 . + . ID=Contig1G000001.mRNA1;Parent=Contig1G000001
chrC1 EVM exon 118111 118122 . + . ID=Contig1G000001.exon1;Parent=Contig1G000001.mRNA1
chrC1 EVM CDS 118111 118122 . + 0 ID=Contig1G000001.cds1;Parent=Contig1G000001.mRNA1
chrC1 EVM exon 120459 120548 . + . ID=Contig1G000001.exon2;Parent=Contig1G000001.mRNA1
chrC1 EVM CDS 120459 120548 . + 0 ID=Contig1G000001.cds2;Parent=Contig1G000001.mRNA1
A genePred file converted with gff3ToGenePred:
Contig1G000005.mRNA1 chrC1 + 185081 186707 185081 186707 2 185081,186044, 185093,186707, 0 Contig1G000005 cmpl cmpl 0,0,
Contig1G000004.mRNA1 chrC1 + 153060 171316 153060 171316 14 153060,153415,153606,153849,155852,156537,156725,160188,161266,161580,164263,166108,166471,171012, 153075,153490,153816,153899,155865,156612,157021,160276,161473,161622,164343,166186,167116,171316, 0 Contig1G000004 cmpl cmpl 0,0,0,0,1,0,0,1,0,0,0,1,1,1,
Contig1G000003.mRNA1 chrC1 + 148519 149466 148519 149466 4 148519,148766,149001,149196, 148607,148885,149076,149466, 0 Contig1G000003 cmpl cmpl 0,2,0,0,
Contig1G000002.mRNA1 chrC1 + 136234 137231 136234 137231 3 136234,136564,136919, 136246,136639,137231, 0 Contig1G000002 cmpl cmpl 0,0,0,
Contig1G000001.mRNA1 chrC1 + 118110 135837 118110 135837 7 118110,120458,121255,128550,128987,129666,135809, 118122,120548,121489,128646,129036,129703,135837, 0 Contig1G000001 cmpl cmpl 0,0,0,0,0,2,1,
An example from the repository's annotation folder:
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
0 NR_075077 chr1 - 67092175 67134971 67134971 67134971 10 67092175,67096251,67103237,67111576,67113613,67115351,67125751,67127165,67131141,67134929, 67093604,67096321,67103382,67111644,67113756,67115464,67125909,67127257,67131227,67134971, 0 C1orf141 unk unk -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
0 NM_001276352 chr1 - 67092175 67134971 67093579 67127240 9 67092175,67096251,67103237,67111576,67115351,67125751,67127165,67131141,67134929, 67093604,67096321,67103382,67111644,67115464,67125909,67127257,67131227,67134971, 0 C1orf141 cmpl cmpl 2,1,0,1,2,0,0,-1,-1,
0 NM_001276351 chr1 - 67092175 67134971 67093004 67127240 8 67092175,67095234,67096251,67115351,67125751,67127165,67131141,67134929, 67093604,67095421,67096321,67115464,67125909,67127257,67131227,67134971, 0 C1orf141 cmpl cmpl 0,2,1,2,0,0,-1,-1,
0 NM_000299 chr1 + 201283451 201332993 201283702 201328836 15 201283451,201293941,201313165,201316552,201317571,201318617,201319815,201320266,201321977,201323012,201324427,201324940,201325753,201328761,201330073, 201283904,201294045,201313560,201316697,201317779,201318795,201319878,201320381,201322133,201323189,201324581,201325127,201325838,201328868,201332993, 0 PKP1 cmpl cmpl 0,1,0,2,0,1,2,2,0,0,0,1,2,0,-1,
So remember to add the bin column and the header line manually. A custom annotation file as finally provided to ROSE:
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
0 Contig1G000010.mRNA1 chrC1 + 249212 251406 249212 251406 4 249212,249358,249619,251001, 249227,249503,249801,251406, 0 Contig1G000010 cmpl cmpl 0,0,2,0,
0 Contig1G000009.mRNA1 chrC1 + 222452 247955 222452 247955 19 222452,225441,227650,227864,228265,235538,235788,236674,236869,239211,239470,241009,242202,242448,244280,244530,245983,246205,247562, 222461,225576,227722,228133,228343,235755,235841,236764,236951,239344,239728,241391,242308,242582,244497,244583,246078,246315,247955, 0 Contig1G000009 cmpl cmpl 0,0,0,0,1,1,0,1,1,0,2,2,1,0,1,0,1,2,0,
0 Contig1G000008.mRNA1 chrC1 + 220918 222208 220918 222208 4 220918,221411,221849,222117, 220975,221483,222079,222208, 0 Contig1G000008 cmpl cmpl 0,0,0,1,
0 Contig1G000007.mRNA1 chrC1 + 207537 210311 207537 210311 5 207537,208815,209732,209965,210233, 207558,208941,209804,210193,210311, 0 Contig1G000007 cmpl cmpl 0,0,0,0,0,
0 Contig1G000006.mRNA1 chrC1 + 198072 199140 198072 199140 4 198072,198399,198570,199094, 198084,198471,198896,199140, 0 Contig1G000006 cmpl cmpl 0,0,0,1,
0 Contig1G000005.mRNA1 chrC1 + 185081 186707 185081 186707 2 185081,186044, 185093,186707, 0 Contig1G000005 cmpl cmpl 0,0,
0 Contig1G000004.mRNA1 chrC1 + 153060 171316 153060 171316 14 153060,153415,153606,153849,155852,156537,156725,160188,161266,161580,164263,166108,166471,171012, 153075,153490,153816,153899,155865,156612,157021,160276,161473,161622,164343,166186,167116,171316, 0 Contig1G000004 cmpl cmpl 0,0,0,0,1,0,0,1,0,0,0,1,1,1,
0 Contig1G000003.mRNA1 chrC1 + 148519 149466 148519 149466 4 148519,148766,149001,149196, 148607,148885,149076,149466, 0 Contig1G000003 cmpl cmpl 0,2,0,0,
0 Contig1G000002.mRNA1 chrC1 + 136234 137231 136234 137231 3 136234,136564,136919, 136246,136639,137231, 0 Contig1G000002 cmpl cmpl 0,0,0,
0 Contig1G000001.mRNA1 chrC1 + 118110 135837 118110 135837 7 118110,120458,121255,128550,128987,129666,135809, 118122,120548,121489,128646,129036,129703,135837, 0 Contig1G000001 cmpl cmpl 0,0,0,0,0,2,1,
- The annotation file must be named *_refseq.ucsc, so remember to rename your custom annotation file after converting it to the required format.
This tool is easy to use and powerful. I like it too.
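The manual step of adding the bin column and the header line could be scripted roughly as follows. This is only a sketch: it assumes the gff3ToGenePred output is tab-separated, that a constant bin value of 0 is acceptable (as in the example above), and the function name and file names are made up for illustration.

```shell
# Prepend a constant "bin" column (0) and the UCSC-style header line
# to a genePred file. Assumes the input is tab-separated.
# Hypothetical usage: add_bin_and_header output.gp > mygenome_refseq.ucsc
add_bin_and_header() {
    # Header line copied from the example annotation shown above.
    printf '#bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount\texonStarts\texonEnds\tscore\tname2\tcdsStartStat\tcdsEndStat\texonFrames\n'
    # Put 0 in front of every non-empty line, separated by a tab.
    awk 'BEGIN { FS = OFS = "\t" } NF > 0 { print 0, $0 }' "$1"
}
```

For instance, after running gff3ToGenePred, something like `add_bin_and_header output.gp > mygenome_refseq.ucsc` would write a file whose name already matches the required *_refseq.ucsc pattern.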
Dear Liu,
Thank you so much for your detailed reply.
However, I encountered an error with gff3ToGenePred. The error message is: 'CDS feature must have phase.' The command I ran was: gff3ToGenePred input.gff3 output.Gp
Do you have any suggestions on how to resolve this?
Best, Han
Maybe you should check whether your gff file has the phase of each CDS annotated correctly. To do so, look at the 8th column of the lines marked as CDS in the 3rd column and make sure it is one of the three numbers 0, 1, or 2, and not another symbol such as '.'. For detailed information and examples about the gff format and what 'phase' means for a CDS, please see https://www.ncbi.nlm.nih.gov/datasets/docs/v1/reference-docs/file-formats/about-ncbi-gff3/
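If the file is large, a quick way to find the offending lines might be something like the sketch below. It assumes the gff3 file is tab-separated with the feature type in column 3 and the phase in column 8; the function name is made up for illustration.

```shell
# List CDS lines whose phase (column 8) is not 0, 1, or 2.
# Each hit is printed with its line number; no output means all CDS phases look fine.
find_bad_cds_phase() {
    awk -F'\t' '!/^#/ && $3 == "CDS" && $8 !~ /^[012]$/ { print NR ": " $0 }' "$1"
}
```

Running `find_bad_cds_phase input.gff3` would then show which CDS records need their phase fixed before trying gff3ToGenePred again.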