
How about training a new species?

Open Huangyizhong opened this issue 3 years ago • 11 comments

Hi there! Sorry to disturb you again. I have been trying to train AUGUSTUS for the pig genome for many days, but it does not work very well. Would you please help me? Thanks so much for your kind help! The attached image shows my training results. These are the commands I used:

${augustus}/gff2gbSmallDNA.pl training.gff3 ./sus.fa 1000 genes.raw.gb
${augustus}/new_species.pl --species=sus11.1_bad
${augustus}/etraining --species=sus11.1_bad --stopCodonExcludedFromCDS=false genes.raw.gb 2> train.err
cat train.err | perl -pe 's/.*in sequence (\S+): .*/$1/' > badgenes.lst
${augustus}/filterGenes.pl badgenes.lst genes.raw.gb > genes.gb
${augustus}/randomSplit.pl genes.gb 100
${augustus}/new_species.pl --species=sus11.1-final
nohup ${augustus}/etraining --species=sus11.1-final genes.gb.train &
nohup ${augustus}/augustus --species=sus11.1-final genes.gb.test | tee firsttest_100.out &

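[image: training results] https://user-images.githubusercontent.com/31943359/119155238-8b92c300-ba85-11eb-806a-ebb5c3366f1f.png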
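
For anyone retracing this pipeline: the `perl` one-liner keeps only the sequence names that `etraining` reports in `train.err`, and `filterGenes.pl` then drops those loci from the GenBank file. Below is a minimal Python sketch of that logic, not the AUGUSTUS scripts themselves. It assumes each relevant error line contains `in sequence <name>: ` and that each GenBank record starts with a `LOCUS <name>` line and ends with `//`; the function names are hypothetical.

```python
from __future__ import annotations

import re

# Same pattern as the one-liner: perl -pe 's/.*in sequence (\S+): .*/$1/'
# Difference: `perl -pe` passes non-matching lines through unchanged,
# while this sketch drops them.
BAD_GENE_RE = re.compile(r"in sequence (\S+): ")


def bad_gene_ids(err_text: str) -> list[str]:
    """Collect the sequence names reported by etraining (e.g. in train.err)."""
    ids = []
    for line in err_text.splitlines():
        match = BAD_GENE_RE.search(line)
        if match:
            ids.append(match.group(1))
    return ids


def split_genbank(gb_text: str) -> list[str]:
    """Split a multi-record GenBank file into records terminated by '//'."""
    records, current = [], []
    for line in gb_text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip("\r\n") == "//":
            records.append("".join(current))
            current = []
    return records  # an unterminated trailing fragment is ignored


def locus_name(record: str) -> str | None:
    """Return the name from the record's LOCUS line, if any."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def filter_genbank(gb_text: str, bad_ids: set[str]) -> str:
    """Rough equivalent of: filterGenes.pl badgenes.lst genes.raw.gb > genes.gb"""
    return "".join(r for r in split_genbank(gb_text) if locus_name(r) not in bad_ids)
```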

Huangyizhong, May 21 '21 14:05

Human parameters are expected to work well for pig. I would not retrain for pig.

KatharinaHoff, May 21 '21 14:05

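Thanks so much for your quick reply. Yes, I have used the human parameters to run AUGUSTUS, but the results were not good. First, the BUSCO completeness of the AUGUSTUS predictions is only about 60%, which is lower than that of the protein homology evidence. Second, when I combine the AUGUSTUS results with the other evidence in EVM, I get what is shown in the attached picture. How can I deal with this?
[image: EVM combination result] https://user-images.githubusercontent.com/31943359/119162078-8c7b2300-ba8c-11eb-8f25-4bbd8811dab8.png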

Huangyizhong, May 21 '21 15:05

Have you assessed BUSCO on the annotation of your own assembly, or on the Augustus ab initio predictions for the Sscrofa10.2/susScr3 assembly (produced with human parameters), which are available e.g. via the UCSC Genome Browser?

KatharinaHoff, May 21 '21 15:05

Yes, I have run the BUSCO evaluation on my genome, on the RNA-seq data (transcripts from PASA), and on the homologous proteins. The BUSCO scores are 96%, 96%, and 90%, respectively. But the final result combined by EVM (RNA-seq data, proteins, and AUGUSTUS) was only about 89%. As shown in the picture, the gene structures from AUGUSTUS were not consistent with the protein and RNA-seq evidence. This is the command I used to run AUGUSTUS:

${Augustus}/bin/augustus --AUGUSTUS_CONFIG_PATH=${Augustus}/config --strand=both --genemodel=complete --singlestrand=false --protein=on --introns=on --start=on --stop=on --cds=on --codingseq=on --alternatives-from-evidence=true --gff3=on --UTR=on --outfile=${outputfile}_augustus.gff --species=human ${inputfile}

Thanks!
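
When several evidence sets and the EVM output are compared, it can help to parse BUSCO's one-line summaries rather than copy percentages by hand. A minimal Python sketch, assuming the classic summary notation `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` found in BUSCO short-summary files; the function names are hypothetical, and any numbers you feed it should come from your own runs.

```python
from __future__ import annotations

import re

# Classic BUSCO notation, e.g. "C:95.0%[S:93.0%,D:2.0%],F:1.0%,M:4.0%,n:9226".
# search() tolerates leading whitespace and any extra trailing fields.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict[str, float]:
    """Turn one BUSCO summary line into {'C': .., 'S': .., 'D': .., 'F': .., 'M': .., 'n': ..}."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items()}
    values["n"] = int(values["n"])  # number of BUSCO groups searched
    return values


def rank_by_completeness(summaries: dict[str, str]) -> list[tuple[str, float]]:
    """Sort named gene sets by complete BUSCOs (C = S + D), best first."""
    scored = [(name, parse_busco_summary(s)["C"]) for name, s in summaries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Comparing S and D alongside C can also show whether a combined set loses completeness through fragmented (F) or duplicated (D) models rather than missing ones.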

Huangyizhong, May 21 '21 15:05

I understand your problem. However, I recommend that you run BUSCO on the Augustus predictions for the Sus scrofa reference assembly to get an impression of what ab initio Augustus with human parameters can achieve for your kind of species. This gives a better basis for deciding whether retraining makes sense. Maybe the problem is not the parameter set.

KatharinaHoff, May 21 '21 16:05

Hi, thanks for your advice. I have run BUSCO on the Sus scrofa reference assembly, predicting genes with AUGUSTUS chromosome by chromosome using this command:

${Augustus}/bin/augustus --AUGUSTUS_CONFIG_PATH=${Augustus}/config --gff3=on --outfile=${outputfile}_augustus.gff --stopCodonExcludedFromCDS=false --species=human ${inputfile}

The BUSCO results are shown below:
[image: BUSCO results for the Sus scrofa reference assembly]
For my own genome, the BUSCO completeness was 65%. Are there methods to improve it?
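
Running AUGUSTUS chromosome by chromosome, as described above, mostly amounts to splitting the FASTA file and launching one process per record. A minimal Python sketch under these assumptions: the input is a plain multi-FASTA, `augustus_dir` stands in for `${Augustus}`, and the options are copied from the command above; the helper names and the output file naming are hypothetical. It only builds argument lists; you would still run them yourself (e.g. with `subprocess.run`) or hand them to a job scheduler.

```python
from __future__ import annotations

from pathlib import Path


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Return (name, sequence) pairs; the name is the first word after '>'."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def write_per_record(records: list[tuple[str, str]], out_dir: Path) -> list[Path]:
    """Write each record to <out_dir>/<name>.fa, wrapped at 60 columns."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, seq in records:
        path = out_dir / f"{name}.fa"
        body = "\n".join(seq[i:i + 60] for i in range(0, len(seq), 60))
        path.write_text(f">{name}\n{body}\n")
        paths.append(path)
    return paths


def augustus_command(augustus_dir: str, fasta: Path) -> list[str]:
    """Options mirror the per-chromosome command quoted in the thread."""
    return [
        f"{augustus_dir}/bin/augustus",
        f"--AUGUSTUS_CONFIG_PATH={augustus_dir}/config",
        "--gff3=on",
        f"--outfile={fasta.with_suffix('')}_augustus.gff",
        "--stopCodonExcludedFromCDS=false",
        "--species=human",
        str(fasta),
    ]
```

Parallel execution is left out on purpose, since a sensible number of simultaneous AUGUSTUS jobs depends on the available cores and memory.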

Huangyizhong, May 22 '21 07:05

Hi there, sorry to disturb you! I have run BUSCO on the AUGUSTUS predictions for the Sus scrofa reference assembly, generated chromosome by chromosome with this command:

${Augustus}/bin/augustus --AUGUSTUS_CONFIG_PATH=${Augustus}/config --gff3=on --outfile=${outputfile}_augustus.gff --stopCodonExcludedFromCDS=false --species=human ${inputfile}

The results are attached. Are there methods to improve it? What about training on the Sus scrofa reference genome?
Sincerely,
Yizhong Huang

Huangyizhong, May 26 '21 13:05

You can of course try retraining. I personally don't expect that to work too well - but you never know.

The largest improvements for Sus scrofa can probably be achieved via comparative genome annotation. Sus scrofa is on a list of species that we intend to process later this year.

KatharinaHoff, May 26 '21 13:05

We recommend running BRAKER1 and BRAKER2 separately, not with both inputs at the same time. We have a new combiner called TSEBRA for merging the BRAKER runs. In principle, you can also merge the GeMoMa set with TSEBRA; Lars, please confirm. For PASA, it depends on whether you have transcripts or CDS features. If you are referring to long ORFs in PASA transcripts (i.e. CDS features), the same holds for PASA.

I cannot give support on how to run EVM because I usually do not run EVM.

On Wed, May 26, 2021 at 4:12 PM Yizhong Huang wrote:

Thanks so much. I want to annotate the genome. I have obtained transcripts with PASA and protein alignments with GeMoMa; their BUSCO scores were 96% and 90%, respectively. Then I added the AUGUSTUS results (65%) to EVM. The final BUSCO was 82%, which confuses me. Would you please give me some advice? For the AUGUSTUS predictions, I have switched to BRAKER to predict genes using the RNA-seq data and proteins. I hope for good results.
Yizhong Huang

KatharinaHoff, May 26 '21 14:05

OK, thanks! Yes, I have been running BRAKER for 2 days on 8 CPUs; the AUGUSTUS step takes a long time. I do not know whether it is running as BRAKER1 or BRAKER2. I hope everything goes well. Thanks again!
Sincerely,
Yizhong Huang

Huangyizhong, May 26 '21 14:05

Hi, Katharina is correct: you can combine GeMoMa and BRAKER predictions with TSEBRA as long as all gene predictions are in GTF format. I haven't tested TSEBRA with GeMoMa, so I don't know how accurate the results are. However, TSEBRA doesn't take much time to run (a few minutes), and it is worth a try if you have the BRAKER1 and BRAKER2 results. If you have any questions about TSEBRA, I'll be happy to help.
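
Since every TSEBRA input has to be GTF, a quick format check before combining can save a failed run. A minimal Python sketch, assuming GTF2-style attributes (`gene_id "..."; transcript_id "...";`) on feature lines; AUGUSTUS-style `gene` and `transcript` lines, which may carry only a bare ID in column 9, are deliberately skipped. This is not part of TSEBRA, the function name is hypothetical, and GFF3 input (e.g. `ID=...;Parent=...`) would need converting first.

```python
from __future__ import annotations

import re

REQUIRED_KEYS = ("gene_id", "transcript_id")
# AUGUSTUS GTF writes gene/transcript lines with a bare ID (e.g. "g1", "g1.t1").
SKIPPED_FEATURES = {"gene", "transcript"}


def gtf_problems(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for lines that do not look like GTF."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments are fine
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((lineno, f"expected 9 tab-separated columns, got {len(cols)}"))
            continue
        if cols[2] in SKIPPED_FEATURES:
            continue
        for key in REQUIRED_KEYS:
            if not re.search(rf'\b{key} "[^"]*"', cols[8]):
                problems.append((lineno, f"missing {key}"))
    return problems
```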

I'm not an expert in EVM either, but I've used it recently. Maybe you have to adjust the weights and give GeMoMa and PASA higher weights.
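
On the weighting suggestion: EVM reads a plain-text weights file with one tab-separated line per evidence source, giving the evidence class, the source name (which has to match column 2 of the corresponding GFF3 file), and the weight. A minimal Python sketch that writes such a file; the source names and weight values are placeholders that only illustrate giving PASA and GeMoMa more weight than AUGUSTUS, they are not values recommended in this thread, and treating GeMoMa output as `OTHER_PREDICTION` is an assumption.

```python
from __future__ import annotations

# Evidence classes assumed for EVM weights files.
EVM_CLASSES = {"ABINITIO_PREDICTION", "OTHER_PREDICTION", "PROTEIN", "TRANSCRIPT"}


def evm_weights(entries: list[tuple[str, str, int]]) -> str:
    """Render (class, source, weight) triples as tab-separated weights-file lines."""
    lines = []
    for evidence_class, source, weight in entries:
        if evidence_class not in EVM_CLASSES:
            raise ValueError(f"unknown EVM evidence class: {evidence_class}")
        if weight < 0:
            raise ValueError("weights must be non-negative")
        lines.append(f"{evidence_class}\t{source}\t{weight}")
    return "\n".join(lines) + "\n"


# Hypothetical sources and weights: replace with column 2 of your own GFF3 files.
example = evm_weights([
    ("ABINITIO_PREDICTION", "AUGUSTUS", 1),
    ("OTHER_PREDICTION", "GeMoMa", 5),
    ("TRANSCRIPT", "PASA", 10),
])
```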

LarsGab, May 26 '21 15:05