diamond
Has anyone used diamond XML output with b2gpipe?
Hi, I'm in a hurry to solve this problem, thanks! I have two XML files, one generated by blast and one by diamond.
The b2gpipe code is
java -cp ./b2g4pipe/*:/b2g4pipe/ext/*: es.blast2go.prog.B2GAnnotPipe -in test.xml -out test -prop b2gPipe.properties -v -annot -annex
With the blast XML file, the output is:
Annotation of 182 seqs with 823 annots finished. Now searching for orfan IPRs... B2G-Pipe finished
With the diamond XML file, the output is:
Annotation of 0 seqs with 0 annots finished. Now searching for orfan IPRs... B2G-Pipe finished
I also changed the software version in the diamond XML file header from 'diamond' to 'BLASTX 2.2.28+'.
Has anyone run into the same problem?
The b2g google group question link is here
Can you send me those 2 xml files?
Hi, here are the files: test.fasta, test_blast.xml, test_diamond.xml
1. The blast XML file comes from
blastx -query test.fasta -db animal_nr.fa -out test_blast.xml -evalue 1e-5 -max_target_seqs 10 -num_threads 4 -outfmt 5
2. The diamond XML file comes from
diamond makedb --in animal_nr.fa -d animal
diamond blastx -d animal -q test.fasta --sensitive -k 10 -e 1e-5 -o test_diamond.xml -f 5
3. The b2gpipe command is
java -cp ./b2g4pipe/*:/b2g4pipe/ext/*: es.blast2go.prog.B2GAnnotPipe -in test_diamond.xml -out test_diamond -prop b2gPipe.properties -v -annot -annex
I'll look into it but it's probably going to take until later tomorrow.
Hi,
When comparing the two XML files generated by diamond and blast, I find that the <Hit_def> part is different.
The diamond XML file: <Hit_def>gi|298162778|gb|ADI59753.1|</Hit_def>
The blast XML file: <Hit_def>gi|723159188|gb|KHC33245.1| bromodomain-containing factor 1 [Candida albicans P76055]</Hit_def>
I think the [species information] may be required when we use b2gpipe with the XML file.
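A quick way to check this difference across a whole file is to count the <Hit_def> lines that carry no bracketed species name. This is only a rough sketch: the function name is made up for illustration, and it assumes each <Hit_def> element sits on a single line, as in the excerpts above.

```shell
# Hypothetical helper: reads BLAST-style XML on stdin and prints how many
# <Hit_def> lines contain no "[...]" species annotation.
count_hit_defs_without_species() {
    # grep -c exits non-zero when the count is 0, so do not treat that as an error.
    grep '<Hit_def>' | grep -vc '\[.*\]' || true
}

# Usage, only if the files from this thread are present:
for f in test_blast.xml test_diamond.xml; do
    if [ -f "$f" ]; then
        printf '%s: ' "$f"
        count_hit_defs_without_species < "$f"
    fi
done
```

A large count for the diamond file and a count near zero for the blast file would be consistent with the difference described above.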
Possibly. You can get diamond to include the full title by using the option --salltitles.
I ran your files but I always get this error message and it takes very long:
BLAST result added to sequence: Cluster-120617.169989
Problem connecting to database b2g_apr12 on 10.10.100.203 as blast2go with password starts with bla*****: com.mysql.jdbc.Driver
Database or network connection (timeout) error for: 10.10.100.203
Database or network connection (timeout) error for: 10.10.100.203
Could not connect to Database at: 10.10.100.203
I deleted everything but Query_2 from both the diamond and the blast file. It then finishes in reasonable time, but I get the message "Annotation of 0 seqs with 0 annots finished." for both the blast and the diamond file.
It worked with the option --salltitles.
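For reference, this is the diamond command from earlier in the thread with --salltitles appended. The database and file names are the ones used above; the guard around the call is only there so the snippet does nothing harmful when diamond or the input files are missing.

```shell
# Same search as before, plus --salltitles so that <Hit_def> carries the
# full subject titles rather than just the sequence ID.
DIAMOND_CMD="diamond blastx -d animal -q test.fasta --sensitive -k 10 -e 1e-5 -o test_diamond.xml -f 5 --salltitles"

if command -v diamond >/dev/null 2>&1 && [ -f animal.dmnd ] && [ -f test.fasta ]; then
    # Intentionally unquoted so the string splits into separate arguments.
    $DIAMOND_CMD
else
    # diamond or the inputs are not available here; just show the command.
    echo "$DIAMOND_CMD"
fi
```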
Your problem may be caused by the b2gPipe.properties file.
I changed these settings in mine:
Dbacces.dbname=b2g_update
Dbacces.dbhost=10.1.50.254:3310
You can give that a try.
THANKS SO MUCH for everything you've done!
Same issue!
The blast and diamond XML files differ in Hit_id, Hit_def, and Hit_accession. I tried changing the accession to look like blast's, but that didn't work. Trying other things now. Any ideas? Thanks.
Details below.
Blast:
<Hit_id>gi|1016647142|ref|XP_016043819.1|</Hit_id>
<Hit_def>PREDICTED: splicing regulatory glutamine/lysine-rich protein 1 isoform X2 [Erinaceus europaeus]</Hit_def>
<Hit_accession>XP_016043819</Hit_accession>
Diamond:
<Hit_id>gnl|BL_ORD_ID|4237401</Hit_id>
<Hit_def>XP_002934235.1 PREDICTED: splicing regulatory glutamine/lysine-rich protein 1 [Xenopus tropicalis] OCA51879.1 hypothetical protein XENTR_v90003507mg [Xenopus tropicalis]</Hit_def>
<Hit_accession>4237401</Hit_accession>
The diamond database was built from nr.gz, whose sequence headers look like
>XP_642131.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]P54670.1 RecName: Full=Calfumirin-1; Short=CAF-1BAA06266.1 calfumirin-1 [Dictyostelium discoideum AX2]EAL68086.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]
But the sequences extracted from the NCBI nr database using the command
blastdbcmd -db nr -entry all -out nr.fa
look like
>gi|504688258|ref|WP_014875360.1| aminodeoxychorismate synthase, component I [Phaeobacter inhibens] >gi|398654275|gb|AFO88245.1| putative para-aminobenzoate synthase component 1 [Phaeobacter inhibens 2.10]
I am trying to build a diamond database from the sequences extracted from the nr database. Will update soon.
Try the latest version of diamond; I changed the format in 0.9.7.
Thanks for the heads up. I got the error below:
Can not parse BLAST XML because can not find information about the used version!
but annotation succeeded after changing the following line in the XML
from
<BlastOutput_version>diamond 0.9.9</BlastOutput_version>
to
<BlastOutput_version>BLASTP 2.2.29+</BlastOutput_version>
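In case it helps others, the same edit can be scripted with sed instead of being done by hand. This is a minimal sketch: the function name and the output file name are made up, and the default version string is simply the one that worked for me above.

```shell
# Hypothetical helper: reads XML on stdin and rewrites the content of
# <BlastOutput_version> to a BLAST-style version string.
# The string must not contain '|' or '&', which are special in this sed command.
fix_blast_version() {
    ver=${1:-BLASTP 2.2.29+}
    sed "s|<BlastOutput_version>[^<]*</BlastOutput_version>|<BlastOutput_version>${ver}</BlastOutput_version>|"
}

# Usage, only if the diamond output from this thread is present:
if [ -f test_diamond.xml ]; then
    fix_blast_version < test_diamond.xml > test_diamond.fixed.xml
fi
```

The original file is left untouched, so the result can be compared with diff before it is passed to b2gpipe.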
Ok I'll change that in the next version.
Awesome! Thanks.