Anyone use diamond output xml file for b2gpipe?

Open ErisonChen opened this issue 8 years ago • 13 comments

Hi sir, I'm in a hurry to solve this problem, thanks! I have two xml files, one generated by blast and one by diamond.

The b2gpipe code is java -cp ./b2g4pipe/*:/b2g4pipe/ext/*: es.blast2go.prog.B2GAnnotPipe -in test.xml -out test -prop b2gPipe.properties -v -annot -annex

With the blast xml file, the output is:

Annotation of 182 seqs with 823 annots finished. Now searching for orfan IPRs... B2G-Pipe finished

With the diamond xml file, the output is:

Annotation of 0 seqs with 0 annots finished. Now searching for orfan IPRs... B2G-Pipe finished

I changed the blast software version in diamond xml file header from 'diamond' to 'BLASTX 2.2.28+'

Has anyone run into the same problem?

The b2g google group question link is here

ErisonChen avatar Oct 10 '16 06:10 ErisonChen

Can you send me those 2 xml files?

bbuchfink avatar Oct 10 '16 07:10 bbuchfink

Hi, here are the files: test.fasta, test_blast.xml, test_diamond.xml

1. The blast xml file comes from blastx -query test.fasta -db animal_nr.fa -out test_blast.xml -evalue 1e-5 -max_target_seqs 10 -num_threads 4 -outfmt 5

2. The diamond xml file comes from diamond makedb --in animal_nr.fa -d animal, followed by diamond blastx -d animal -q test.fasta --sensitive -k 10 -e 1e-5 -o test_diamond.xml -f 5

3. The b2gpipe command is java -cp ./b2g4pipe/*:/b2g4pipe/ext/*: es.blast2go.prog.B2GAnnotPipe -in test_diamond.xml -out test_diamond -prop b2gPipe.properties -v -annot -annex

ErisonChen avatar Oct 10 '16 08:10 ErisonChen

I'll look into it but it's probably going to take until later tomorrow.

bbuchfink avatar Oct 10 '16 16:10 bbuchfink

Hi sir, when comparing the two xml files generated by diamond and blast, I found that the <Hit_def> part is different.

The diamond xml file: <Hit_def>gi|298162778|gb|ADI59753.1|</Hit_def>

The blast xml file: <Hit_def>gi|723159188|gb|KHC33245.1| bromodomain-containing factor 1 [Candida albicans P76055]</Hit_def>

I think the [species information] may be required when b2gpipe reads the xml file.

ErisonChen avatar Oct 12 '16 11:10 ErisonChen

Possibly, you can get diamond to include the full title by using the option --salltitles.
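As a rough sketch, the diamond blastx command posted earlier in this thread could be assembled with --salltitles appended like this. The helper name and its parameters are hypothetical; the flags themselves are the ones already quoted above.

```python
def diamond_blastx_command(db, query, out_xml, max_hits=10, evalue="1e-5"):
    """Build the argument list for the diamond blastx run from this thread.

    Hypothetical helper: it only assembles the flags already quoted in the
    thread and appends --salltitles so <Hit_def> carries the full titles.
    """
    return [
        "diamond", "blastx",
        "-d", db,              # database built with diamond makedb
        "-q", query,           # query FASTA file
        "--sensitive",
        "-k", str(max_hits),   # maximum number of target sequences
        "-e", str(evalue),     # e-value cutoff
        "-o", out_xml,         # output file
        "-f", "5",             # BLAST XML output format
        "--salltitles",        # include full subject titles
    ]


# Example (not executed here):
#   import subprocess
#   subprocess.run(diamond_blastx_command("animal", "test.fasta",
#                                         "test_diamond.xml"), check=True)
```

Passing the arguments as a list avoids shell quoting problems if the file names contain spaces.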

bbuchfink avatar Oct 12 '16 16:10 bbuchfink

I ran your files but I always get this error message and it takes very long:

  BLAST result added to sequence: Cluster-120617.169989
  Problem connecting to database b2g_apr12 on 10.10.100.203 as blast2go with password starts with bla*****: com.mysql.jdbc.Driver
  Database or network connection (timeout) error for: 10.10.100.203
  Database or network connection (timeout) error for: 10.10.100.203
  Could not connect to Database at: 10.10.100.203

I deleted everything but Query_2 from both the diamond and the blast file. It will then finish in reasonable time, but I get the message Annotation of 0 seqs with 0 annots finished. for both the blast and the diamond file.

bbuchfink avatar Oct 12 '16 19:10 bbuchfink

It worked with the option --salltitles. Your problem may be caused by the b2gPipe.properties file; I changed the following settings in it:

  Dbacces.dbname=b2g_update
  Dbacces.dbhost=10.1.50.254:3310

You can give it a try.

THANKS SO MUCH for everything you've done!

ErisonChen avatar Oct 13 '16 02:10 ErisonChen

Same issue!

The blast and diamond xml files differ in Hit_id, Hit_def, and Hit_accession. I tried changing the accession to match the blast format, but it didn't work. Trying other things now. Any ideas? Thanks.

Details are below.

Blast:

  <Hit_id>gi|1016647142|ref|XP_016043819.1|</Hit_id>
  <Hit_def>PREDICTED: splicing regulatory glutamine/lysine-rich protein 1 isoform X2 [Erinaceus europaeus]</Hit_def>
  <Hit_accession>XP_016043819</Hit_accession>

Diamond:

  <Hit_id>gnl|BL_ORD_ID|4237401</Hit_id>
  <Hit_def>XP_002934235.1 PREDICTED: splicing regulatory glutamine/lysine-rich protein 1 [Xenopus tropicalis] OCA51879.1 hypothetical protein XENTR_v90003507mg [Xenopus tropicalis]</Hit_def>
  <Hit_accession>4237401</Hit_accession>
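
To compare these three fields across whole files rather than by eye, something like the following could be used. It is only a sketch: the function names are made up for illustration, and it assumes both files follow the usual BLAST XML layout with <Hit> elements containing the tags shown above.

```python
import xml.etree.ElementTree as ET


def hit_fields_from_root(root, limit=5):
    """Return (Hit_id, Hit_def, Hit_accession) tuples for the first hits."""
    rows = []
    for hit in root.iter("Hit"):
        if len(rows) >= limit:
            break
        rows.append((
            hit.findtext("Hit_id", default=""),
            hit.findtext("Hit_def", default=""),
            hit.findtext("Hit_accession", default=""),
        ))
    return rows


def hit_fields(xml_text, limit=5):
    """Same as above, but for XML held in a string."""
    return hit_fields_from_root(ET.fromstring(xml_text), limit)


def hit_fields_from_file(path, limit=5):
    """Same as above, but parsed from a file on disk."""
    return hit_fields_from_root(ET.parse(path).getroot(), limit)
```

Calling hit_fields_from_file on the blast file and on the diamond file and printing the results side by side shows at a glance which of the three fields differ.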

lijingbu avatar Jul 07 '17 15:07 lijingbu

The Diamond database was built from nr.gz, whose sequence headers look like

>XP_642131.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]P54670.1 RecName: Full=Calfumirin-1; Short=CAF-1BAA06266.1 calfumirin-1 [Dictyostelium discoideum AX2]EAL68086.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]

But the sequences extracted from the NCBI nr database using the command

blastdbcmd -db nr -entry all -out nr.fa

look like

>gi|504688258|ref|WP_014875360.1| aminodeoxychorismate synthase, component I [Phaeobacter inhibens] >gi|398654275|gb|AFO88245.1| putative para-aminobenzoate synthase component 1 [Phaeobacter inhibens 2.10]

I am trying to build a Diamond database from the sequences extracted from the nr database. Will update soon.

lijingbu avatar Jul 07 '17 16:07 lijingbu

Try the latest version of diamond; I changed the format in 0.9.7.

bbuchfink avatar Jul 07 '17 18:07 bbuchfink

Thanks for the heads up. I got the error below:

  Can not parse BLAST XML because can not find information about the used version!

However, annotation succeeded after I changed this line in the xml from <BlastOutput_version>diamond 0.9.9</BlastOutput_version> to <BlastOutput_version>BLASTP 2.2.29+</BlastOutput_version>
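
For anyone who wants to script that header change instead of editing the file by hand, here is a minimal sketch. The function names are hypothetical, and it assumes the file contains a single <BlastOutput_version> element on one line, as in the example above.

```python
import re

# Matches the whole <BlastOutput_version> element in the XML header.
_VERSION_RE = re.compile(r"<BlastOutput_version>.*?</BlastOutput_version>")


def patch_blast_version(xml_text, new_version="BLASTP 2.2.29+"):
    """Return xml_text with the first BlastOutput_version value replaced."""
    replacement = (
        "<BlastOutput_version>" + new_version + "</BlastOutput_version>"
    )
    # A function is used as the replacement so that any backslashes in
    # new_version are inserted literally rather than treated as escapes.
    return _VERSION_RE.sub(lambda match: replacement, xml_text, count=1)


def patch_file(in_path, out_path, new_version="BLASTP 2.2.29+"):
    """Read in_path, patch the version line, and write the result to out_path."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(patch_blast_version(text, new_version))
```

Note that patch_file reads the whole file into memory, which may be a problem for very large diamond outputs. For a blastx run, the version string from the first post (BLASTX 2.2.28+) can be passed as new_version instead.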

lijingbu avatar Jul 07 '17 20:07 lijingbu

Ok I'll change that in the next version.

bbuchfink avatar Jul 08 '17 12:07 bbuchfink

Awesome! Thanks.

lijingbu avatar Jul 10 '17 00:07 lijingbu