About out.xml too big

Open sher-l opened this issue 4 years ago • 4 comments

CMD: /cluster/apps/diamond/diamond blastx -q unigene.fasta -d /database/blastdb/Nr/nr -f 5 -o blast.xml -p 96 --max-target-seqs 1 -e 1e-5 --block-size 50 --long-reads --index-chunks 1

The blast.xml produced by diamond blastx is 18G, while the blast.xml produced by blastx is only about 4G. How can I make the blast.xml smaller?

sher-l avatar Aug 07 '20 05:08 sher-l

I'd recommend against using the XML format, but if you must you can try to compress the file with gzip, other than that I'm not sure how you would get it smaller.

bbuchfink avatar Aug 09 '20 11:08 bbuchfink

Maybe my problem description was not accurate. I mean: how can I reduce the number of results? blastx has -num_alignments; how can I control the <Hit_num> count in the diamond blastx output? I tried --max-target-seqs 1, but <Hit_num> still goes up to 20 - 30. With blastx, -num_alignments controls this: with -num_alignments 10, <Hit_num> goes up to 10. Sorry about my poor English.

sher-l avatar Aug 10 '20 01:08 sher-l

I see, the --long-reads option is overriding your --max-target-seqs setting here. Don't use --long-reads; use --range-culling -F15 instead. With --range-culling you will still get multiple hits for a query if they span different ranges, so you can remove that option too if you don't want that behavior.
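
As a sketch of that suggestion (this exact command was not posted in the thread), the original command line could be adjusted as follows. The paths and all other options are copied unchanged from the command at the top of the issue; the only change is that --long-reads is replaced by --range-culling -F 15. The variable names DIAMOND and ARGS are introduced here for illustration only.

```shell
# Sketch only: original options from the issue, with --long-reads swapped out.
# DIAMOND and ARGS are illustrative variable names, not part of diamond itself.
DIAMOND=/cluster/apps/diamond/diamond

ARGS="blastx -q unigene.fasta -d /database/blastdb/Nr/nr -f 5 -o blast.xml \
-p 96 --max-target-seqs 1 -e 1e-5 --block-size 50 --index-chunks 1 \
--range-culling -F 15"

# Print the full command for review. To actually run it, replace this line
# with:  $DIAMOND $ARGS   (unquoted on purpose so the options are split).
echo "$DIAMOND $ARGS"
```

If multiple hits per query over different ranges are not wanted either, dropping `--range-culling -F 15` from ARGS leaves --max-target-seqs 1 as the only limit.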

bbuchfink avatar Aug 12 '20 11:08 bbuchfink

Oh, I see, thank you so much. I also used '--top' instead of '--max-target-seqs', and it seems to work.

sher-l avatar Aug 12 '20 11:08 sher-l
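
One way to check whether a changed option actually limits the hits per query is to look at the largest <Hit_num> value in the XML output. The helper below is a minimal sketch, not something from the thread: the function name max_hit_num is hypothetical, and it assumes each <Hit_num> element sits on its own line, as in the usual BLAST XML layout.

```shell
# Hypothetical helper: print the largest <Hit_num> value found in a BLAST XML
# file, or 0 if the file contains no hits.
# Assumes one <Hit_num>N</Hit_num> element per line.
max_hit_num() {
  awk -F'[<>]' '
    /<Hit_num>/ { if ($3 + 0 > max) max = $3 + 0 }   # $3 is the number between the tags
    END         { print max + 0 }
  ' "$1"
}

# Usage (on the output file from the issue):
#   max_hit_num blast.xml
```

With --max-target-seqs 1 in effect and no range culling, the reported value would be expected to be 1.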