lijingbu
Same issue! The BLAST and DIAMOND XML files differ in Hit_id, Hit_def, and Hit_accession. I tried changing the accession to match the BLAST format, but that didn't work. Trying other things now....
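A minimal sketch for listing those three fields side by side, assuming both tools wrote BLAST-style XML with `<Hit>` elements. The function name `hit_fields` and the sample values are hypothetical placeholders, not real BLAST or DIAMOND output.

```python
import io
import xml.etree.ElementTree as ET


def hit_fields(xml_source):
    """Yield (Hit_id, Hit_def, Hit_accession) for every <Hit> element.

    xml_source may be a file path or a file-like object.
    """
    tree = ET.parse(xml_source)
    for hit in tree.iter("Hit"):
        yield (
            hit.findtext("Hit_id", default=""),
            hit.findtext("Hit_def", default=""),
            hit.findtext("Hit_accession", default=""),
        )


# Placeholder document for illustration only; the values are made up.
SAMPLE = (
    b"<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
    b"<Hit><Hit_id>id1</Hit_id><Hit_def>def1</Hit_def>"
    b"<Hit_accession>acc1</Hit_accession></Hit>"
    b"</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"
)

for fields in hit_fields(io.BytesIO(SAMPLE)):
    print(fields)
```

Running it once on the BLAST file and once on the DIAMOND file for the same query should show exactly where the two formats diverge.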
The DIAMOND database was built from nr.gz, whose sequence headers look like `>XP_642131.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]P54670.1 RecName: Full=Calfumirin-1; Short=CAF-1BAA06266.1 calfumirin-1 [Dictyostelium discoideum AX2]EAL68086.1 hypothetical protein DDB_G0277827...
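In the nr FASTA file, the entries merged into one header are separated by a Ctrl-A character (`\x01`), which is invisible when the header is pasted as text. A rough sketch of splitting such a header into its entries follows; `split_nr_header` is a hypothetical helper, and the example string is the start of the header above with the Ctrl-A restored by assumption.

```python
def split_nr_header(header):
    """Split a merged nr header into (accession, description) pairs.

    Assumes the entries are separated by Ctrl-A (\\x01), as in the nr FASTA.
    """
    entries = []
    for part in header.lstrip(">").split("\x01"):
        part = part.strip()
        if not part:
            continue
        # The accession is everything up to the first space.
        accession, _, description = part.partition(" ")
        entries.append((accession, description))
    return entries


# First two entries of the header quoted above; separator position is assumed.
EXAMPLE = (
    ">XP_642131.1 hypothetical protein DDB_G0277827 "
    "[Dictyostelium discoideum AX4]"
    "\x01P54670.1 RecName: Full=Calfumirin-1; Short=CAF-1"
)

for accession, description in split_nr_header(EXAMPLE):
    print(accession, "|", description)
```

Taking only the first pair gives a single accession per hit, which may be closer to what a BLAST-style parser expects.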
Thanks for the heads-up. I got the error below: `Can not parse BLAST XML because can not find information about the used version!` but I got the annotation successfully after changing...
Awesome! Thanks.
It might help to detect segmental duplications. Lastz does have a self-alignment and dot plot function.
Same situation here. Human reference genome 3GB, memory consumption ~640GB. Guess it costs memory to gain speed. Still useful in some cases.
Splitting the query sequences is a good idea. However, I am afraid that if the BLAST database is large, say the size of the NCBI NT database, the loading step...
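For reference, splitting a query FASTA into fixed-size batches could look roughly like the sketch below. The names `read_fasta` and `batches` are hypothetical, and the batch size is something to tune: each batch is a separate search run, so fewer, larger batches mean fewer database loads.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def batches(records, size):
    """Group records into lists of at most `size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Toy input for illustration only.
TOY_FASTA = [">q1", "ACGT", "ACGT", ">q2", "GGCC", ">q3", "TTAA"]

for i, batch in enumerate(batches(read_fasta(TOY_FASTA), size=2)):
    print(i, [header for header, _ in batch])
```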
On a server where you don't have sudo permission to install libraries, one solution is to install the HDF5 library with conda; version 1.8.20 works if higher versions don't. `conda install --name smartie-sv hdf5=1.8.20`...