Benjamin Buchfink
https://github.com/bbuchfink/diamond/commit/8ec818ca160fbc26b262ae999c5c11d9c98a7e38 You can use `--oid-output` to write OIDs instead of accessions into the output file. An OID is a sequence's index in the input file, numbered consecutively starting from 0. You can...
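Since OIDs are just positions in the input file, they can be translated back to accessions afterwards. The following is a minimal sketch, not part of DIAMOND: the helper name `oid_to_accession` is hypothetical, and it assumes the accession is the first whitespace-delimited token of each FASTA header.

```python
from typing import Iterable, List


def oid_to_accession(fasta_lines: Iterable[str]) -> List[str]:
    """Return a list whose index is the OID (0-based record order in the
    input file) and whose value is the accession of that record.

    Assumption: the accession is the first whitespace-delimited token
    of the header line.
    """
    accessions: List[str] = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # An empty header yields an empty accession rather than an error.
            accessions.append(header.split()[0] if header else "")
    return accessions


# Tiny in-memory example; a real input would be an open file handle.
example = [">seqA some description\n", "MKV\n", ">seqB\n", "MAL\n"]
table = oid_to_accession(example)
```

With a real file, the same function can be called as `with open("input.faa") as fh: table = oid_to_accession(fh)`, where `input.faa` is a placeholder name; `table[oid]` then gives the accession for an OID read from the output.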
https://github.com/bbuchfink/diamond/wiki/How-to-cluster-huge-datasets @beazerj The latest release has a new feature to run linclust in parallel on multiple nodes. May be interesting for you. Sensitivity should also have substantially improved, and you...
Diamond does not work well by default on very short sequences and needs manual parameter tuning. I shared some tips here: https://github.com/bbuchfink/diamond/issues/832 https://github.com/bbuchfink/diamond/discussions/469 and in some other issues.
`-c1` is good, you can try a higher block size like `-b6`, if you can assign more memory to a task. 32 threads per task seems reasonable but could be...
`-b8` should be slightly faster than `-b6`, but the gains are probably pretty marginal. The argument to `-g` is the number of targets that will be extended for each query,...
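To make the options discussed above concrete, here is a rough sketch that assembles a command line with `-b` (block size), `-c` (index chunks), `-p` (threads) and optionally `-g`. The helper name `build_diamond_cmd`, the `blastp` workflow, and all file names are assumptions for illustration; the default values simply echo the numbers mentioned in the comments, and no `-g` value is suggested here.

```python
from typing import List, Optional


def build_diamond_cmd(
    query: str,
    db: str,
    out: str,
    block_size: float = 6,      # -b, as suggested above if memory allows
    index_chunks: int = 1,      # -c1
    threads: int = 32,          # threads per task
    global_ranking: Optional[int] = None,  # -g, targets extended per query
) -> List[str]:
    """Build (but do not run) a diamond blastp argument list."""
    cmd = [
        "diamond", "blastp",
        "-q", query,
        "-d", db,
        "-o", out,
        "-b", f"{block_size:g}",
        "-c", str(index_chunks),
        "-p", str(threads),
    ]
    if global_ranking is not None:
        cmd += ["-g", str(global_ranking)]
    return cmd


# Placeholder file names; nothing is executed here.
cmd = build_diamond_cmd("queries.faa", "reference.dmnd", "hits.tsv")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.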
Another hint: the best way to speed this up would be to first cluster the database. Diamond now has the feature to do it.
> Is the --approx-id parameters a approximation similar to CDHIT -c parameter ? (identity threshold)

Yes.

> Has someone already benchmarked the clustering of the NR database ?

This is the...
Make sure to have `libsqlite3-dev` installed on the system prior to compiling BLAST.
To include blast db support in the conda version, I would have to depend on the bioconda blast package. This is not available for the Linux and macOS ARM64 architectures...
BLAST database support is now available for the conda version since v2.1.12.