Benjamin Buchfink

445 comments by Benjamin Buchfink

It should work using `--id 100` because that is applied to the unrounded number. If it does not, can you send me a test case? Also, you can use `-f...

The --header option has been included in the latest release. It prints a description of the columns and also the diamond version and invocation. Feel free to check it before...

Ok, I see your point about the header format. Changing things that break compatibility with older versions is also problematic, so I will probably add something like `--header 2` to...

The option does work for me. Check your version using `diamond version` and upgrade if necessary.

To compute the number of query blocks, multiply the number of DNA letters in the input file by 2, then divide by the block size (2000000000 in your case).
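As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `estimate_query_blocks` is hypothetical and not part of Diamond, and rounding up to a whole block is an assumption on my part; the comment above only states the multiplication and the division.

```python
def estimate_query_blocks(dna_letters: int, block_size: int = 2_000_000_000) -> int:
    """Estimate the number of query blocks for a DNA input file.

    Follows the rule of thumb from the comment above:
    (number of DNA letters * 2) / block size.
    Rounding up to a whole block is an assumption, not something the
    comment states. This helper is hypothetical, not part of Diamond.
    """
    if dna_letters < 0:
        raise ValueError("dna_letters must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    scaled = dna_letters * 2
    # Integer ceiling division, avoids floating-point rounding on large inputs.
    return -(-scaled // block_size)


if __name__ == "__main__":
    # Example: 5.5 billion DNA letters with the block size quoted above.
    print(estimate_query_blocks(5_500_000_000))
```

With 5,500,000,000 letters the scaled size is 11,000,000,000, which divided by 2,000,000,000 gives 5.5, so the sketch reports 6 blocks under the round-up assumption.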

Yes, that seems correct. The easiest way to reduce runtime would be to use a smaller database if that works for you, e.g. UniRef50 or AnnoTree, see here: https://journals.asm.org/doi/full/10.1128/msystems.01408-21 To...

OK, thanks, I'll see what I can do.

Hi Markus, as you have noted correctly, Diamond is optimized to be used with large query files. If you use 1,000,000 proteins as input, you will surely get a big...

Hi Oliver, Diamond was not designed for this use case of >90% identity hits only, so I'm pretty sure that substantial speedups would be possible there. Simply building a faster...

Yes, that's probably a good idea.