Martin Steinegger
I added a parameter, `--contamination-len`, to control the length.
We currently predict contamination only for short sequences of length < 20 kb. The 20 kb can be in scaffolds or in single sequences. I assume you have just one long sequence?
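To see which side of the 20 kb limit each input sequence falls on, a minimal sketch like the following could be used. The function names and the `LENGTH_LIMIT` constant are illustrative assumptions and not part of conterminator; in particular, it treats every FASTA record on its own and does not model how scaffolds are handled.

```python
import io

# Assumption: "20kb" is read as 20,000 bases; conterminator's exact cutoff may differ.
LENGTH_LIMIT = 20_000

def fasta_lengths(handle):
    """Yield (name, sequence_length) for each record in a FASTA text stream."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, length
            parts = line[1:].split()
            # Use the first whitespace-separated token as the record name.
            header, length = (parts[0] if parts else ""), 0
        elif header is not None:
            length += len(line)
    if header is not None:
        yield header, length

def split_by_limit(handle, limit=LENGTH_LIMIT):
    """Return (short_names, long_names), where short means length < limit."""
    short, long_ = [], []
    for name, n in fasta_lengths(handle):
        (short if n < limit else long_).append(name)
    return short, long_

if __name__ == "__main__":
    # Hypothetical toy input; replace with open("your_genome.fasta").
    demo = io.StringIO(">contig1\nACGTACGT\n>contig2\n" + "A" * 25_000 + "\n")
    short, long_ = split_by_limit(demo)
    print("below limit:", short)
    print("at or above limit:", long_)
```

Records listed as at or above the limit would be the ones to look up in the `_all` report instead.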
The `_all` report should contain all local alignments with cross-kingdom hits (`--kingdom`). This could be used to filter for longer sequences. Can you find the C. elegans and E. coli...
Yes, I agree. I had this on my todo list for quite some time. :( But currently I am quite flooded with work.
Could you please provide your cmake output as well?
The database module should allow you to download the GTDB database. It will build `names.dmp` and `nodes.dmp` based on the GTDB taxonomy.
This should be fixed now. I updated conterminator to the newest version of MMseqs2, which should resolve the issue.
Thank you, @pmenzel. I will upload the NR results to the FTP tomorrow.
Sorry for the delay. I have added the NR files to the FTP: `ftp://ftp.ccb.jhu.edu/pub/data/conterminator`. There are two files: (1) `nr.ids.gz`, which contains only the identifiers, and (2) `nr.gz`, which shows...
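Once `nr.ids.gz` is downloaded, checking your own accessions against it could look roughly like this. This is only a sketch: it assumes the file holds one identifier per line, which is not confirmed above, and `load_ids` and `flagged` are hypothetical helper names.

```python
import gzip

def load_ids(path):
    """Read a gzip-compressed text file into a set of identifiers.

    Assumption: one identifier per line; blank lines are skipped.
    """
    with gzip.open(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}

def flagged(ids, candidates):
    """Return the candidates that appear in the identifier set, in input order."""
    return [c for c in candidates if c in ids]
```

For example, `flagged(load_ids("nr.ids.gz"), my_accessions)` would return the subset of `my_accessions` (a list you supply) that is listed in the file.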
Thank you for catching this! The number reported in the paper is from the Kraken report. I lost some entries while converting the conterminator result to a Kraken output...