conterminator
Some configurations of taxa in --kingdoms don't work
Hi,
I am trying to clean several de novo insect assemblies of common contamination sources: human, bacteria, and the UniVec database. For this, I concatenated all the FASTA files into one file called 'cat.fna' and made a species-level mapping file called 'ids_cat'.
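For reference, here is a minimal sketch of how I understand such a mapping file to be built. It assumes the mapping is tab-delimited (identifier, then taxid) and that the identifier is the first whitespace-delimited token of each header; `build_mapping` and `format_mapping` are hypothetical helper names, not part of conterminator.

```python
import io


def build_mapping(fasta_handle, taxid):
    """Collect (identifier, taxid) pairs for every record in one FASTA stream.

    Assumption: the identifier is the first whitespace-delimited token
    of the header line, without the leading '>'.
    """
    pairs = []
    for line in fasta_handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip empty headers such as a bare '>'
                pairs.append((tokens[0], taxid))
    return pairs


def format_mapping(pairs):
    """Render the pairs as tab-delimited lines (assumed mapping format)."""
    return "".join(f"{ident}\t{taxid}\n" for ident, taxid in pairs)


# Toy input standing in for one of the concatenated FASTA files.
human = io.StringIO(">chr1 Homo sapiens\nACGT\n>chr2\nGGCC\n")
mapping_text = format_mapping(build_mapping(human, 9606))
```

Running the same helper once per input file, each with its own taxid, and concatenating the results would give the content of 'ids_cat'.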
I then ran:
conterminator dna cat.fna ids_cat conterminator.results tmp_conterminator --threads 20 --blacklist 10239 --kingdoms '2,28384,9606,50557'
I changed the --kingdoms option in order to look for contamination between bacteria (2), other sequences (28384, the taxid I used for the UniVec sequences), Homo sapiens (9606), and insects (50557). I do not wish to look for contamination between my insect genomes. I do not need to ignore any taxa, so I specified only 10239 in the --blacklist option in order to override the default blacklisted taxa (which include 28384, which I need).
Running this command, I get the following error message: rescorediagonal step died.
Interestingly, it works if I specify only 2,28384,9606 or 2,50557 or even 9606,50557 for --kingdoms. Do you have any idea why the combination I used does not work? 28384,50557 does not work either, but there I get a different error message: Extractframes died.
Moreover, I do not understand why the output of the run with 2,50557 reports contamination between bacteria and human. Shouldn't it skip comparing these two taxa entirely in this configuration?
Thanks,
Héloïse
N.B. Just to let you know, it seems that conterminator cannot deal with some patterns of FASTA identifiers. The UniVec sequences have identifiers like gnl|uv|X66730.1:1-2687-49. I had to change that to gnl uv|X66730.1:1-2687-49 and write the following line in the mapping file: gnl 28384. Otherwise I got the error: crosstaxonfilterorf step died.
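In case it helps to reproduce the workaround, the header renaming can be sketched as follows. This is only an illustration of what I did by hand: it replaces the first '|' in each header with a space, so that the first whitespace-delimited token becomes gnl. `sanitize_header` and `mapping_identifier` are hypothetical helper names, and the assumption that the first token is what gets matched against the mapping file is mine.

```python
def sanitize_header(line):
    """Replace the first '|' of a FASTA header with a space.

    Sequence lines (anything not starting with '>') pass through unchanged.
    """
    if line.startswith(">"):
        return line.replace("|", " ", 1)
    return line


def mapping_identifier(header_line):
    """Return the first whitespace-delimited token of a header, without '>'."""
    return header_line[1:].split()[0]


# The UniVec header from above, before and after the workaround.
original = ">gnl|uv|X66730.1:1-2687-49\n"
fixed = sanitize_header(original)
ident = mapping_identifier(fixed)
```

With this renaming every UniVec record ends up with the same identifier, gnl, which is why a single mapping line with taxid 28384 covers all of them.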