Custom output format raises error in subprocess.run

Open johanneswerner opened this issue 5 years ago • 3 comments

Hi,

I am trying to run diamond as part of a luigi workflow and need a custom output format (#395). I am therefore calling diamond through subprocess.run(), but the call fails because of the --outfmt argument, and I do not understand why (running the same command from bash works).

subprocess.run(['diamond',
                'blastp',
                '-d', 'refdb.dmnd',
                '-o', 'results.tsv',
                '-f', '6 qseqid sseqid pident length evalue bitscore full_sseq',
                '-q', 'query_file.fasta'],
               check=True)

This raises the following error:

Opening the database...  [0.151s]
Error: Invalid output format. Allowed values: 0,5,6,100,101,102
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/query_genes/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,

I am not sure where this comes from; maybe it has something to do with the quoting that subprocess.run() performs.

Has anyone had a similar problem or know of a solution? Thank you very much.

johanneswerner, Oct 12 '20 12:10

I would guess subprocess.run will insert additional quotation marks around 6 qseqid sseqid pident length evalue bitscore full_sseq. Not sure how to best work around that. If needed I can modify Diamond accordingly to allow this.

bbuchfink, Oct 12 '20 13:10

Is it possible to print the rejected output format value to the screen? I have the feeling that something else is happening, because if I combine the parameter and its argument (i.e. '-f 6 qseqid sseqid pident length evalue bitscore'), I still get the same error.

I just found out that I can use the following as an alternative (which would not require any modification):

subprocess.run('diamond blastp -d refdb.dmnd -o results.tsv -f 6 qseqid sseqid pident length evalue bitscore full_sseq -q query_file.fasta', shell=True, check=True)

This works for me, but I am still curious where the error in the list-based call comes from (of course, since the second version works, this is fine for me). Thank you for your answer.

johanneswerner, Oct 12 '20 13:10
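
A possible way to see the difference between the two calls, sketched with Python's standard shlex module: with shell=True the shell splits the command string on whitespace, so diamond receives 6, qseqid, sseqid and so on as separate arguments, whereas the list form hands the whole format string to diamond as a single argument. shlex.split only approximates POSIX shell tokenization and is used here purely for illustration; nothing below actually runs diamond.

```python
import shlex

# The command string from the shell=True call above.
command = ("diamond blastp -d refdb.dmnd -o results.tsv "
           "-f 6 qseqid sseqid pident length evalue bitscore full_sseq "
           "-q query_file.fasta")

# Roughly what a POSIX shell would pass to diamond as its argument vector.
shell_args = shlex.split(command)

# The list from the original report: the format is ONE argument.
list_args = ['diamond', 'blastp',
             '-d', 'refdb.dmnd',
             '-o', 'results.tsv',
             '-f', '6 qseqid sseqid pident length evalue bitscore full_sseq',
             '-q', 'query_file.fasta']

# Compare what follows '-f' in each case.
print(repr(shell_args[shell_args.index('-f') + 1]))
print(repr(list_args[list_args.index('-f') + 1]))
```

In the shell case the element after -f is just 6 and the field names follow as their own elements; in the list case it is the entire space-separated string, which would be consistent with the "Invalid output format" error.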

I added a message like that in 810b6834a4a807cfa053112cfbd3d6979a0af43c.

I think you probably need to do this: '6', 'qseqid', 'sseqid', 'pident' etc.

bbuchfink, Oct 14 '20 13:10
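
The suggestion above could be sketched as follows, keeping the list form of subprocess.run (no shell=True) and passing every token of the output format as its own list element. The helper names build_diamond_cmd and run_diamond are hypothetical and only serve the illustration; the file names are the ones from the original report, and the sketch assumes diamond is on the PATH when run_diamond is actually called.

```python
import subprocess


def build_diamond_cmd(db, query, out, fields):
    """Hypothetical helper: build an argument list in which the format
    code '6' and each field name are separate elements."""
    return ['diamond', 'blastp',
            '-d', db,
            '-o', out,
            '-f', '6', *fields,
            '-q', query]


def run_diamond(cmd):
    """Hypothetical wrapper; not called here because it needs diamond installed."""
    return subprocess.run(cmd, check=True)


fields = ['qseqid', 'sseqid', 'pident', 'length',
          'evalue', 'bitscore', 'full_sseq']
cmd = build_diamond_cmd('refdb.dmnd', 'query_file.fasta', 'results.tsv', fields)
print(cmd)
```

Calling run_diamond(cmd) would then give diamond the same argument vector as the working shell command, without going through a shell.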