Convert Hits from a Search to Fasta files

Open juliacpowell1999 opened this issue 4 years ago • 3 comments

Hi, I am new to MMseqs2 (and the coding world in general). I am sure what I want to do is very simple, but I cannot figure out how to do it.

I ran a query protein sequence against the UniProtKB/Swiss-Prot database. The resulting ResultDB showed 20 hits. I want to extract the sequences corresponding to those 20 hits to run an MSA with ClustalO. Is there a way to obtain the actual sequences corresponding to the hits from a search query and convert them to FASTA format?

I tried using the createtsv function to convert the ResultDB to a TSV file, but because there is no header information, the convert2fasta function did not work.

Any help or suggestions would be appreciated.

Thanks!

juliacpowell1999 avatar Sep 28 '21 00:09 juliacpowell1999

@juliacpowell1999 Yes, you can get the target sequences by adding tseq to the --format-output options. For example:

mmseqs easy-search query target result tmp --format-output "query,target,tseq"

martin-steinegger avatar Sep 28 '21 03:09 martin-steinegger

Thank you! I now need to convert the sequences in that file to FASTA format. How would I do that? I tried using the convert2fasta command, but an error keeps occurring because the file does not contain header information.

juliacpowell1999 avatar Oct 06 '21 07:10 juliacpowell1999

awk '{print ">"$2; print $3}' result > result.fasta
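If the same target can appear in more than one row (for example, when several queries hit it), the one-liner above would write that sequence multiple times. A possible variant, sketched under the assumption that the result file has exactly the three tab-separated columns query, target, tseq, keeps only the first occurrence of each target. The function name hits_to_fasta is made up for illustration and is not part of MMseqs2:

```shell
# Hypothetical helper (not an MMseqs2 command): convert a 3-column,
# tab-separated hit table (query, target, tseq) to FASTA on stdout.
# Assumes column 2 is the target ID and column 3 is the target sequence.
hits_to_fasta() {
    # -F'\t'        split fields on tabs only
    # !seen[$2]++   true only the first time a given target ID is seen
    awk -F'\t' '!seen[$2]++ { print ">" $2; print $3 }' "$1"
}
```

It could then be called as `hits_to_fasta result > result.fasta`, and the resulting file passed to ClustalO.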

martin-steinegger avatar Oct 07 '21 06:10 martin-steinegger