seqtk icon indicating copy to clipboard operation
seqtk copied to clipboard

output file contains only one amino acid

Open Jiayi-Zheng opened this issue 2 years ago • 1 comments

Hi, I was using command seqtk subseq uniprot_sprot.fasta name.txt > out.fa my name.txt looks like:

sp|Q9H2Y7|ZN106_HUMAN Zinc finger protein 106 OS=Homo sapiens OX=9606 GN=ZNF106 PE=1 SV=1
sp|Q9Y5V0|ZN706_HUMAN Zinc finger protein 706 OS=Homo sapiens OX=9606 GN=ZNF706 PE=1 SV=1
sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1

I got a file: `$ head -5 out.fa

sp|P31946|1433B_HUMAN:14-14 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3 L sp|Q04917|1433F_HUMAN:14-14 14-3-3 protein eta OS=Homo sapiens OX=9606 GN=YWHAH PE=1 SV=4 A`

`$ tail -4 out.fa

sp|Q9Y5V0|ZN706_HUMAN Zinc finger protein 706 OS=Homo sapiens OX=9606 GN=ZNF706 PE=1 SV=1 MARGQQKIQSQQKNAKKQAGQKKKQGHDQKAAAKAALIYTCTVCRTQMPDPKTFKQHFESKHPKTPLPPELADVQA sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1 MAAPRPSPAISVSVSAPAFYAPQKKFGPVVAPKPKVNPFRPGDSEPPPAPGAQRAQMGRVGEIPPPPPEDFPLPPPPLAGDGDDAEGALGGAFPPPPPPIEESFPPAPLEEEIFPSPPPPPEEEGGPEAPIPPPPQPREKVSSIDLEIDSLSSLLDDMTKNDPFKARVSSGYVPPPVATPFSSKSSTKPAAGGTAPLPPWKSPSSSQPLPQVPAPAQSQTQFHVQPQPQPKPQVQLHVQSQTQPVSLANTQPRGPPASSPAPAPKFSPVTPKFTPVASKFSPGAPGGSGSQPNQKLGHPEALSAGTGSPQPPSFTYAQQREKPRVQEKQHPVPPPAQNQNQVRSPGAPGPLTLKEVEELEQLTQQLMQDMEHPQRQNVAVNELCGRCHQPLARAQPAVRALGQLFHIACFTCHQCAQQLQGQQFYSLEGAPYCEGCYTDTLEKCNTCGEPITDRMLRATGKAYHPHCFTCVVCARPLEGTSFIVDQANRPHCVPDYHKQYAPRCSVCSEPIMPEPGRDETVRVVALDKNFHMKCYKCEDCGKPLSIEADDNGCFPLDGHVLCRKCHTARAQT`

So the last parts looks normal but the starting ones are obviously lacking things, not sure why is that?

I tried locating the protein in raw file and it looks fine: `$ grep -A 5 'GN=YWHAB' uniprot_sprot.fasta

sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3 MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN`

Jiayi-Zheng avatar Jan 31 '23 07:01 Jiayi-Zheng

You can refer to this #180

hanzerruid avatar Feb 21 '24 14:02 hanzerruid