
bioawk trimming the protein sequences in the start

Open sujanpau opened this issue 5 years ago • 0 comments

Hello,

I am trying to convert protein sequences in .faa format into an OrthoMCL-readable format (organism_ID|protein_ID) using bioawk -c fastx '{ print ">GMI1000|"$name; print $seq }'. The output contains only around 900 of the 4000 sequences. It looks like bioawk is not reading the first 3000 or so proteins in the .faa file. Is there any way to solve this problem?
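For comparison, here is a minimal sketch of the same header rewrite in plain awk, plus a record count to check the output against the input. The function names `to_orthomcl` and `count_records` are hypothetical helpers, not part of bioawk or OrthoMCL. Unlike bioawk's `$seq`, this version leaves the sequence line wrapping unchanged.

```shell
# Hypothetical helper: rewrite FASTA headers to OrthoMCL style (taxon|protein_ID).
# Uses plain awk, not bioawk; sequence lines are passed through unchanged.
to_orthomcl() {
  # $1 = taxon code (e.g. GMI1000), $2 = input FASTA file
  awk -v taxon="$1" '
    /^>/ {
      sub(/^>/, "")            # drop the leading ">"
      split($0, f, /[ \t]/)    # keep only the first word, like bioawk $name
      print ">" taxon "|" f[1]
      next
    }
    { print }                  # sequence lines: print as-is
  ' "$2"
}

# Count header lines in a FASTA file (one ">" line per record).
count_records() {
  grep -c '^>' "$1"
}
```

Assuming an input file called proteins.faa (a placeholder name), `to_orthomcl GMI1000 proteins.faa > GMI1000.fasta` writes the converted file, and running `count_records` on both files should give the same number if no records were dropped.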

Thank you very much in advance!!

sujanpau • Nov 08 '19 19:11