khmer
Convert fasta into a Numeric Summarization Vector (NSV)
I want to convert a FASTA file to an NSV of k-mer frequency counts. What Python code do I need to write, and how do I load the file? Thank you!
Hi Solshi!
Could you please describe in a bit more detail what the NSV will contain and which sequence characteristics it will summarize? An example of what you expect this vector to look like would help as well.
Hello! My NSV file should contain the frequencies of nucleotides and k-mers. Suppose the FASTA file contains:

RF00050|AECL01000049.1/43972-43822
GGUUGUUCUCAGGGCGGGGUGCAAUUCCCCACCGG
RF00050|CP000628.1/2430019-2430165
GACCGUUCUCAGGGCGGGGUGAGAUUCCCCAC

I want to convert this to k-mer frequency counts (a, aa, aaa, aaaaa, aaaaaa, ...; aa, au, cua, cug, ggg, aaac, ...) and then turn those k-mer frequencies into a vector that I can analyse in Weka. Thank you!
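One possible starting point, as a minimal sketch in plain Python rather than khmer's own API: read each record, count every overlapping k-mer for k = 1 up to a chosen maximum, and write one row per sequence to a CSV file, which Weka can load. The function names, the `max_k=3` default, the `ACGU` alphabet, and the file names are all assumptions made for illustration. The sketch also assumes that header lines in the real file start with `>`, as in standard FASTA.

```python
import csv
from collections import Counter
from itertools import product


def read_fasta(handle):
    """Yield (header, sequence) pairs; assumes header lines start with '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_vector(seq, max_k=3, alphabet="ACGU"):
    """Return (labels, counts) over all possible k-mers for k = 1..max_k.

    Every k-mer over `alphabet` gets a slot, so all sequences produce
    vectors of the same length and column order.
    """
    labels, counts = [], []
    for k in range(1, max_k + 1):
        # Count overlapping windows of width k.
        seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        for kmer in map("".join, product(alphabet, repeat=k)):
            labels.append(kmer)
            counts.append(seen[kmer])
    return labels, counts


def fasta_to_csv(fasta_handle, csv_handle, max_k=3, alphabet="ACGU"):
    """Write one CSV row of k-mer counts per FASTA record."""
    writer = csv.writer(csv_handle)
    wrote_header = False
    for name, seq in read_fasta(fasta_handle):
        labels, counts = kmer_vector(seq, max_k, alphabet)
        if not wrote_header:
            writer.writerow(["id"] + labels)
            wrote_header = True
        writer.writerow([name] + counts)


# Example usage (hypothetical file names):
# with open("input.fasta") as fin, open("kmers.csv", "w", newline="") as fout:
#     fasta_to_csv(fin, fout, max_k=3)
```

The values written are raw counts. If relative frequencies are wanted instead, each count could be divided by the number of windows of that width, `len(seq) - k + 1`. Note that the number of columns grows as 4^k, so large values of `max_k` give very wide, mostly zero vectors.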