
Convert fasta into a Numeric Summarization Vector (NSV)

Open solshiferaw opened this issue 6 years ago • 2 comments

I want to convert a FASTA file to an NSV of k-mer frequency counts. What code needs to be written in Python, and how do I load the file? Thank you!

solshiferaw, Sep 26 '19 08:09

Hi Solshi!

Could you please describe in a bit more detail what the contents of the NSV will be and which sequence characteristics they will summarize? An example of what you expect this vector to look like would help as well.

standage, Sep 26 '19 13:09

Hello! My NSV file would contain the frequencies of nucleotides and k-mers. Suppose the FASTA file contains:

RF00050|AECL01000049.1/43972-43822
GGUUGUUCUCAGGGCGGGGUGCAAUUCCCCACCGG
RF00050|CP000628.1/2430019-2430165
GACCGUUCUCAGGGCGGGGUGAGAUUCCCCAC

I want to convert this to k-mer frequency counts (a, aa, aaa, aaaaa, aaaaaa, ...), for example aa, au, cua, cug, ggg, aaac, ..., and then convert those k-mer frequencies into a vector to analyse in Weka. Thank you!

solshiferaw, Sep 27 '19 01:09
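
For reference, the request above (count every k-mer of length 1 to some maximum per sequence, then lay the counts out as one fixed-order vector per record) can be sketched in plain Python. This is only a rough illustration, not khmer's API: the function names, the `max_k` parameter, the `id` column, and the CSV layout are all assumptions made for the example. It also assumes standard FASTA input in which each header line starts with `>`, and an RNA alphabet of A, C, G, U; k-mers containing any other character are simply not counted.

```python
from collections import Counter
from itertools import product
import csv

ALPHABET = "ACGU"  # assumption: RNA sequences, as in the example records


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_columns(max_k):
    """All k-mers for k = 1..max_k in a fixed order: A, C, G, U, AA, AC, ..."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product(ALPHABET, repeat=k)]


def kmer_vector(seq, columns, max_k):
    """Count overlapping k-mers in seq and return the counts in column order."""
    counts = Counter(seq[i:i + k]
                     for k in range(1, max_k + 1)
                     for i in range(len(seq) - k + 1))
    return [counts[c] for c in columns]


def fasta_to_csv(fasta_path, csv_path, max_k=3):
    """Write one row per FASTA record: the record id followed by its counts."""
    columns = kmer_columns(max_k)
    with open(fasta_path) as fin, open(csv_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(["id"] + columns)
        for header, seq in read_fasta(fin):
            writer.writerow([header] + kmer_vector(seq, columns, max_k))


# In-memory demo using the two records from the comment above.
example = """>RF00050|AECL01000049.1/43972-43822
GGUUGUUCUCAGGGCGGGGUGCAAUUCCCCACCGG
>RF00050|CP000628.1/2430019-2430165
GACCGUUCUCAGGGCGGGGUGAGAUUCCCCAC
"""
demo_columns = kmer_columns(2)
demo_vectors = {header: kmer_vector(seq, demo_columns, 2)
                for header, seq in read_fasta(example.splitlines())}
```

Calling `fasta_to_csv("input.fa", "kmers.csv", max_k=3)` (both file names are placeholders) would produce a CSV that Weka can open directly; the `id` column may need to be removed or replaced with a class label before training. Note that the vector length grows as 4 + 16 + ... + 4^max_k, so large values of `max_k` produce very wide and mostly zero rows.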