Jellyfish
More complicated queries from within python code
Hi all,
I would like to identify all unique kmers of a certain length in a set of input fasta sequences and then report each kmer's origin as well as the kmer motif itself.
To do this, I can use the following jellyfish commands manually:
k=9 # for example
jellyfish count -m${k} -s100M -C reference.fasta
jellyfish dump -L 1 -U 1 -c mer_counts.jf
# do some sort of grep to get the headers of the original fasta sequences based on the last output
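For small inputs, the three steps above (count, dump the k-mers seen exactly once, map them back to fasta headers) can be sketched in plain Python without the Jellyfish bindings at all. This is a minimal sketch, not Jellyfish's own implementation. It uses a standard-library Counter in place of a Jellyfish database, and the function names (read_fasta, canonical, iter_kmers, unique_kmers_with_origin) are hypothetical. It assumes that canonical mode (as with -C) means taking the lexicographically smaller of a k-mer and its reverse complement, and it skips windows containing anything other than A/C/G/T. Because it holds the whole file and every k-mer in memory, it will not scale to inputs that Jellyfish handles comfortably.

```python
from collections import Counter

# Hypothetical helpers, not part of Jellyfish or its Python bindings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def canonical(kmer):
    # Assumption: canonical form = lexicographically smaller of the k-mer
    # and its reverse complement (both strands counted together, as with -C).
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def iter_kmers(seq, k):
    """Yield each length-k window of seq that contains only A/C/G/T."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if _ACGT.issuperset(kmer):
            yield kmer


def unique_kmers_with_origin(lines, k):
    """Return [(header, kmer)] for k-mers whose canonical form occurs exactly once."""
    records = list(read_fasta(lines))
    # Rough equivalent of `count -C` followed by `dump -L 1 -U 1`.
    counts = Counter(canonical(km) for _, seq in records for km in iter_kmers(seq, k))
    hits = []
    for header, seq in records:
        for km in iter_kmers(seq, k):
            if counts[canonical(km)] == 1:
                hits.append((header, km))  # k-mer as it appears in this record
    return hits
```

Passing an open file works because it iterates over lines, e.g. unique_kmers_with_origin(open("reference.fasta"), 9). Each k-mer is reported in the orientation it has in its record, not in canonical form as jellyfish dump would print it.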
How can this be translated to Python code? It looks like the dump command can be approximated, but the example you give requires a pre-constructed database file. On the other hand, the Python approximation of the count command stores things inside a HashCounter, not a Jellyfish database file...
Is it easiest to approach this problem with a bash script (e.g. using grep), or is there a straightforward way in Python?
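A possible middle ground is to keep the Jellyfish command line for counting and do only the header lookup in Python, which replaces the grep step. The sketch below is hypothetical glue code, not part of the Jellyfish Python bindings. It shells out via subprocess to the same count and dump commands shown above (adding -o to name the database file explicitly), reads the two-column dump -c output, and then makes one pass over the fasta to find which record each unique k-mer came from. Because the dumped form of a k-mer under -C may correspond to either strand, it checks both the window and its reverse complement. It assumes the jellyfish binary is on the PATH.

```python
import subprocess

# Hypothetical glue code around the jellyfish CLI; not part of its Python bindings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def jellyfish_unique_kmers(fasta_path, k, hash_size="100M", jf_path="mer_counts.jf"):
    """Run `jellyfish count` and `jellyfish dump -L 1 -U 1 -c`; return the dumped k-mers."""
    subprocess.run(
        ["jellyfish", "count", f"-m{k}", f"-s{hash_size}", "-C",
         "-o", jf_path, fasta_path],
        check=True,
    )
    dump = subprocess.run(
        ["jellyfish", "dump", "-L", "1", "-U", "1", "-c", jf_path],
        check=True, capture_output=True, text=True,
    )
    return parse_dump_columns(dump.stdout)


def parse_dump_columns(text):
    """Parse column-format dump output (one 'MER COUNT' pair per line) into a set of mers."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def attach_headers(lines, unique, k):
    """Return [(header, kmer)] for every window whose k-mer or reverse complement is in `unique`."""
    hits = []
    for header, seq in read_fasta(lines):
        for i in range(len(seq) - k + 1):
            km = seq[i:i + k]
            rc = km.translate(_COMPLEMENT)[::-1]
            # With -C the dumped mer may be either strand, so check both.
            if km in unique or rc in unique:
                hits.append((header, km))
    return hits
```

A full run would then look like attach_headers(open("reference.fasta"), jellyfish_unique_kmers("reference.fasta", 9), 9). Only the set of unique k-mers is held in Python, so memory use depends on how many k-mers occur exactly once rather than on all k-mers in the input.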
Thanks a bunch! ~Lina