Is it possible to change step size?

Open mtnouchi opened this issue 5 years ago • 4 comments

Hello

I would like to count equally spaced k-mers, e.g. in-frame k-mers (such as codons or bi-codons) with a step size of 3.

Is it possible with Jellyfish?

Thank you

mtnouchi avatar Jan 23 '20 02:01 mtnouchi

This is not supported by default. Can I ask how you would see something like this work? You would pass a step size and, maybe, an offset. For each sequence in a fasta or fastq file, first step forward by the offset, then return the k-mer every step size. Is that what you envision?
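To make the proposal concrete, here is a minimal Python sketch of the offset-plus-step scheme described above. It is not Jellyfish code and does not correspond to any existing Jellyfish option; the function name `count_stepped_kmers` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_stepped_kmers(seq, k, step=1, offset=0):
    """Count k-mers that start at offset, offset + step, offset + 2*step, ...

    Hypothetical helper for illustration only; not part of Jellyfish.
    """
    counts = Counter()
    # The last valid start position is len(seq) - k.
    for start in range(offset, len(seq) - k + 1, step):
        counts[seq[start:start + k]] += 1
    return counts


# In-frame codons of a short coding sequence (k = 3, step = 3).
codons = count_stepped_kmers("ATGGCCATGTAA", k=3, step=3)
```

With `step=3` and `offset=0` this visits positions 0, 3, 6 and 9, so only in-frame codons are counted. Setting `offset=1` or `offset=2` would select the other two reading frames.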

gmarcais avatar Feb 05 '20 16:02 gmarcais

Sorry for the late reply.

Yes, that's exactly what I meant. Passing an offset as well as a step size would be great!!

mtnouchi avatar Feb 17 '20 05:02 mtnouchi

It seems interesting. I'll try to implement it.

How would an N be handled? As if it were the start of a new sequence?

gmarcais avatar Feb 17 '20 14:02 gmarcais

Thanks a lot! I'm looking forward to it.

In my case, I don't want an N to be treated as the start of a new sequence. Some assembled coding sequences still contain ambiguity codes such as N, Y, or R. So I would like k-mers containing them to simply be ignored, so that the reading frame is not disrupted.

However, in some other cases, such as an arbitrary number of Ns representing a gap, it would be better to handle them as the start of a new sequence.

Therefore, I would be grateful if the behavior of the program could be specified through a switch.
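For illustration, the two behaviors could look roughly like the Python sketch below. This is only my assumption of how such a switch might work, not an existing Jellyfish feature; the function `count_in_frame` and the `ambiguous` parameter with its values `"skip"` and `"split"` are hypothetical names.

```python
import re
from collections import Counter


def count_in_frame(seq, k, step=3, offset=0, ambiguous="skip"):
    """Count stepped k-mers with a switch for non-ACGT characters.

    ambiguous="skip":  drop any k-mer containing a non-ACGT character,
                       but keep stepping in the original frame.
    ambiguous="split": treat each run of non-ACGT characters as a break
                       and restart the offset in every resulting piece.
    Hypothetical helper for illustration only; not part of Jellyfish.
    """
    valid = set("ACGT")
    seq = seq.upper()
    if ambiguous == "skip":
        segments = [seq]
    elif ambiguous == "split":
        segments = re.findall(r"[ACGT]+", seq)
    else:
        raise ValueError("ambiguous must be 'skip' or 'split'")

    counts = Counter()
    for segment in segments:
        for start in range(offset, len(segment) - k + 1, step):
            kmer = segment[start:start + k]
            # In "split" mode every k-mer is already clean.
            if valid.issuperset(kmer):
                counts[kmer] += 1
    return counts


skipped = count_in_frame("ATGNCCATGTAA", k=3, ambiguous="skip")
split = count_in_frame("ATGNCCATGTAA", k=3, ambiguous="split")
```

In `"skip"` mode the codon `NCC` is dropped and the frame continues, so `ATG` and `TAA` after it are still counted in frame. In `"split"` mode the sequence after the N is treated as a new sequence, so counting restarts at `CCA`.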

mtnouchi avatar Feb 18 '20 18:02 mtnouchi