
keep strand

Open kullrich opened this issue 1 year ago • 4 comments

Hi, according to the manual, sourmash currently also considers the reverse complement and, in the case of is_protein=True, all 6 possible open reading frames. Is there an option to keep the strand and skip the reverse complement, or to keep only 3 of the 6 possible ORFs? Thank you in advance. Best regards, Kristian

kullrich avatar Oct 10 '22 18:10 kullrich

hi @kullrich this is not possible at the moment but would not be hard for us to implement.

we could easily give you a script to produce sketches like this, and the rest of sourmash would work fine with the resulting sketch. interested? (It will just take time to add that feature into sourmash sketch.)

see also https://github.com/sourmash-bio/sourmash/issues/657

ctb avatar Oct 10 '22 18:10 ctb

Yes, I am interested. Actually, I am re-implementing this (https://github.com/mrvollger/StainedGlass) but using sourmash Jaccard distances instead of BLAST. This makes the tool much faster, and it can be used without a snakemake environment. Anyhow, I need to keep track of the strand to be able to color possible rearrangements accordingly. Best regards, Kristian
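To illustrate what "keeping the strand" means for the scores, here is a minimal pure-Python sketch. It is not sourmash's API and does not produce sourmash sketches: the helper names (`revcomp`, `kmer_hashes`, `stranded_jaccard`) and the choice of `hashlib.blake2b` as the hash are assumptions made for the example, and it compares all k-mers exactly instead of subsampling them as a MinHash would.

```python
import hashlib

# Hypothetical helpers for illustration; none of these are sourmash functions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_hashes(seq, ksize, stranded=True):
    """Return the set of 64-bit hashes of all k-mers in seq.

    stranded=True hashes every k-mer exactly as it appears in seq.
    stranded=False hashes the canonical form (the smaller of the k-mer
    and its reverse complement), which discards strand information.
    """
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        if not stranded:
            kmer = min(kmer, revcomp(kmer))
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def jaccard(a, b):
    """Exact Jaccard similarity of two hash sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stranded_jaccard(query, target, ksize):
    """Return (forward, reverse) Jaccard similarity of query against target."""
    target_hashes = kmer_hashes(target, ksize)
    forward = jaccard(kmer_hashes(query, ksize), target_hashes)
    reverse = jaccard(kmer_hashes(revcomp(query), ksize), target_hashes)
    return forward, reverse
```

For a window that is an inverted copy of the target, e.g. `stranded_jaccard(revcomp("AACCAAACCCAACAC"), "AACCAAACCCAACAC", 4)`, this gives `(0.0, 1.0)`, whereas canonical (unstranded) hashing would score the pair as identical. Comparing the forward and reverse scores is what would let an inversion be colored differently.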

kullrich avatar Oct 10 '22 19:10 kullrich

neat! will see what I can do :)

ctb avatar Oct 10 '22 19:10 ctb

Here is, e.g., chr4 as a comparison, but with Jaccard distances. I just need to work on the color map and implement strand-specific scores. Even multithreading might be possible, to calculate pairwise Jaccard distances in parallel (maybe with dask?).

[Image: CEN4_chr4_3926001_7255300_1000bp]
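As a rough sketch of the parallel pairwise step, the following uses Python's standard-library `concurrent.futures` in place of dask. The function names and the representation of each window as a plain set of hashes are assumptions for the example, not part of sourmash or StainedGlass.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two hash sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(sketches, max_workers=4):
    """Compute Jaccard similarity for every unordered pair of windows.

    sketches: dict mapping a window name to its set of hashes (hypothetical
    input format). Returns a dict mapping (name_i, name_j) to the similarity.
    """
    pairs = list(combinations(sorted(sketches), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with pairs.
        values = pool.map(
            lambda pair: jaccard(sketches[pair[0]], sketches[pair[1]]), pairs
        )
        return dict(zip(pairs, values))
```

Threads show the structure but, for pure-Python set operations, are unlikely to give much speedup because of the interpreter lock. `ProcessPoolExecutor` offers the same `map` interface, provided the worker is a module-level function instead of a lambda.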

kullrich avatar Oct 10 '22 19:10 kullrich