kmer-db

one2all/new2all output counts for common k-mers between multiple db-samples

Open · mihkelvaher opened this issue 4 years ago · 3 comments

Hi!

The one2all/new2all modes report intersection sizes with a new sample, for example: `s1: 100/150`, `s2: 200/300`, `s3: 50/1000`, ...

Is there any way to get more detailed output showing which k-mers are shared? For example, given these counts, I cannot tell whether the 50 k-mers seen in s3 are also present in s1 or s2. The preferred output would be something like this: `s1: 50/50`, `s2: 200/300`, `s3: 0/900`, `s1 AND s3: 50/100`, ...

This can be achieved by computing all of the intersections beforehand, but given the structure of a kmer-db database, I was hoping that step could be skipped.
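The breakdown requested above amounts to grouping the query's k-mers by the exact set of database samples that contain each one. Below is a minimal Python sketch, not kmer-db code: it assumes the database is available as a hypothetical in-memory `dict` mapping sample name to a set of k-mers, and `shared_kmer_patterns` and `format_patterns` are illustrative names. Each count is exclusive (a k-mer found in s1 and s3 is counted only under `s1 AND s3`), which is one reading of the desired output.

```python
from collections import Counter


def shared_kmer_patterns(query_kmers, db):
    """Count distinct query k-mers by the exact set of database samples containing them.

    `db` is a hypothetical in-memory stand-in for a kmer-db database:
    a dict mapping sample name -> set of k-mers (plain strings here).
    """
    patterns = Counter()
    for kmer in set(query_kmers):  # count each distinct k-mer once
        # Every database sample that contains this k-mer.
        owners = frozenset(name for name, kmers in db.items() if kmer in kmers)
        if owners:  # k-mers absent from all samples are not reported
            patterns[owners] += 1
    return patterns


def format_patterns(patterns):
    """Render patterns as 'sA AND sB: count', smallest sample sets first."""
    rows = sorted(patterns.items(), key=lambda item: (len(item[0]), sorted(item[0])))
    return [f"{' AND '.join(sorted(owners))}: {count}" for owners, count in rows]


if __name__ == "__main__":
    db = {"s1": {"AAC", "ACG"}, "s2": {"ACG", "CGT"}, "s3": {"AAC", "TTT"}}
    for line in format_patterns(shared_kmer_patterns(["AAC", "ACG", "CGT", "GGG"], db)):
        print(line)
```

With the toy data in the `__main__` block this prints `s2: 1`, `s1 AND s2: 1` and `s1 AND s3: 1`; `GGG` occurs in no sample and is not reported. The brute-force lookup costs one membership test per query k-mer per sample, which is fine for illustration but not how a real index would do it.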

Regards, Mihkel

mihkelvaher · Jun 28 '20 16:06

Dear Mihkel,

We could consider adding the functionality you describe to kmer-db. However, the number of possible intersections grows exponentially with the number of samples involved. Wouldn't it be better to let users explicitly specify which intersections they are interested in?
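To put the growth in numbers: with $n$ database samples, every non-empty subset of samples is a candidate intersection, so the number of possible intersections is

$$
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1 .
$$

For $n = 10$ samples that is 1,023 candidate intersections, and for $n = 20$ it is already 1,048,575.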

Regards, Adam

agudys · Jun 30 '20 07:07

Hi!

The number of intersections does indeed grow quickly. Could the reported intersections be limited by the number of k-mers the reference samples share? For example, if s1, s2 and s3 share fewer than 1000 k-mers, their intersection would not be shown. Also, showing only the intersections in which something was actually found during the search would reduce the output size significantly.
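The two filters proposed above can be sketched as a post-processing step over per-pattern counts (a dict mapping a frozenset of sample names to the number of query k-mers found in exactly those samples). Because every distinct query k-mer falls into exactly one pattern, the number of non-empty patterns is at most min(2^n - 1, number of distinct query k-mers), so reporting only observed intersections keeps the output bounded by the query size. This is a minimal sketch, not a kmer-db feature: `filter_patterns`, `min_ref_shared` and the in-memory `db` dict are hypothetical, and single-sample rows are assumed to be always kept.

```python
from functools import reduce


def filter_patterns(patterns, db, min_ref_shared):
    """Keep only intersections worth reporting (an illustrative filter).

    patterns: dict mapping frozenset of sample names -> count of query k-mers
        seen in exactly those samples; only observed sets appear, so
        intersections with no hits are already excluded.
    db: hypothetical dict mapping sample name -> set of k-mers.
    min_ref_shared: hypothetical threshold; a multi-sample intersection is
        dropped when its samples share fewer k-mers than this in the database.
    """
    kept = {}
    for owners, count in patterns.items():
        if len(owners) == 1:
            kept[owners] = count  # single samples are always reported
            continue
        # Number of k-mers shared by all samples of this set within the database.
        ref_shared = len(reduce(set.intersection, (db[name] for name in owners)))
        if ref_shared >= min_ref_shared:
            kept[owners] = count
    return kept
```

In the thread's example, calling this with `min_ref_shared=1000` would hide the `s1 AND s2 AND s3` row whenever those three samples share fewer than 1000 k-mers in the database. Computing each set intersection on demand, only for observed patterns, avoids enumerating all 2^n - 1 subsets up front.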

mihkelvaher · Jul 09 '20 07:07

Just run all2all as well :)

blahah · Nov 21 '20 01:11