
Create script to extract intersections between sets from gather output

Open brooksph opened this issue 8 years ago • 9 comments

pyupset (Python) and UpSetR (R) generate nice plots of the intersections between sets, but pyupset does not print the unique values/groups or the members of each intersection. @halexand and I can take this on. Just placing this here as a reminder.

brooksph avatar Sep 29 '17 16:09 brooksph

@taylorreiter what are you using these days?

ctb avatar Apr 07 '20 14:04 ctb

Something like this! It isn't very pretty, but it does the job. As @brooksph stated, upset plots are nice, but they do not print the unique values/groups or the intersection between these sets... so I brute force it in a very not clever way at the bottom of the attached R script: plot_gather_output_var_imp.R.txt

taylorreiter avatar Apr 09 '20 15:04 taylorreiter

(could you post an example image? :)

ctb avatar Apr 09 '20 15:04 ctb

really nice, thanks taylor! here it is, inline:

[image: upset_var_imp_genomes2]

ctb avatar Apr 10 '20 13:04 ctb

ref #1234

ctb avatar Feb 04 '21 15:02 ctb

aaaaactually this strikes me as an eminently doable thing for a new sig subcommand - basically something like sourmash sig overlap but for n signatures, not just two.

could output a file format that is trivial to load into 'upset' plotting code per #1234.
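
A minimal sketch of what an n-way overlap could compute, assuming each signature has already been reduced to a plain Python set of hash values. The function name `exclusive_intersections` and the toy data are hypothetical; this is not an existing sourmash command or API. It groups every hash by the exact combination of sets that contain it, which is the per-bar content an upset plot shows but does not print.

```python
from collections import defaultdict


def exclusive_intersections(named_sets):
    """Map each combination of set names to the items found in exactly those sets."""
    # First pass: record which sets each item belongs to.
    membership = defaultdict(set)
    for name, items in named_sets.items():
        for item in items:
            membership[item].add(name)

    # Second pass: group items by their exact membership combination.
    groups = defaultdict(set)
    for item, names in membership.items():
        groups[frozenset(names)].add(item)
    return dict(groups)


# Toy hash values standing in for sketch contents (hypothetical data).
sketches = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {4, 5, 6},
}

result = exclusive_intersections(sketches)
for names, items in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(",".join(sorted(names)), sorted(items))
```

Keys with a single name hold the values unique to that set; keys with several names hold the values shared by exactly those sets and no others.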

ctb avatar Mar 12 '22 16:03 ctb

This is the format that goes into the R libraries

 de novo  kaa-mer  reference
       1        1          0
       1        0          0
       1        1          0
       1        1          0
       1        0          0
       1        0          0
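
For anyone producing this table outside R, here is a rough sketch of building the same presence/absence layout in Python: one row per item, one 0/1 column per set. The helper names (`membership_rows`, `write_membership_csv`) and the toy hash values are made up for illustration, and CSV is only an assumed interchange format; the R script may read something different.

```python
import csv
import io


def membership_rows(named_sets):
    """Return (column names, rows), one row of 0/1 flags per item."""
    names = list(named_sets)
    # Every item seen in any set, in a stable order.
    all_items = sorted(set().union(*named_sets.values()))
    rows = [[int(item in named_sets[n]) for n in names] for item in all_items]
    return names, rows


def write_membership_csv(named_sets, fp):
    """Write the presence/absence matrix as CSV with a header row."""
    names, rows = membership_rows(named_sets)
    writer = csv.writer(fp)
    writer.writerow(names)
    writer.writerows(rows)


# Toy data (hypothetical hash values), using the column names from the table above.
groups = {
    "de novo": {10, 11, 12},
    "kaa-mer": {10, 12},
    "reference": set(),
}

names, rows = membership_rows(groups)
buf = io.StringIO()
write_membership_csv(groups, buf)
print(buf.getvalue())
```

Each row corresponds to one item (for example one hash), so identical rows can be counted to get the size of each intersection.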

taylorreiter avatar Mar 14 '22 15:03 taylorreiter

adding the upset command via the betterplot plugin in https://github.com/sourmash-bio/sourmash_plugin_betterplot/pull/35 - it produces figures like this:

[image: 10sketches upset]

ctb avatar Jun 09 '24 22:06 ctb