UpSetR icon indicating copy to clipboard operation
UpSetR copied to clipboard

Feature Request: Reporting the intersection results

Open skarunan opened this issue 8 years ago • 30 comments

I couldn't find a way to extract the intersecting values. For instance, reporting the names of movies that fall under more than one genre (action, thriller, drama). At least for me, extracting those names which fall in just one category or more than one specific categories will be a nice feature for the package.

skarunan avatar Jul 12 '17 09:07 skarunan

I would also appreciate this feature!

rotifergirl avatar Jul 24 '17 14:07 rotifergirl

Yes, I'd love this feature, too and would be very keen to get the lists of elements for intersections.

janstrauss1 avatar Jul 26 '17 13:07 janstrauss1

I would also love to have this feature! Or if anyone @rotifergirl @sivarajankarunanithi @janstrauss1 has found a work around please let me know!

wkarno avatar Jul 28 '17 18:07 wkarno

The only work-around I've found is calculate.overlap() from the VennDiagram package, but it's limited in the number of sets it can deal with, so I had to break my sets into groups of three, and then compare the largest overlaps from each initial comparison to find the total overlap.

subset1<-list(set1,set2,set3,set4,set5) overlap1<-calculate.overlap(subset1) subset2<-list(set6,set7,set8,set9,set10) overlap2<-calculate.overlap(subset2) subset3<-list(set11,set12,set13,set14) overlap3<-calculate.overlap(subset3) bigoverlap1<-as.factor(overlap1[[1]]) bigoverlap2<-as.factor(overlap2[[1]]) bigoverlap3<-as.factor(overlap3[[1]]) alloverlaps<-list(bigoverlap1,bigoverlap2,bigoverlap3) finaloverlap<-calculate.overlap(alloverlaps)

This way, finaloverlap[1] is a list of the elements in your largest overlap for all sets, and other groups would have to be worked out another way. It's not elegant, but it did the job for what I needed so it may be helpful!

rotifergirl avatar Jul 28 '17 19:07 rotifergirl

I have found a work-around using a web tool developed by the group of Yves van de Peer at the University of Ghent, Belgium allowing to print lists of elements which are in each intersection or are unique to a certain list. The tool is accessible at http://bioinformatics.psb.ugent.be/webtools/Venn/. I've noted, however, that sometimes there are slight discrepancies between the number of elements predicted by UpSetR and the web tool. Overall, it works well though!

janstrauss1 avatar Jul 31 '17 08:07 janstrauss1

@janstrauss1 Thanks for that solution!

skarunan avatar Aug 02 '17 14:08 skarunan

@ngehlenborg @JakeConway @alexsb I find UpSetR to be very helpful with many intuitive parameters. As suggested on Twitter, I'd like to make a feature request for "Exporting intersection/set data generated behind the scenes by UpSet plots". This data could be used alongside ggplot/other packages to further manipulate and re-plot. Right now assigning the plot to a variable doesn't save anything (NULL). Thanks!

jananiravi avatar Aug 09 '17 20:08 jananiravi

While it would be best if upset() would return a container with the row numbers of the members of each set, in the meantime, I wrote some code using dplyr and tibble to get the members of the intersections from the binary table:

get_intersect_members <- function (x, ...){
  require(dplyr)
  require(tibble)
  x <- x[,sapply(x, is.numeric)][,0<=colMeans(x[,sapply(x, is.numeric)],na.rm=T) & colMeans(x[,sapply(x, is.numeric)],na.rm=T)<=1]
  n <- names(x)
  x %>% rownames_to_column() -> x
  l <- c(...)
  a <- intersect(names(x), l)
  ar <- vector('list',length(n)+1)
  ar[[1]] <- x
  i=2
  for (item in n) {
    if (item %in% a){
      if (class(x[[item]])=='integer'){
        ar[[i]] <- paste(item, '>= 1')
        i <- i + 1
      }
    } else {
      if (class(x[[item]])=='integer'){
        ar[[i]] <- paste(item, '== 0')
        i <- i + 1
      }
    }
  }
  do.call(filter_, ar) %>% column_to_rownames() -> x
  return(x)
}
# get_intersect_members() takes as arguments a dataframe that has been formatted as a binary 
#     table, such as movies from the UpSetR vignette; as well as a series of strings with the names of 
#     columns you wish to test membership for.

# Find all movies that are exclusively dramas:
get_intersect_members <- function (movies, 'Drama')

# Find all movies that are both dramas and comedies:
get_intersect_members <- function (movies, 'Drama', 'Comedy')

# You can also provide the arguments as a vector of column names:
get_intersect_members <- function (movies, c('Drama', 'Comedy'))

For convenience, if you only have your data in the form of a named list (like me) here is a modified version of the fromList function in the package that conserves the item names as row names:

fromList <- function (input) {
  # Same as original fromList()...
  elements <- unique(unlist(input))
  data <- unlist(lapply(input, function(x) {
      x <- as.vector(match(elements, x))
      }))
  data[is.na(data)] <- as.integer(0)
  data[data != 0] <- as.integer(1)
  data <- data.frame(matrix(data, ncol = length(input), byrow = F))
  data <- data[which(rowSums(data) != 0), ]
  names(data) <- names(input)
  # ... Except now it conserves your original value names!
  row.names(data) <- elements
  return(data)
  }

#Example list:
lst <- list(a=c("CARD11_0","EZH2_0","HOXD11_0","FGFR1_0","FGFR1_1"), b=c("EZH2_0","EZH2_0","HOXD11_0","FGFR1_0","FGFR1_0"))

# Binary table with colnames:
fromList(lst)

docmanny avatar Sep 07 '17 19:09 docmanny

Also interested in this feature. @docmanny's solution worked very well for me in the meantime though, thanks a lot!!

bdecato avatar Nov 02 '17 01:11 bdecato

This feature would be very helpful.

cgoo4 avatar Nov 17 '17 21:11 cgoo4

Hi! First post here, hope I make myself clear :)

In order to find the list of intersected values I used "Reduce" and "combn". I reuse a previous post for this (https://stackoverflow.com/questions/24748170/finding-all-possible-combinations-of-vector-intersections). The code:

combos <- Reduce(c,lapply(2:length(v), 
                          function(x) combn(1:length(v),x,simplify=FALSE) ))
intersect <- lapply(combos, function(x) Reduce(intersect,v[x]) )

Where "v" is the list generated file without fromList transformation.

It's important to say that UpSetR only plots the unique values. This means that the values in the intersection between list 2, 3 and 4 would be only the values that appear in this condition and not the intersected values appeared also in list 1, 2, 3 and 4. Cheers,

Tato14 avatar Dec 13 '17 12:12 Tato14

I tried using the great get_intersect_members() function by docmanny (his comment above), to display rows with intersections in a data frame.

Tried first with the movies DF

get_intersect_members(movies, "Drama") # Fx by docmanny = THIS WORKS!

and later with the Data Frame in this great UpsetR Tutorial by James Lloyd: https://www.badgrammargoodsyntax.com/compbio/2018/3/25/compbio-021-upsets-to-replace-complex-venn-diagrams

I did this:

FantasticBeasts_df <- read.table(file = "Fantastic_beasts.txt", header = T, sep = "\t"); upset(fromList(FantasticBeasts_df), order.by = "freq", nsets = 6);

But... the docmanny FX does not work with the FantasticBeasts_df...

get_intersect_members(FantasticBeasts_df, "'Goblin") data frame with 0 columns and 109 rows

get_intersect_members(FantasticBeasts_df, "Goblin","Unicorn") data frame with 0 columns and 109 rows

So, the docmanny Fx works ok for the movies DF, but NOT for the FantasticBeasts_df . :-( Why not? They are both in dataframe format... And both plot OK in the UpsetR pkg...

Am I missing something in the call to the get_intersect_members() Fx?

Any suggestions welcome... SFd99 San Francisco

  • latest Rstudio & R
  • Ubuntu Linux 14.04 LTS

sfd99 avatar Apr 05 '18 17:04 sfd99

I also think this would be very useful.

achamess avatar May 05 '18 16:05 achamess

+1 I'd love it too

brgenzim avatar May 16 '18 12:05 brgenzim

@sfd99 Hi! Sorry I just saw the @ mention, but I do have an answer for you if it still helps. Note that the movies dataset already has a binary membership table inside of it, which is what get_intersect_members() uses to pull out members. As you noted, this doesn't work:

get_intersect_members(FantasticBeasts_df, "'Goblin")
data frame with 0 columns and 109 rows

However, this does:

# You can use either my variant of fromList or the original UpSetR fromList
get_intersect_members(fromList(FantasticBeasts_df), "Unicorn")
# Rownames are from my fromList variant
       Dragon Unicorn House_Elf Goblin Blast.Ended_Skrewt Manticore
g00106      0       1         0      0                  0         0
g00107      0       1         0      0                  0         0
g00108      0       1         0      0                  0         0
g00109      0       1         0      0                  0         0
g00110      0       1         0      0                  0         0
g00111      0       1         0      0                  0         0
g00112      0       1         0      0                  0         0
g00113      0       1         0      0                  0         0
g00114      0       1         0      0                  0         0
g00115      0       1         0      0                  0         0
g00116      0       1         0      0                  0         0
g00117      0       1         0      0                  0         0
g00118      0       1         0      0                  0         0
g00119      0       1         0      0                  0         0
g00120      0       1         0      0                  0         0
g00121      0       1         0      0                  0         0
g00122      0       1         0      0                  0         0
g00123      0       1         0      0                  0         0
g00124      0       1         0      0                  0         0
g00125      0       1         0      0                  0         0
g00126      0       1         0      0                  0         0
g00208      0       1         0      0                  0         0
g00209      0       1         0      0                  0         0
g00210      0       1         0      0                  0         0
g00211      0       1         0      0                  0         0
g00212      0       1         0      0                  0         0
g00213      0       1         0      0                  0         0
g00214      0       1         0      0                  0         0
g00215      0       1         0      0                  0         0
g00216      0       1         0      0                  0         0

You can also see in the UpSetR graph that Goblin and Unicorn only appear together in larger sets, so there are no members that are exclusive to that intersection. As a result, get_intersection_members would not return anything for get_intersection_members((fromList(FantasticBeasts_df), "Goblin", "Unicorn") because it only returns exclusive group members.

Hope that helps!

docmanny avatar Aug 01 '18 22:08 docmanny

Maybe I overlooked it, but having played around with the above solutions, I was still missing a all-in-one function producing a list with all occurring group combination. Therefore I've put something together which might be useful to others (it certainly is for me). I've put the code also in a Gist for further improvments etc.

Note that is best works with a named matrix created by the modified fromList function above which needs to be loaded first.

The below function takes is a bit bulky due to documentation which attempts to show intermediate results for understand whats going on:

overlapGroups <- function (listInput, sort = TRUE) {
  # listInput could look like this:
  # $one
  # [1] "a" "b" "c" "e" "g" "h" "k" "l" "m"
  # $two
  # [1] "a" "b" "d" "e" "j"
  # $three
  # [1] "a" "e" "f" "g" "h" "i" "j" "l" "m"
  listInputmat    <- fromList(listInput) == 1
  #     one   two three
  # a  TRUE  TRUE  TRUE
  # b  TRUE  TRUE FALSE
  #...
  # condensing matrix to unique combinations elements
  listInputunique <- unique(listInputmat)
  grouplist <- list()
  # going through all unique combinations and collect elements for each in a list
  for (i in 1:nrow(listInputunique)) {
    currentRow <- listInputunique[i,]
    myelements <- which(apply(listInputmat,1,function(x) all(x == currentRow)))
    attr(myelements, "groups") <- currentRow
    grouplist[[paste(colnames(listInputunique)[currentRow], collapse = ":")]] <- myelements
    myelements
    # attr(,"groups")
    #   one   two three 
    # FALSE FALSE  TRUE 
    #  f  i 
    # 12 13 
  }
  if (sort) {
    grouplist <- grouplist[order(sapply(grouplist, function(x) length(x)), decreasing = TRUE)]
  }
  attr(grouplist, "elements") <- unique(unlist(listInput))
  return(grouplist)
  # save element list to facilitate access using an index in case rownames are not named
}

How to use it (use case):

library(UpSetR)
# example of list input (list of named vectors)
listInput <- list(one = letters[ c(1, 2, 3, 5, 7, 8, 11, 12, 13) ], 
                  two = letters[ c(1, 2, 4, 5, 10) ], 
                  three = letters[ c(1, 5, 6, 7, 8, 9, 10, 12, 13) ])

### that's pretty much all that's needed..
li <- overlapGroups(listInput)
###

# list of all elements:
 attr(li, "elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

# which elements are in the biggest group?
 li[1]
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 

 names(li[[1]])
# [1] "g" "h" "l" "m"
 attr(li, "elements")[li[[1]]]
# [1] "g" "h" "l" "m"

# full list
li
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 
# 
# $`one:two:three`
# a e 
# 1 4 
# attr(,"groups")
#   one   two three 
#  TRUE  TRUE  TRUE 
# 
##### cut out a bit #####
# $`two:three`
#  j 
# 11 
# attr(,"groups")
#   one   two three 
# FALSE  TRUE  TRUE 
# 
# attr(,"elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

seb-mueller avatar Aug 23 '18 16:08 seb-mueller

Hey everyone, I was also struggling with this, so I also came up with a solution. I'm sure it could be more elegant, but I think it does the trick (improvements welcome, of course!). In my case, docmanny's solution did not work.

I work with a long list of genes (>800) across several disease types (mutated vs no mutated). The function I provide works with a dataframe as an input, each row a gene and each column a disease (i.e. set). It is not thought for working with more columns than these, so please exclude those that contain extra annotations (for example in the movies.csv, columns like AvgRating should be excluded).

The user can provide sets of interest to be included or excluded (or both) from the intersection of interest.

library(tidyverse)

upset.intersection = function(dataframe, sets_in = NULL, sets_out = NULL){
  
  #This step is used in order to operate with column names. The returning dataframe will have the same colnames as the input.
  data <- dataframe
  names(data)[1] <- "ID"
  
  #Provide names of sets of interest to exclude
  if(!is.null(sets_in) & is.null(sets_out)){
    data <- dplyr::select(data, ID, sets_in)
    data$colsum <- rowSums(data[,2:length(data)])
    intersect_members <- data %>% filter(colsum == length(data)-2)
    names(intersect_members)[1] <- names(dataframe)[1]
    return(intersect_members)
  
  #Provide names of sets to exclude (its equivalent as prividing sets_in without a given set or number of sets)
  }else if(!is.null(sets_out) & is.null(sets_in)){
    data <- dplyr::select(data, -sets_out)
    data$colsum <- rowSums(data[,2:length(data)])
    intersect_members <- data %>% filter(colsum == length(data)-2)
    names(intersect_members)[1] <- names(dataframe)[1]
    return(intersect_members)
  
  #This is to explore those intersections that involve items present in some intersections but are explicitly absent in others
  }else if(!is.null(sets_in) & !is.null(sets_out)){
    data_in <- dplyr::select(data, ID, sets_in)
    data_in$colsum <- rowSums(data_in[,2:length(data_in)])
    intersect_in <- data_in %>% filter(colsum == length(data_in)-2)
    
    data_out <- dplyr::select(data, ID, sets_out)
    data_out$colsum <- rowSums(data_out[,2:length(data_out)])
    intersect_out <- data_out %>% filter(colsum == 0)
    
    intersection <- intersect(intersect_in$ID, intersect_out$ID)
    
    intersect_members <- data %>% filter(ID %in% intersection)
    
    names(intersect_members)[1] <- names(dataframe)[1]
    
    return(intersect_members)
    
  }else{
    data$colsum <- rowSums(data[,2:length(data)])
    intersect_members <- data %>% filter(colsum == length(data)-2)
    names(intersect_members)[1] <- names(dataframe)[1]
    return(intersect_members)
  }
}

Hope it helps,

David

kasadevall avatar Sep 17 '18 16:09 kasadevall

This would be a valuable addition.

I have been using "Vennerable" package to get interactions for any number of groups.

library(Vennerable)
temp=Venn(list("a"=a,"b"=b,"c"=c,"d"=d,"e"=e,"f"=f,"g"=g))  #provide all your groups as list
temp@IntersectionSets$`1111111`  #all overlapping values

"temp" will contain all interaction values. You can retrieve any interaction sets by specifying corresponding overlap.

Hope this is helpful.

ST

sudhirthakurela avatar Oct 24 '18 15:10 sudhirthakurela

@docmanny , your's was a great suggestion! I tried to use your code and on the dataset movies, and it workes wonderfully.

@docmanny On my dataset doesn't work instead, even though I have similar data formatting. I get the following error, which I cannot completely understand.

#These are the data
d <- data.frame(A = sample(c(0,1), size = 20, replace = T), C = sample(c(1,0), size = 20, replace = T), B = sample(c(1,0), size = 20, replace = T))

'data.frame':	20 obs. of  3 variables:
 $ A: num  1 1 0 0 1 1 0 0 1 0 ...
 $ C: num  1 0 0 0 1 1 1 1 0 1 ...
 $ B: num  1 0 1 1 1 1 1 1 1 1 ...

#Then I call get_intersect_members and I get the following error

get_intersect_members(d, "A")
Error: Can't convert a NULL to a quosure
Call `rlang::last_error()` to see a backtrace
Called from: abort(sprintf("Can't convert a %s to a quosure", typeof(lazy)))
Browse[1]> 

@docmanny do you have any idea, where the problem might be? Thanks a lot!

ghost avatar Feb 21 '19 16:02 ghost

@efr3m It was actually a really subtle error born from an assumption on my part. I assumed people would give dataframes with integer values, but 1 and 0 can also be numeric. Because in get_intersect_members I made a check for integer values - and not numeric values - your search query failed with a really unhelpful error.

Use this variant of the function instead:

get_intersect_members <- function (x, ...){
    require(dplyr)
    require(tibble)
    # the following makes sure that we don't have any weird values in the dataframe
    x <- x[,sapply(x, is.numeric)][,0<=colMeans(x[,sapply(x, is.numeric)],na.rm=T) & colMeans(x[,sapply(x, is.numeric)],na.rm=T)<=1]
    n <- names(x)
    #convert rownames to a column to prevent mulching by tidyr
    x %>% rownames_to_column() -> x
    l <- c(...)
    a <- intersect(names(x), l)
    ar <- vector('list',length(n)+1)
    ar[[1]] <- x
    i=2
    for (item in n) {
        if (item %in% a){
            if (class(x[[item]])=='numeric'){   #Now uses numeric instead of integer
                ar[[i]] <- paste(item, '>= 1')
                i <- i + 1
            }
        } else {
            if (class(x[[item]])=='numeric'){
                ar[[i]] <- paste(item, '== 0')
                i <- i + 1
            }
        }
    }
    do.call(filter_, ar) %>% column_to_rownames() -> x
    return(x)
}

You should get:

> get_intersect_members(d,"A")
   A C B
5  1 0 0
10 1 0 0
12 1 0 0
18 1 0 0

docmanny avatar Feb 21 '19 17:02 docmanny

This would be a valuable addition.

I have been using "Vennerable" package to get interactions for any number of groups.

library(Vennerable)
temp=Venn(list("a"=a,"b"=b,"c"=c,"d"=d,"e"=e,"f"=f,"g"=g))  #provide all your groups as list
temp@IntersectionSets$`1111111`  #all overlapping values

"temp" will contain all interaction values. You can retrieve any interaction sets by specifying corresponding overlap.

Hope this is helpful.

ST

Awesome! Very helpful

smoenga55 avatar Mar 01 '20 07:03 smoenga55

Hi! Also struggling with this one. I found some useful hints here! I ended up doing a Tidyverse approach using the filter_at function.

After calling the UpSetR::fromList() I obtain a data frame with 0 and 1 values, similarly to the d data frame generated above.

> set.seed(37)
> d <- data.frame(A = sample(c(0,1), size = 20, replace = T), C = sample(c(1,0), size = 20, replace = T), B = sample(c(1,0), size = 20, replace = T))
> head(d)
  A C B
1 1 1 0
2 0 0 1
3 1 0 0
4 1 1 1
5 0 1 0
6 1 0 0

To get entries overlapping in all:

> d %>% filter_at(vars(c("A", "B", "C")), ~.==1)
  A C B
1 1 1 1
2 1 1 1
3 1 1 1

To get entries only present in A:

> d %>% filter_at(vars(c("A")), ~.==1) %>% filter_at(vars(c("B", "C")), ~.==0)
  A C B
1 1 0 0
2 1 0 0
3 1 0 0

If we want to preserve the row id it could be included in a separate column.

> d %>% rownames_to_column("rowid") %>% filter_at(vars(c("A", "B", "C")), ~.==1)
  rowid A C B
1     4 1 1 1
2    11 1 1 1
3    16 1 1 1

Jakob37 avatar May 14 '20 09:05 Jakob37

@Jakob37 Great.

But recently, tidy syntax must have changed in dplyr 0.8.5 ?. (they now require "where()" ?...).

In your 1st example, I get:

d %>% filter_at(vars(c("A", "B", "C"), ~.==1) + )

Error: Formula shorthand must be wrapped in where().
Bad data %>% select(~. == 1)
Good data %>% select(where(~. == 1)) Run rlang::last_error() to see where the error occurred.

sfd99 avatar May 14 '20 13:05 sfd99

@sfd99 Thank you for your comment! I think I made a typo. It should be an additional end parenthesis i.e. '"C"))' instead of '"C")' .

This one runs for me (also using dplyr 0.8.5):

d %>% filter_at(vars(c("A", "B", "C")), ~.==1)

I corrected it in the example above. Thanks!

Edit: Regarding the where() warning, that is new to me. I do not get that warning after loading dplyr 0.8.5, but it might be that I have missed something there.

Jakob37 avatar May 14 '20 13:05 Jakob37

Works 100% now. Thanks / Tak!

sfd99 avatar May 14 '20 13:05 sfd99

@Jakob37

last quick question, (not an Issue, really)

d %>% rownames_to_column("rowid") %>% filter_at(vars(c("B": "C")), ~.==1) rowid A C B 1 2 1 1 1 2 3 1 1 1 3 16 0 1 1 4 18 1 1 1

good! Now, using your syntax, how to exclude rows where cols: "B" and "C" are exclusively value=1, and any other cols in d (ie: "A") are =zero?.

A result like this: rowid A C B 1 16 0 1 1 # only d row 16 is shown, because it's the row where only "B" and "C" are ==1... all OTHER cols are zero.

sfd99 avatar May 14 '20 14:05 sfd99

Hi! If I understand you correctly you want rows where B and C exclusively are 1. For this I did a separate filtering step.

d %>% rownames_to_column("rowid") %>% filter_at(vars(c("A", "B")), ~.==1) %>% filter_at(vars(c("C")), ~.==0)

Or more general (for any non-A/B column, even if there are more).

target_cols <- c("A", "B")
other_cols <- colnames(d)[!(colnames(d) %in% target_cols)]
d %>% rownames_to_column("rowid") %>% filter_at(vars(all_of(target_cols)), ~.==1) %>% filter_at(vars(all_of(other_cols)), ~.==0)

Hope this helps!

Jakob37 avatar May 14 '20 14:05 Jakob37

Very clear, works perfectly. Thanks Jakob & stay safe!.

sfd99 avatar May 14 '20 16:05 sfd99

Maybe I overlooked it, but having played around with the above solutions, I was still missing a all-in-one function producing a list with all occurring group combination. Therefore I've put something together which might be useful to others (it certainly is for me). I've put the code also in a Gist for further improvments etc.

Note that is best works with a named matrix created by the modified fromList function above which needs to be loaded first.

The below function takes is a bit bulky due to documentation which attempts to show intermediate results for understand whats going on:

overlapGroups <- function (listInput, sort = TRUE) {
  # listInput could look like this:
  # $one
  # [1] "a" "b" "c" "e" "g" "h" "k" "l" "m"
  # $two
  # [1] "a" "b" "d" "e" "j"
  # $three
  # [1] "a" "e" "f" "g" "h" "i" "j" "l" "m"
  listInputmat    <- fromList(listInput) == 1
  #     one   two three
  # a  TRUE  TRUE  TRUE
  # b  TRUE  TRUE FALSE
  #...
  # condensing matrix to unique combinations elements
  listInputunique <- unique(listInputmat)
  grouplist <- list()
  # going through all unique combinations and collect elements for each in a list
  for (i in 1:nrow(listInputunique)) {
    currentRow <- listInputunique[i,]
    myelements <- which(apply(listInputmat,1,function(x) all(x == currentRow)))
    attr(myelements, "groups") <- currentRow
    grouplist[[paste(colnames(listInputunique)[currentRow], collapse = ":")]] <- myelements
    myelements
    # attr(,"groups")
    #   one   two three 
    # FALSE FALSE  TRUE 
    #  f  i 
    # 12 13 
  }
  if (sort) {
    grouplist <- grouplist[order(sapply(grouplist, function(x) length(x)), decreasing = TRUE)]
  }
  attr(grouplist, "elements") <- unique(unlist(listInput))
  return(grouplist)
  # save element list to facilitate access using an index in case rownames are not named
}

How to use it (use case):

library(UpSetR)
# example of list input (list of named vectors)
listInput <- list(one = letters[ c(1, 2, 3, 5, 7, 8, 11, 12, 13) ], 
                  two = letters[ c(1, 2, 4, 5, 10) ], 
                  three = letters[ c(1, 5, 6, 7, 8, 9, 10, 12, 13) ])

### that's pretty much all that's needed..
li <- overlapGroups(listInput)
###

# list of all elements:
 attr(li, "elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

# which elements are in the biggest group?
 li[1]
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 

 names(li[[1]])
# [1] "g" "h" "l" "m"
 attr(li, "elements")[li[[1]]]
# [1] "g" "h" "l" "m"

# full list
li
# $`one:three`
# g h l m 
# 5 6 8 9 
# attr(,"groups")
#   one   two three 
#  TRUE FALSE  TRUE 
# 
# $`one:two:three`
# a e 
# 1 4 
# attr(,"groups")
#   one   two three 
#  TRUE  TRUE  TRUE 
# 
##### cut out a bit #####
# $`two:three`
#  j 
# 11 
# attr(,"groups")
#   one   two three 
# FALSE  TRUE  TRUE 
# 
# attr(,"elements")
#  [1] "a" "b" "c" "e" "g" "h" "k" "l" "m" "d" "j" "f" "i"

Further simplify output of overlapGroups

li2 <- purrr::map(li, ~ attr(li, "elements")[.x] )

cparsania avatar Nov 01 '21 00:11 cparsania

Javier Herrero nailed what was asked for brilliantly, check here: https://stackoverflow.com/questions/65027133/extract-intersection-list-from-upset-object

Cheers, Nilay

nilaycan avatar Aug 02 '22 10:08 nilaycan