
Easy/programmatic way to color intersection matrix based on degree

Open januz opened this issue 2 years ago • 23 comments

Is your feature request related to a problem? Please describe.

I would like to visualize a large number of intersections. I sorted them using sort_intersections_by=c('degree', 'cardinality'). Now I would like to make the intersection indicators and bars visually distinct between degrees, so that intersections of different degree are easy to tell apart and (hopefully) the large graph becomes a bit easier to interpret.

I was able to color the bars based on the degree using the following code:

library(tidyverse)
library(ComplexUpset)

movies = as.data.frame(ggplot2movies::movies)
movies[movies$mpaa == '', 'mpaa'] = NA
movies = na.omit(movies)
genres = colnames(movies)[18:24]

movies %>% 
  mutate(
    degree = rowSums(
      across(Action:Short)
    ) %% 2 %>% 
      as_factor()
  )%>% 
  upset(
    genres,
    sort_intersections_by=c('degree', 'cardinality'),
    base_annotations=list(
      'Intersection size'=intersection_size(
        mapping=aes(fill=degree)
      ) +
        scale_fill_manual(values=c("grey22", "grey44")) + 
        theme(legend.position = "none")
    )
  )
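For anyone adapting this to other data: the degree column above counts how many sets each row belongs to, and %% 2 turns that count into an odd/even flag so that the two greys alternate between neighbouring degrees. A minimal base-R sketch of the same idea, on a made-up toy table (the columns and values are assumptions for illustration, not taken from the movies data):

```r
# Toy membership table: one row per item, one 0/1 column per set (hypothetical data)
toy = data.frame(
    Action=c(1, 0, 1, 0),
    Comedy=c(1, 1, 0, 0),
    Drama=c(1, 0, 0, 0)
)

# Degree = number of sets the row belongs to
degree = rowSums(toy[, c('Action', 'Comedy', 'Drama')])

# Parity of the degree as a two-level factor, suitable for a two-colour fill scale
parity = factor(degree %% 2)

print(data.frame(degree=degree, parity=parity))
```

Mapping fill to the parity rather than to the degree itself keeps the manual scale at two colours regardless of how many degrees are present.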

I don't know how to do so (programmatically) for the intersection matrix. I know that I could manually specify intersections in queries but that would be quite cumbersome.

Describe the solution you'd like

It would be great if there was a programmatic way to color the selections in the intersection matrix.

Describe alternatives you've considered

I only see the alternative to color based on manual queries.

Context (required)

ComplexUpset version: 1.3.0

R version details
$platform
[1] "x86_64-apple-darwin17.0"

$arch
[1] "x86_64"

$os
[1] "darwin17.0"

$system
[1] "x86_64, darwin17.0"

$status
[1] ""

$major
[1] "4"

$minor
[1] "0.4"

$year
[1] "2021"

$month
[1] "02"

$day
[1] "15"

$`svn rev`
[1] "80002"

$language
[1] "R"

$version.string
[1] "R version 4.0.4 (2021-02-15)"

$nickname
[1] "Lost Library Book"

januz avatar Aug 07 '21 00:08 januz

Hi @januz sorry I missed your question earlier - busy times.

I know that I could manually specify intersections in queries but that would be quite cumbersome.

I agree, we could make it easier and better. I guess we could auto-generate a ..degree.. column (or something similar) so that it is available for use in ggplot mappings. It is not difficult to implement in itself, but I will need to check a few approaches to find the most performant one.

In the meantime there is a helper function for that in https://github.com/krassowski/complex-upset/issues/60#issuecomment-695975581. I will copy it here in case anyone needs it quickly, but it does not replace a more ggplot-friendly solution.

query_by_degree = function(data, groups, params_by_degree, ...) {
    # Names of the intersections that will be plotted (set names joined by '-')
    intersections = unique(upset_data(data, groups)$plot_intersections_subset)
    lapply(
        intersections,
        FUN=function(x) {
            # Member sets of this intersection; their count is its degree
            members = strsplit(x, '-', fixed=TRUE)[[1]]
            # Look the params up by name, not by position in the list
            degree = as.character(length(members))
            if (!(degree %in% names(params_by_degree))) {
                stop(
                    paste('Missing specification of params for degree', degree)
                )
            }
            args = c(
                list(intersect=members, ...),
                params_by_degree[[degree]]
            )
            do.call(upset_query, args)
        }
    )
}

You can use it like this (after library(ComplexUpset)):

movies = as.data.frame(ggplot2movies::movies)
movies[movies$mpaa == '', 'mpaa'] = NA
movies = na.omit(movies)
genres = colnames(movies)[18:24]

upset(
    movies,
    genres,
    width_ratio=0.1,
    sort_intersections_by=c('degree', 'cardinality'),
    queries=query_by_degree(
        movies, genres,
        params_by_degree=list(
            '1'=list(color='red', fill='red'),
            '2'=list(color='purple', fill='purple'),
            '3'=list(color='blue', fill='blue'),
            '4'=list(color='green', fill='green')
        ),
        only_components=c("intersections_matrix", "Intersection size")
    )
)

This one creates:

image

krassowski avatar Aug 08 '21 21:08 krassowski

This is great, thank you so much!!

Is there a way to have the bar for no intersections be colored differently than the ones with degree 1? I tried

    params_by_degree=list(
      '0'=list(color='black', fill='black'),
      '1'=list(color='red', fill='red'),
      '2'=list(color='purple', fill='purple'),
      '3'=list(color='blue', fill='blue'),
      '4'=list(color='green', fill='green')
    )

but that didn't work.

Thanks so much for your help and the great package!

januz avatar Aug 08 '21 21:08 januz

Right, sorry I copied the old code without thinking about this case. Here you go:

query_by_degree = function(data, groups, params_by_degree, ...) {
    intersections = unique(upset_data(data, groups)$plot_intersections_subset)
    lapply(
        intersections,
        FUN=function(x) {
            members = ComplexUpset:::get_intersection_members(x)[[1]]
            degree = as.character(ComplexUpset:::calculate_degree(x))
            if (!(degree %in% names(params_by_degree))) {
                stop(
                    paste('Missing specification of params for degree', degree)
                )
            }
            args = c(
                list(intersect=members, ...),
                params_by_degree[[degree]]
            )
            do.call(upset_query, args)
        }
    )
}

upset(
    movies,
    genres,
    width_ratio=0.1,
    sort_intersections_by=c('degree', 'cardinality'),
    queries=query_by_degree(
        movies, genres,
        params_by_degree=list(
            '0'=list(color='orange', fill='orange'),
            '1'=list(color='red', fill='red'),
            '2'=list(color='purple', fill='purple'),
            '3'=list(color='blue', fill='blue'),
            '4'=list(color='green', fill='green')
        ),
        only_components=c("intersections_matrix", "Intersection size")
    )
)

image

krassowski avatar Aug 08 '21 23:08 krassowski

Fantastic. Thank you so much!! Should I close this issue or do you want to keep it open if you want to develop a more general mechanism to be included in the package?

januz avatar Aug 09 '21 14:08 januz

Let's keep it open for now; this seems to be a common use case and would benefit from an easier way to do this.

krassowski avatar Aug 09 '21 14:08 krassowski

I have one more question: I expected that the following code would fill both the dots and the bars but outline them in black. Apparently this only holds for the bars; the dots all get filled in black:

upset(
  movies,
  genres,
  width_ratio=0.1,
  sort_intersections_by=c('degree', 'cardinality'),
  queries=query_by_degree(
    movies, genres,
    params_by_degree=list(
      '0'=list(color='black', fill='orange'),
      '1'=list(color='black', fill='red'),
      '2'=list(color='black', fill='purple'),
      '3'=list(color='black', fill='blue'),
      '4'=list(color='black', fill='green')
    ),
    only_components=c("intersections_matrix", "Intersection size")
  )
)

januz avatar Aug 09 '21 14:08 januz

This comes down to the shape of the dots. By default ggplot uses the solid 'circle' shape (19), which takes its inside from color and ignores fill. If you switch to 'circle filled' (21), color draws the outline and fill the inside, which gives you what you want:

upset(
    movies,
    genres,
    width_ratio=0.1,
    sort_intersections_by=c('degree', 'cardinality'),
    matrix=intersection_matrix(
        geom=geom_point(shape='circle filled', size=3)
    ),
    queries=query_by_degree(
        movies, genres,
        params_by_degree=list(
            '0'=list(color='black', fill='orange'),
            '1'=list(color='black', fill='red'),
            '2'=list(color='black', fill='purple'),
            '3'=list(color='black', fill='blue'),
            '4'=list(color='black', fill='green')
        ),
        only_components=c("intersections_matrix", "Intersection size")
    )
)

image

krassowski avatar Aug 09 '21 14:08 krassowski

Awesome, thanks again for the prompt help!!

januz avatar Aug 09 '21 15:08 januz

Sorry, I have another problem with this approach. I have a lot of variables/intersections, so I have to reduce the number of displayed intersections, e.g. by using min_size. If I do this, the above approach fails, complaining that queries are not unique:

upset(
  movies,
  genres,
  width_ratio=0.1,
  min_size = 15, # minimum intersection size
  sort_intersections_by=c('degree', 'cardinality'),
  matrix=intersection_matrix(
    geom=geom_point(shape='circle filled', size=3)
  ),
  queries=query_by_degree(
    movies, genres,
    params_by_degree=list(
      '0'=list(color='black', fill='orange'),
      '1'=list(color='black', fill='red'),
      '2'=list(color='black', fill='purple'),
      '3'=list(color='black', fill='blue'),
      '4'=list(color='black', fill='green')
    ),
    only_components=c("intersections_matrix", "Intersection size")
  )
)

januz avatar Aug 09 '21 16:08 januz

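This is because the upset_data() call inside query_by_degree() does not receive the min_size argument, so it works with a different set of intersections than the plot does. A quick workaround is to forward ... to upset_data() and to pass the options common to all queries through a separate shared argument: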

query_by_degree = function(data, groups, params_by_degree, shared, ...) {
    intersections = unique(upset_data(data, groups, ...)$plot_intersections_subset)
    lapply(
        intersections,
        FUN=function(x) {
            members = ComplexUpset:::get_intersection_members(x)[[1]]
            degree = as.character(ComplexUpset:::calculate_degree(x))
            if (!(degree %in% names(params_by_degree))) {
                stop(
                    paste('Missing specification of params for degree', degree)
                )
            }
            args = c(
                list(intersect=members),
                shared,
                params_by_degree[[degree]]
            )
            do.call(upset_query, args)
        }
    )
}

upset(
  movies,
  genres,
  width_ratio=0.1,
  min_size = 15, # minimum intersection size
  sort_intersections_by=c('degree', 'cardinality'),
  matrix=intersection_matrix(
    geom=geom_point(shape='circle filled', size=3)
  ),
  queries=query_by_degree(
    movies,
    genres,
    min_size = 15,
    params_by_degree=list(
      '0'=list(fill='orange'),
      '1'=list(fill='red'),
      '2'=list(fill='purple'),
      '3'=list(fill='blue'),
      '4'=list(fill='green')
    ),
    shared=list(
        only_components=c("intersections_matrix", "Intersection size"),
        color='black'
    )
  )
)

image

krassowski avatar Aug 10 '21 16:08 krassowski

That makes sense! Thanks so much for your help.

januz avatar Aug 10 '21 16:08 januz

@krassowski I am using the exact code above for my own data and it works well. However, there does not seem to be a way to sort by degree groups ascending AND sort within degree groups descending. In other words, is there a way to go from left-to-right with bars grouped by degree, e.g. 0, 1, 2, 3, 4. And then, within those groups, from left-to-right, sort the bars descending.

Something like this: Untitled
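To make the requested order concrete, here is a base-R sketch on made-up intersection sizes (the table and its values are assumptions for illustration; this is not a ComplexUpset option). Groups go by ascending degree, and bars within a group by descending size:

```r
# Hypothetical intersections with their degree and size
intersections = data.frame(
    name=c('A', 'B', 'A-B', 'A-C', 'A-B-C', 'none'),
    degree=c(1, 1, 2, 2, 3, 0),
    size=c(40, 75, 12, 30, 5, 20),
    stringsAsFactors=FALSE
)

# Ascending by degree, then descending by size within each degree
desired = intersections[order(intersections$degree, -intersections$size), ]

print(desired$name)
```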

aschmidt-amplify avatar Jun 29 '22 13:06 aschmidt-amplify

@aschmidt-amplify In other words you would like the parameter sort_intersections to accept a vector of arguments if sort_intersections_by is a vector, do I understand it correctly?

krassowski avatar Jul 01 '22 21:07 krassowski

I tried out the code you gave and I keep getting this error when I plot my graph:

Error in upset_data(data, groups) : could not find function "upset_data"

Could you help me work around this? This is the code I used:

query_by_degree = function(data, groups, params_by_degree, ...) {
    intersections = unique(upset_data(data, groups)$plot_intersections_subset)
    lapply(
        intersections,
        FUN=function(x) {
            members = strsplit(x, '-', fixed=TRUE)[[1]]
            if (!(length(members) %in% names(params_by_degree))) {
                stop(
                    paste('Missing specification of params for degree', length(members))
                )
            }
            args = c(
                list(intersect=members, ...),
                params_by_degree[[length(members)]]
            )
            do.call(upset_query, args)
        }
    )
}

upset(genes, diseases, nset = 9, nintersects = NA, order.by = "degree", decreasing = T, 
queries=query_by_degree(genes, diseases,
        params_by_degree=list(
            '1'=list(color='red', fill='red'),
            '2'=list(color='purple', fill='purple'),
            '3'=list(color='blue', fill='blue'),
            '4'=list(color='green', fill='green'))))

mholtz2 avatar Jul 12 '22 02:07 mholtz2

@mholtz2 Did you load the package with library(ComplexUpset)? If you don't want to attach it, you can call the functions with the package prefix instead, i.e. ComplexUpset::upset_data (and likewise ComplexUpset::upset_query).

krassowski avatar Jul 12 '22 03:07 krassowski

Oops, I didn't! Thanks! :)

mholtz2 avatar Jul 12 '22 03:07 mholtz2

How can I make it compatible with other queries?

MoLuLuMo avatar Sep 30 '22 19:09 MoLuLuMo

If I recall correctly, you can concatenate queries with the c() function. Do you have an example?
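A small base-R sketch of why c() rather than list() is the natural way to combine them. Plain lists stand in for the upset_query() results here (an assumption, made so that the snippet runs without ComplexUpset). query_by_degree() already returns a list of queries, so wrapping it in list() would nest it one level too deep:

```r
# Stand-ins for queries; with ComplexUpset these would be upset_query(...) results
degree_queries = list(
    list(intersect='Action', fill='red'),
    list(intersect=c('Action', 'Comedy'), fill='purple')
)
extra_query = list(set='Drama', fill='blue')

# list() keeps the degree queries together as a single nested element
nested = list(degree_queries, extra_query)

# c() appends the extra query to the same flat list
flat = c(degree_queries, list(extra_query))

print(length(nested))
print(length(flat))
```

With the real helpers this would read queries=c(query_by_degree(...), list(upset_query(...))). Whether overlapping queries then combine as intended is not something this sketch checks.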

krassowski avatar Sep 30 '22 20:09 krassowski

I would like to generate an UpSet plot with edges colored by degree and nodes colored by set.

I tried to combine query_by_degree and other upset_query calls in a list, but it doesn't work.

upset(
    movies,
    colnames(movies)[3:5],
    name='genre',
    width_ratio=0.1,
    matrix=(
        intersection_matrix(geom=geom_point(shape='circle filled', size=3))
        + scale_color_manual(
            values=c('Action'='red', 'Adventure'='blue', 'Children'='yellow'),
            guide=guide_legend(override.aes=list(shape='circle'))
        )
    ),
    queries=list(
        upset_query(set='Action', fill='red'),
        upset_query(set='Adventure', fill='blue'),
        upset_query(set='Children', fill='yellow')
    )
)

MoLuLuMo avatar Sep 30 '22 20:09 MoLuLuMo

I would like to ask an additional question about the code for colouring intersections based on degree.

Instead of stipulating the exact colours to be used, e.g.

query_by_degree(
      mm1, disease_names,
      mode=mode,
      matrix=intersection_matrix(
        geom=geom_point(shape='circle filled', size=10)),
      params_by_degree=list(
        '0'=list(color='steelblue1', fill='steelblue1'),
        '1'=list(color='firebrick1', fill='firebrick1'),
        '2'=list(color='violetred3', fill='violetred3'),
        '3'=list(color='royalblue2', fill='royalblue2'),
        '4'=list(color='firebrick', fill='firebrick'),
        '5'=list(color='darkmagenta', fill='darkmagenta')),
      shared=list(
        only_components=c("intersections_matrix", "Intersection size", "Set size"),
        color='turquoise'))

Is there instead a way to use an existing palette, e.g. scale_fill_brewer()?

I have tried the code below but it hasn't worked. I'd be really grateful for thoughts on why this is.

upset(
  mm1, disease_names, width_ratio=0.2, height_ratio = 0.7,
  queries= c(
    query_by_degree(
      mm1, disease_names,
      mode=mode,
      matrix=intersection_matrix(
        geom=geom_point(shape='circle filled', size=10)),
      params_by_degree=list(scale_fill_brewer(palette = 'RdYlGn'))))
)

spencsa avatar Nov 27 '23 14:11 spencsa

This is because scale_fill_brewer() returns a ggplot scale object, not data in the format that params_by_degree expects. You can retrieve the colours from a given palette, but you need a different function for that: specify how many colours to retrieve, then reshape the result into the named list of lists that the params_by_degree argument supports.
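A minimal sketch of that reshaping step, under some assumptions: params_from_colours is a hypothetical helper (not part of ComplexUpset), degrees 0 to 5 are assumed to be present, and the colours come from base R's grDevices::hcl.colors(), whose 'RdYlGn' palette is an HCL-based approximation rather than the exact ColorBrewer colours:

```r
# Hypothetical helper (not part of ComplexUpset): one colour per degree
params_from_colours = function(degrees, colours) {
    stopifnot(length(degrees) == length(colours))
    params = lapply(colours, function(colour) list(color=colour, fill=colour))
    # params_by_degree is looked up by the degree as a character name
    names(params) = as.character(degrees)
    params
}

# Assumed range of degrees; adjust to the degrees present in your data
degrees = 0:5

# Base-R palette (R >= 3.6); approximates ColorBrewer's 'RdYlGn'
colours = grDevices::hcl.colors(length(degrees), palette='RdYlGn')

params_by_degree = params_from_colours(degrees, colours)
str(params_by_degree[['0']])
```

The result can be passed as params_by_degree=params_by_degree. If the RColorBrewer package is installed, RColorBrewer::brewer.pal(length(degrees), 'RdYlGn') should give the exact ColorBrewer colours instead.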

krassowski avatar Nov 27 '23 14:11 krassowski

Ok - thank you!

spencsa avatar Nov 27 '23 15:11 spencsa

Is there a way to change the colours of the intersection matrix (circles), but not the bars for the intersection size?

spencsa avatar Nov 27 '23 16:11 spencsa