Memory error when trying to manually sort intersections but not when intersection order is specified with sort_intersections_by

Open lrichter53 opened this issue 2 years ago • 11 comments

I need the intersections sorted in a particular way. This worked with fewer intersections, but when I added more I received a memory error. However, when the intersection order isn't specified, or is specified with sort_intersections_by, the code works.

intersections = list('RaCa2', 'P14K','t1_18','C24S','I4K','S15K', 'A19K', c('C18S','t1_18'),c('C18S','C24S'),c('I4K','A19K'), c('S15K','A19K','T21K'),c('I4K','S15K','t1_18'),c('I4K','A19K','T21K'), c('P3R','I4K','A19K'),c('P3R','S15K','t1_18'), c('P3R','I4K','S15K','t1_18'), c('P3R','I4K','S15K','C18S','t1_18'), c('A19K','t15_23'), c('A19Dab','K22Dab','K23Dab','t15_23','NH2','_CONH2'), c('A19Dap','t15_23','NH2','_CONH2'), c('A19Dap','K22Dap','K23Dap','t15_23','NH2','_CONH2'), c('A19K','d_23','t15_23','NH2','_CONH2'), c('A19K','d_22','t15_23','NH2','_CONH2'), c('A19K','d_19','t15_23','NH2','_CONH2'), c('A19K','d_19','d_22','d_23','t15_23','NH2','_CONH2'), c('A19K','t15_23','NH2','_CONH2'), c('A19K','K22Dap','t15_23','NH3','_CONH2'), c('A19K','K23Dap','t15_23','NH3','_CONH2'), c('A19Dab','t15_23','NH3','_CONH2'), c('A19K','K22Dab','t15_23','NH3','_CONH2'), c('A19K','K23Dab','t15_23','NH3','_CONH2'), c('C18S','A19K','d_19','d_22','d_23','t15_23','NH3','_CONH2'), c('C18S','A19Dap','K22Dap','K23Dap','t15_23','NH3','_CONH2'), c('C18S','A19Dab','K22Dab','K23Dab','t15_23','NH3','_CONH2') )

This produces the error:

Error: cannot allocate vector of size 524287.9 Gb

lrichter53 avatar Apr 25 '22 16:04 lrichter53

Looking at the codebase, yes, this is unfortunately the case: when using user-provided intersections we compute all intersections and then subset, rather than computing the intersections of interest directly:

https://github.com/krassowski/complex-upset/blob/e3b51dce5022972d782b57f0086ac6b7e025c08b/R/data.R#L514-L515

Ideally we would instead create an equivalent to observed_intersections_matrix and just multiply it as in:

https://github.com/krassowski/complex-upset/blob/e3b51dce5022972d782b57f0086ac6b7e025c08b/R/data.R#L534-L537

But if I recall correctly this was not trivial for some reason...
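A rough sketch of what computing only the requested intersections could look like, written in plain R and not taken from the ComplexUpset source. The helper name `assign_requested` and the toy data are hypothetical, and the sketch assumes exclusive membership (a row belongs to an intersection only if it is in exactly those sets and no others):

```r
# Hypothetical helper, not part of ComplexUpset: label each row with the
# requested intersection it matches exactly, without enumerating every
# possible combination of sets.
assign_requested <- function(data, sets, requested) {
    membership <- as.matrix(data[, sets, drop=FALSE])
    # One key per row, e.g. "110" for a row that is in the first two sets only
    row_keys <- apply(
        membership, 1,
        function(r) paste0(as.integer(r), collapse='')
    )
    # One key per requested intersection, built using the same set order
    requested_keys <- vapply(
        requested,
        function(members) paste0(as.integer(sets %in% members), collapse=''),
        character(1)
    )
    # Index of the matching requested intersection, or NA if the row matches none
    match(row_keys, requested_keys)
}

toy <- data.frame(
    A=c(TRUE, TRUE, FALSE),
    B=c(FALSE, TRUE, TRUE),
    C=c(FALSE, FALSE, TRUE)
)
assigned <- assign_requested(toy, c('A', 'B', 'C'), list('A', c('A', 'B')))
```

With this approach the work grows with the number of rows and the number of requested intersections, not with the number of all possible combinations. Whether it fits the package's other membership modes is not covered here.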

Without changing the logic too much, we could try changing the third and fourth arguments in the all_intersections_matrix(intersect, NULL, 0, Inf) call to min(sapply(intersections, length)) and max(sapply(intersections, length)), but I am not sure it would work for all cases right now.
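To illustrate why bounding the degree could help, here is a small sketch of the arithmetic only. It does not call all_intersections_matrix; the toy `intersections` list and the value of `n_sets` are assumptions made up for the example:

```r
# Toy stand-in for the user-provided `intersections` argument
intersections <- list('A', c('A', 'B'), c('B', 'C', 'D'))

# Smallest and largest number of sets in any requested intersection
degrees <- sapply(intersections, length)
min_degree <- min(degrees)
max_degree <- max(degrees)

# Assumed number of set columns, for illustration only
n_sets <- 26

# Every subset of the sets (including the empty one)
all_combinations <- 2^n_sets
# Only subsets whose size lies between min_degree and max_degree
bounded_combinations <- sum(choose(n_sets, min_degree:max_degree))
```

The saving shrinks when the requested intersections span a wide range of degrees, which may be one of the cases where the bounded call does not help much.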

krassowski avatar Apr 25 '22 17:04 krassowski

Hi Michal,

I'm having the same issue as lrichter53 (Error: vector memory exhausted (limit reached?)) and I've tried applying the patch mentioned here: https://stackoverflow.com/questions/72820148/complexupset-how-can-i-plot-selected-intersections. But it seems to have made the issue worse, as I receive the memory error a lot quicker. Is there any way to fix the error, please? I'm using ComplexUpset 1.3.3 and have a list of 92 intersections I need to apply, so it would be great if there was a fix.

Just adding my commands below (it's pretty long):

gene_list = c("unknown", "AmpC1", "BBB_1D_v1", "AAA_1_gene", "AAA_2_genes", "AAA_3_genes", "AAA_4_genes", "BBB_KKK_1_gene", "BBB_KKK_2_genes", "BBB_KKK_3_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_2_genes", "UUU_GGG_1_gene", "UUU_GGG_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "KKK_GGG_2_genes", "KPC_KKK_MMM_1_gene", "NDM_KKK_MMM_1_gene", "NDM_KKK_MMM_2_genes", "OXA_KKK_MMM_1_gene", "VIM_KKK_MMM_1_gene", "VIM_KKK_MMM_2_genes", "UUU_49_KKK_InhR")


upset(df, gene_list, annotations = list(
  'Infection_status'=(
    ggplot(mapping=aes(fill=Infection_status))
    + ggtitle("alpha") + theme(plot.title = element_text(size = 60, face = "bold"))
    + geom_bar(stat='count', position='fill')
    + scale_y_continuous(labels=scales::percent_format())
    + scale_fill_manual(values=c('Dead'='#ebb860', 'Alive'='#57109e', 'Unknown'='#468a37')) + ylab('Infection_status')),
  'Phenotype'=(
    ggplot(mapping=aes(fill=Phenotype))
    + geom_bar(stat='count', position='fill')
    + scale_y_continuous(labels=scales::percent_format())
    + scale_fill_manual(values=c('grey'='#36e345','intermediate'='#eb5278','sleep'='#1a47db')) + ylab('Phenotype')),
  'Phenotype (mm)'=ggplot(mapping=aes(x=intersection, y=Disk_measurement)) + ggtitle("Phenotype vs cell (DDD) mutations") + geom_hline(yintercept=18, color="pink", size=1, linetype = 'dashed') + annotate("text",x=50, y =17, label = "ECO_O = 18mm",color = "pink",size = 12) + geom_violin(width=1.1, alpha=1.5) + ggbeeswarm::geom_quasirandom(aes(color=DDD_mutations, size = 1)) + guides(color = guide_legend(override.aes = list(size=7))),
  'Phenotypes (mm)'=ggplot(mapping=aes(x=intersection, y=Disk_measurement)) + ggtitle("Phenotype vs Phenotype") + geom_hline(yintercept=18, color="pink", size=1, linetype = 'dashed') + annotate("text",x=50, y =17, label = "ECO_O = 18mm",color = "pink",size = 12) + geom_violin(width=1.1, alpha=1.5) + ggbeeswarm::geom_quasirandom(aes(color=Phenotype, size = 1)) + guides(color = guide_legend(override.aes = list(size=7)))),
  sort_intersections=FALSE, intersections=list(c("AAA_1_gene"), c("AAA_2_genes"), c("AAA_3_genes"), c("AAA_4_genes"), c("UUU_49_KKK_InhR"), c("UUU_GGG_1_gene"), c("UUU_GGG_2_genes"), c("BBB_1D_v1", "AAA_1_gene"),
                                               c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), c("AAA_1_gene", "KKK_GGG_1_gene"), c("AAA_1_gene", "UUU_GGG_1_gene"),
                                               c("AAA_1_gene", "UUU_GGG_2_genes"), c("BBB_1D_v1", "AAA_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"),
                                               c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), c("AAA_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("AAA_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("AAA_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("AmpC1", "BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("AAA_1_gene", "UUU_49_KKK_InhR"), c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene"), c("AAA_2_genes", "UUU_GGG_1_gene"), c("AAA_2_genes", "KKK_GGG_1_gene"), c("AAA_2_genes", "NDM_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene"),
                                               c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes"), c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("AAA_2_genes", "BBB_KKK_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"),
                                               c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("AAA_2_genes", "BBB_KKK_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "UUU_49_KKK_InhR"), c("AAA_3_genes", "UUU_GGG_1_gene"), c("AAA_3_genes", "UUU_49_KKK_InhR"), c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_KKK_3_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene"),
                                               c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), c("UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_KKK_1_gene", "UUU_GGG_2_genes", "KPC_KKK_MMM_1_gene"), c("BBB_1D_v1", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), c("BBB_1D_v1", "UUU_GGG_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), c("BBB_1D_v1", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"),
                                               c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), c("UUU_GGG_1_gene", "UUU_49_KKK_InhR")),
  queries=list(
    upset_query(intersect=c("unknown"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_3_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_4_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_49_KKK_InhR"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_2_genes"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "KKK_GGG_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_2_genes"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "KKK_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "NDM_KKK_MMM_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "NDM_KKK_MMM_1_gene"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AmpC1", "BBB_1D_v1", "AAA_1_gene", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_1_gene", "UUU_49_KKK_InhR"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "UUU_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "KKK_GGG_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "NDM_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KPC_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene"), color='darkgreen', fill='darkgreen', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='darkgreen', fill='darkgreen', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene"), color='darkgreen', fill='darkgreen', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "KKK_GGG_1_gene"), color='darkorchid1', fill='darkorchid1', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene"), color='darkorchid1', fill='darkorchid1', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes"), color='darkorchid1', fill='darkorchid1', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='coral', fill='coral', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='coral', fill='coral', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_2_genes", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_GGG_1_gene", "KKK_GGG_2_genes", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "AAA_2_genes", "BBB_KKK_1_gene", "KKK_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='purple', fill='purple', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_2_genes", "BBB_KKK_2_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "UUU_49_KKK_InhR"), color='yellow', fill='yellow', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_3_genes", "UUU_GGG_1_gene"), color='blue', fill='blue', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("AAA_3_genes", "UUU_49_KKK_InhR"), color='green', fill='green', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_3_genes", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene"), color='KKKck', fill='KKKck', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "OXA_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_1_gene", "UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='orange', fill='orange', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_1_gene", "UUU_GGG_2_genes", "KPC_KKK_MMM_1_gene"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene", "OXA_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "UUU_GGG_2_genes", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_KKK_2_genes", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KPC_KKK_MMM_1_gene"), color='pink', fill='pink', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "UUU_GGG_1_gene", "VIM_KKK_MMM_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "BBB_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_2_genes", "NDM_KKK_MMM_1_gene"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("BBB_1D_v1", "BBB_KKK_1_gene", "BIL_CMY_LAP_SCO_KKK_1_gene", "UUU_GGG_1_gene", "KKK_GGG_1_gene", "NDM_KKK_MMM_2_genes"), color='red', fill='red', only_components=c("intersections_matrix", "Intersection size")),
    upset_query(intersect=c("UUU_GGG_1_gene", "UUU_49_KKK_InhR"), color='grey', fill='grey', only_components=c("intersections_matrix", "Intersection size")))) + patchwork::plot_layout(heights=c(0.1, 1.0, 0.3, 0.5)) + labs(title = "Co-occurence of alpha genes", caption = "Data: alpha")

a14578 avatar Nov 06 '22 18:11 a14578

Hi @krassowski, do you know if the memory error when manually sorting intersections will be fixed soon? It would be great to continue using ComplexUpset, as it's such a great package.

a14578 avatar Nov 14 '22 19:11 a14578

> But it seems to have made the issue worse as I receive the memory error a lot quicker.

I don't believe that patch could ever make things worse performance-wise. Instead, I suspect that the problem you see is not related to this issue but to the number of upset_query calls in your example.

Each upset_query creates a new layer, which may cause ggplot2 to run out of memory in the plotting phase. This would be consistent with your observation that the patch from the faster-specific branch makes the memory error appear sooner, but this is because the performance has improved (you reached the plotting phase sooner). If you just want to give each bar a different fill, you should use aesthetics (as in the 3.2 Fill the bars example), not queries.
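For readers without the documentation at hand, filling bars through an aesthetic looks roughly like the sketch below. It is written from memory of the documentation example on the movies data, so treat the details as an assumption and check them against the current docs. It requires the ggplot2, ComplexUpset and ggplot2movies packages:

```r
library(ggplot2)
library(ComplexUpset)

# Prepare the movies data: genre columns become logical set indicators
movies <- as.data.frame(ggplot2movies::movies)
genres <- colnames(movies)[18:24]
movies[genres] <- movies[genres] == 1
movies[movies$mpaa == '', 'mpaa'] <- NA
movies <- na.omit(movies)

# One fill mapping on the intersection size bars instead of one
# upset_query() layer per intersection
plot_with_fill <- upset(
    movies,
    genres,
    base_annotations=list(
        'Intersection size'=intersection_size(
            counts=FALSE,
            mapping=aes(fill=mpaa)
        )
    ),
    width_ratio=0.1
)
```

Here the fill comes from a column of the data (`mpaa`), so a single layer colours every bar. Colouring by intersection identity, as in the query list above, needs a different mapping that this sketch does not cover.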

Do you still run out of memory if you remove the queries? What is the minimum reproducible example of the problem you are facing (i.e. after removing every single line of code which does not make a difference for the problem at hand)?

> do you know if the memory error when manually sorting intersections will be fixed soon?

I don't have bandwidth to work on it this month.

krassowski avatar Nov 14 '22 22:11 krassowski

> I don't have bandwidth to work on it this month.

But contributions are welcome if anyone has some time to spare!

krassowski avatar Nov 14 '22 22:11 krassowski

> If you just want to give each bar a different fill, you should use aesthetics (as in the 3.2 Fill the bars example), not queries.

This may not be straightforward with the current public API for the provided example code. I guess this is a separate issue:

  • ComplexUpset should combine upset_query layers for the same colour if they are not conflicting
  • ComplexUpset should provide an easy way to fill bars based on intersection/intersection cardinality (in addition to mapping based on individual observations); this is already possible with encode_sets=FALSE + aes() mapping but not documented.

krassowski avatar Nov 14 '22 23:11 krassowski

Hi @krassowski

I've removed the queries but still have the same memory issue:

gene_list = c("SIV_49_Cla_InhR", "TIM_Cla_Darb_2_genes", "TIM_Cla_Darb_1_gene", "LXA_Cla_Darb_1_gene", "VMM_Cla_Darb_2_genes", "VMM_Cla_Darb_1_gene", "KPC_Cla_Darb_1_gene", "JTX_M_Cla_TRMA_2_genes", "JTX_M_Cla_TRMA_1_gene", "CEM_Cla_TRMA_1_gene", "SIV_Cla_TRMA_2_genes", "SIV_Cla_TRMA_1_gene", "JIL_TMY_LAP_TAA_Cla_2_genes", "JIL_TMY_LAP_TAA_Cla_1_gene", "CEM_Cla_3_genes", "CEM_Cla_2_genes", "CEM_Cla_1_gene", "SIV_Cla_Chr_4_genes", "SIV_Cla_Chr_3_genes", "SIV_Cla_Chr_2_genes", "SIV_Cla_Chr_1_gene", "CEM_3D_v1", "TyoMA", "undefined")

upset(df, gene_list, annotations = list( 'Types (mm)'=ggplot(mapping=aes(x=intersection, y=Size)) + ggtitle("Type vs Phenotype") + geom_violin(width=0.8, alpha=1.5) + ggbeeswarm::geom_quasirandom(aes(color=Phenotype, alpha = I(1/2))) + guides(color = guide_legend(override.aes = list(size=5)))), sort_sets=FALSE, sort_intersections=FALSE, intersections=list(c("CEM_3D_v1", "SIV_Cla_Chr_1_gene", "CEM_Cla_1_gene", "SIV_Cla_TRMA_1_gene"), c("CEM_3D_v1", "SIV_Cla_Chr_1_gene", "CEM_Cla_1_gene", "JIL_TMY_LAP_TAA_Cla_1_gene", "JTX_M_Cla_TRMA_1_gene")))

Error: vector memory exhausted (limit reached?)

Any chance you've found a way to solve this memory problem please?

a14578 avatar Dec 29 '22 00:12 a14578

How big is your data frame? Could you possibly prepare a reproducer using the movies dataset (by duplicating rows as many times as needed and adding as many random group (TRUE/FALSE) columns as needed)? This would help me to look into this locally.
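One way to build such a reproducer without sharing private data is sketched below. It swaps the movies dataset for a fully synthetic data frame; the column names, sizes and membership probability are all assumptions chosen for illustration:

```r
set.seed(1)

# Assumed dimensions, for illustration only
n_rows <- 2000
n_sets <- 26
set_names <- paste0('set_', seq_len(n_sets))

# Random TRUE/FALSE membership, each set containing roughly 20% of the rows
membership <- matrix(
    runif(n_rows * n_sets) < 0.2,
    nrow=n_rows,
    dimnames=list(NULL, set_names)
)

# A numeric column to plot in annotations, plus one logical column per set
repro <- data.frame(
    length=rnorm(n_rows, mean=90, sd=20),
    membership
)

# The plotting call to test would then look something like:
# ComplexUpset::upset(repro, set_names, intersections=list(c('set_1', 'set_2')))
```

Varying `n_sets` while keeping `n_rows` fixed should show whether the number of set columns or the number of rows triggers the error.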

krassowski avatar Dec 29 '22 12:12 krassowski

My personal dataset has 1896 rows with 25 columns, each column representing an intersection. I wasn’t able to replicate this problem using the Movies dataset with the original 58789 rows and 7 genre columns. Even when I increased the number of rows to around 170,000 rows, I still wasn’t able to replicate the problem. I was however able to replicate the problem by changing the Movies dataset, so that it now only has 2000 rows but 26 columns (movie genres or intersections). I’ve attached the data set below, with the minimal Complex-Upset commands required to replicate the problem (listing just two intersections).

Modified Movies dataset:

movies4.csv

Complex-Upset commands:

library(ggplot2)
library(ComplexUpset)
my_data4 <- read.csv("movies4.csv", header = TRUE)
df4_movies <- data.frame(my_data4)

genres4 = colnames(df4_movies)[18:43]

upset(df4_movies, genres4, annotations = list( 'Types (mm)'=ggplot(mapping=aes(x=intersection, y=length)) + ggtitle("Type vs Phenotype") + geom_violin(width=0.8, alpha=1.5)), sort_sets=FALSE, sort_intersections=FALSE, intersections=list(c("Action", "Animation", "Comedy", "Drama", "Long", "Musical", "Silent", "Fantasy", "Opera", "Historical", "Detective", "Emmy_winning", "Animals", "Sci_fi"), c("Action", "Animation", "Comedy", "Drama", "Thriller", "Horror", "Long", "Musical", "Silent", "Western", "Fantasy", "Adventure", "New", "Old", "Opera", "Historical", "Detective", "Science_fiction", "Emmy_winning", "Highly_rated", "Cooking", "Animals", "Sci_fi")))

a14578 avatar Feb 07 '23 18:02 a14578

Hi @krassowski, do you know if there is any update on the memory error please? I'm really looking forward to using ComplexUpset with my data set

a14578 avatar Mar 13 '23 02:03 a14578

I would also appreciate this functionality - thank you.

kaplans1 avatar Jul 10 '23 00:07 kaplans1