Double/dodged bars

Open outlace opened this issue 4 years ago • 9 comments

Great package!

I'm wondering if there is an easy way to have two side-by-side bars for each intersection? I mainly want to plot my observed data, but next to each intersection I want the expected intersection size (based on my null model), so that I can clearly see how much the observed differs from the expected. Sometimes expected is > observed and sometimes < observed, so stacked bars aren't clear for that.

outlace avatar May 05 '20 15:05 outlace

Thank you! Sure, I would approach it like this:

upset(
    observed, sets,
    base_annotations=list(
        'Intersection size'=upset_annotate(
            '..count..',
            list(
                geom_bar(
                    aes(fill='observed'),
                    width=0.35,
                    position=position_nudge(x=-0.2)
                ),
                geom_bar(
                    data=expected,
                    aes(fill='expected'),
                    width=0.35,
                    position=position_nudge(x=+0.2)
                )
            )
        )
    )
)

As an example:

image

observed = data.frame(
    a=c(TRUE, FALSE, FALSE, TRUE, TRUE),
    b=c(TRUE, TRUE, TRUE, FALSE, TRUE),
    c=c(FALSE, TRUE, TRUE, FALSE, FALSE),
    d=c(FALSE, FALSE, TRUE, FALSE, FALSE)
)
sets = c('a', 'b', 'c', 'd')

Then, if you have the expected counts as a list, create a data frame (each intersection is represented as a hyphenated string):

expected_summary = data.frame(
    expected_count=c(
        'a-b'=3,
        'b-c-d'=1,
        'b-c'=1,
        'a'=0
    )
)
expected_summary$intersection = rownames(expected_summary)
expected_summary

Screenshot from 2020-05-05 16-59-10

and transform to counts format:

expected = expected_summary[
    rep(seq_len(nrow(expected_summary)), expected_summary$expected_count),
    1:2
]
expected

Screenshot from 2020-05-05 16-59-14
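
As a quick sanity check on the expansion step, here is a minimal base R sketch (the variable names `expected_counts` and `expanded` are just for illustration). It repeats each intersection label as many times as its expected count and tabulates the result, which should recover the non-zero counts. Note that an intersection with an expected count of 0 contributes no rows, so it would get no expected bar at all.

```r
# Expected counts keyed by hyphenated intersection label (same values as above)
expected_counts <- c('a-b' = 3, 'b-c-d' = 1, 'b-c' = 1, 'a' = 0)

# Repeat each label `count` times: one element per expected observation
expanded <- rep(names(expected_counts), times = expected_counts)

# Tabulating the expanded labels gives back the non-zero counts
expanded_table <- table(expanded)
expanded_table

# Labels with a zero count vanish during the expansion
setdiff(names(expected_counts), expanded)
```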

You can get the intersections data using upset_data() on your data.
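
If you only need the labels to line up your expected counts against, a rough base R equivalent could look like the sketch below. It assumes that a label is simply the names of the sets a row belongs to, joined with hyphens in the order given in `sets`; the labels produced by upset_data() may differ in ordering or in how rows belonging to no set are named, so treat this as an approximation and compare against the real output.

```r
observed <- data.frame(
    a = c(TRUE, FALSE, FALSE, TRUE, TRUE),
    b = c(TRUE, TRUE, TRUE, FALSE, TRUE),
    c = c(FALSE, TRUE, TRUE, FALSE, FALSE),
    d = c(FALSE, FALSE, TRUE, FALSE, FALSE)
)
sets <- c('a', 'b', 'c', 'd')

# For each row, keep the names of the sets that are TRUE and join them with '-'
# (a row that is in no set would yield an empty string here)
labels <- apply(
    observed[, sets], 1,
    function(membership) paste(sets[membership], collapse = '-')
)

# Count how many rows fall into each intersection
observed_counts <- as.data.frame(
    table(intersection = labels),
    responseName = 'observed_count',
    stringsAsFactors = FALSE
)
observed_counts
```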

Would that work for you?

krassowski avatar May 05 '20 16:05 krassowski

Yes I think that will work. Thanks so much for the quick reply.

outlace avatar May 05 '20 19:05 outlace

I got it working with my data, thanks again! One last thing: I'm having a hard time adding a geom_errorbar to the expected bars. When I add it, it ends up on the set intersection dots panel at the bottom.

outlace avatar May 06 '20 02:05 outlace

This is due to the order in which the plots are assembled: the "dots matrix" is the last one added. For the multi-panel plot I had to create a slightly different interface (inspired by ComplexHeatmap, among others), which requires specifying where the geom should be added. For the error bars one could do:

intersections_data = upset_data(observed, sets)
intersection_sizes = unique(
    intersections_data$with_sizes[, c('intersection', 'intersection_size')]
)
intersection_sizes

Screenshot from 2020-05-06 04-35-18

confidence_intervals = binom::binom.confint(
    intersection_sizes$intersection_size,
    nrow(observed),
    method='wilson'
)
confidence_intervals$intersection = intersection_sizes$intersection
confidence_intervals

Screenshot from 2020-05-06 04-35-24

upset(
    observed, sets,
    base_annotations=list(
        'Intersection size'=upset_annotate(
            '..count..',
            list(
                geom_bar(
                    aes(fill='observed'),
                    width=0.35,
                    position=position_nudge(x=-0.2)
                ),
                geom_bar(
                    data=expected,
                    aes(fill='expected'),
                    width=0.35,
                    position=position_nudge(x=+0.2)
                ),
                geom_errorbar(
                    data=confidence_intervals,
                    aes(
                        ymin=lower * nrow(observed),
                        ymax=upper * nrow(observed),
                        y=NULL     # overwrite the default y=..count..
                    ),
                    width=0.2,
                    position=position_nudge(x=-0.2)
                )
            )
        )
    )
)

image

It might also be interesting to apply this to intersection ratios.

Can you share how you calculate the confidence intervals? I would be curious to learn!

I just saw that you mention the expected bars (which might make more sense) - but the relevant code is easy to adjust I guess.
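
One way that adjustment might look, as a sketch only: build a data frame with one row per intersection holding the lower and upper bounds for the expected bars. The column names `expected_mean`, `spread`, `ymin` and `ymax` are hypothetical, the means reuse the expected counts from the earlier example, and the spread values are invented purely for illustration (they would come from your null model).

```r
# Hypothetical null-model summary: one row per intersection.
# `spread` could be a standard deviation or a standard error;
# the numbers below are made up for illustration.
expected_errors <- data.frame(
    intersection = c('a-b', 'b-c-d', 'b-c', 'a'),
    expected_mean = c(3, 1, 1, 0),
    spread = c(0.8, 0.5, 0.4, 0.2)
)

# Error bar bounds; the lower bound is clipped at zero because counts cannot be negative
expected_errors$ymin <- pmax(expected_errors$expected_mean - expected_errors$spread, 0)
expected_errors$ymax <- expected_errors$expected_mean + expected_errors$spread
expected_errors
```

Such a data frame could then take the place of `confidence_intervals` in the geom_errorbar() layer above, with `aes(ymin=ymin, ymax=ymax, y=NULL)` and `position_nudge(x=+0.2)` so that the error bars sit over the expected bars rather than the observed ones. Because the layer gets its own `data`, it only needs one row per intersection, not one per observation.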

krassowski avatar May 06 '20 03:05 krassowski

Thanks so much! And oh, I'm not computing confidence intervals for the observed data, although maybe I could with a bootstrapping procedure; maybe I should try that... I'm merely trying to add standard deviation bars to the expected bars, which are generated from a null model.

outlace avatar May 06 '20 04:05 outlace

Hmm, would there be a way to directly enter the intersection size counts for my expected bars? When trying to add the SEMs as a geom_errorbar, the ymin/ymax arrays need to be as long as the input observation data, which doesn't make sense to me. I have 62 bars for 62 different set intersections, but it won't let me put in an array of length 62.

outlace avatar May 06 '20 05:05 outlace

Sorry, I cannot see where the problem lies. I would be happy to help if you can provide a reproducible example.

krassowski avatar May 06 '20 05:05 krassowski

Hi,

I tried to reproduce the exact plot following Mike's instructions and code, but I couldn't. I'm getting the output below.

Screen Shot 2021-08-27 at 15 07 00

As this issue is from last year, maybe some internals have changed and this approach doesn't work anymore. As I'm a newcomer to the R programming language, I would like to ask: which modifications should I make to reproduce the double bar plot?

Thanks.

VirtualSpaceman avatar Aug 27 '21 18:08 VirtualSpaceman

@VirtualSpaceman you did everything right; it is just that there have been many changes since version 0.4.0 and that code does not work any more. I will look into that.

krassowski avatar Aug 27 '21 18:08 krassowski