sesame icon indicating copy to clipboard operation
sesame copied to clipboard

visualizeSegments plots chromosome label at chromosome midpoint rather than centromere

Open j-andrews7 opened this issue 5 months ago • 4 comments

Currently, visualizeSegments adds the chromosome label at the midpoint of the chromosome based on length: https://github.com/zwdzwd/sesame/blob/13b0feed4dbcbf5bd32e9b41db11828767e8f0a2/R/cnv.R#L318

In conumee, it adds it to the middle of the centromere, which makes it easier to see whole arm changes. The genomeInfo seems to have general gap regions of the genome, but doesn't have them explicitly split or labeled in a way that makes the centromere info extractable from there.

This would be a nice change. As would be the ability to add gene labels to bins on the plot.

j-andrews7 avatar Jan 10 '24 15:01 j-andrews7

Thanks for the suggestion. Some chromosomes don't have centromere or acrocentric. I added the centromere location to the plot. Hope that helps

sdfs <- sesameDataGet('EPICv2.8.SigDF')
sdf <- sdfs[["K562_206909630040_R01C01"]]
seg <- cnSegmentation(sdf)
visualizeSegments(seg, genes.to.label=c("ABL1","BCR"))
visualizeSegments(seg, to.plot=c("chr9","chr22"))
image image

zwdzwd avatar Jan 11 '24 20:01 zwdzwd

Oh, nice. Gotta say I didn't expect a response/update so quick, so I did roll my own (quite messy) version as well if it's of any interest:

#' Visualize segments
#'
#' The function takes a \code{CNSegment} object obtained from cnSegmentation
#' and plot the bin signals and segments (as horizontal lines). 
#' 
#' By default, a dashed line will be drawn at the midpoint of each chromosome, but
#' centromere locations can be provided via the \code{centromere} argument for
#' more accurate chromosome arm delimination.
#' 
#' Gene labels can be added by providing a \code{GRanges} object with gene locations.
#' The \code{id.col} argument specifies the metadata column name in \code{genes} to 
#' use for labeling. If a gene overlaps multiple bins, the one with the highest 
#' overlap will be used for labeling.
#'
#' @param seg A \code{CNSegment} object.
#' @param genes A \code{GRanges} object with gene locations to 
#'   label on bins.
#' @param id.col Column name for gene ID in \code{genes}.
#' @param centromere A \code{GRanges} object with centromere locations 
#'   use for chromosome label.
#' @param main Title for the plot.
#' @param to.plot Chromosome to plot (by default plot all chromosomes).
#' @param point.size Size of the points.
#' @param color.low Color for low ratio.
#' @param color.mid Color for baseline ratio.
#' @param color.high Color for high ratio.
#' @param color.seg Color for segment lines.
#' @param hover.text.cols Columns to show in hover text, should be present in
#'  \code{mcols(seg$bin.coords)}.
#' @importFrom GenomicRanges start end seqnames seqinfo 
#'   subjectHits queryHits findOverlaps mcols
#' @importFrom ggplot2 ggplot geom_segment geom_point geom_text geom_vline
#'   element_text theme aes ggtitle scale_x_continuous 
#' @importFrom plotly ggplotly layout config add_annotations toWebGL
#'   scale_colour_gradient2
#' @return plotly object
#' 
#' @examples
#' sesameDataCache()
#' ## sdf <- sesameDataGet('EPIC.1.SigDF')
#' ## sdfs.normal <- sesameDataGet('EPIC.5.SigDF.normal')
#' ## seg <- cnSegmentation(sdf, sdfs.normal)
#' ## visualizeSegments(seg)
#'
#' sesameDataGet_resetEnv()
#' 
#' @export
visualizeSegments <- function(seg, 
                              genes = NULL, 
                              id.col = "hgnc_symbol",
                              centromere = NULL,
                              main = NULL,
                              to.plot = NULL,
                              point.size = 1.5,       
                              color.low = "blue", 
                              color.mid = "grey80",
                              color.high = "red",
                              color.seg = "yellow",
                              hover.text.cols = "signal") {

    stopifnot(is(seg, "CNSegment"))
    bin.coords <- seg$bin.coords
    bin.seqinfo <- seqinfo(bin.coords)
    bin.signals <- seg$bin.signals
    bin.coords$signal[names(bin.coords) %in% names(bin.signals)] <- bin.signals[match(names(bin.coords), names(bin.signals))]
    bin.coords$label <- NA
    
    sigs <- seg$seg.signals
    total.length <- sum(as.numeric(bin.seqinfo@seqlengths), na.rm=TRUE)
    
    ## skip chromosome too small (e.g, chrM)
    if (is.null(to.plot)) {
        to.plot <- (bin.seqinfo@seqlengths > total.length*0.01)
    }
    

    seq.names <- bin.seqinfo@seqnames[to.plot]
    seqlen <- as.numeric(bin.seqinfo@seqlengths[to.plot])
    totlen <- sum(seqlen, na.rm=TRUE)
    seqcumlen <- cumsum(seqlen)
    seqstart <- setNames(c(0,seqcumlen[-length(seqcumlen)]), seq.names)
    seqmids <- seqstart+seqlen/2
    
    if (!is.null(centromere)) {
        stopifnot(is(centromere, "GRanges"))
        centromere <- centromere[as.vector(seqnames(centromere)) %in% seq.names]
        centromere$centromere.mids <- (start(centromere) + end(centromere)) / 2
        seqmids <- seqstart + centromere$centromere.mids
    }
    
    bin.coords <- bin.coords[as.vector(seqnames(bin.coords)) %in% seq.names]

    GenomicRanges::values(bin.coords)$bin.mids <-
        (start(bin.coords) + end(bin.coords)) / 2
    GenomicRanges::values(bin.coords)$bin.x <-
        seqstart[as.character(seqnames(bin.coords))] + bin.coords$bin.mids
    
    # Add genes to bin.coords GRanges object.
    if (!is.null(genes)) {
        stopifnot(is(genes, "GRanges"))
        for (i in seq_along(genes)) {
            # Find overlapping bins
            overlapping_bins <- findOverlaps(genes[i], bin.coords)
            
            # Calculate the length of each overlap
            lengths <- width(pintersect(genes[i], bin.coords[subjectHits(overlapping_bins)]))
            
            # Identify the bin with the majority overlap
            if (length(lengths) > 0) {
              max_overlap_index <- which.max(lengths)
              bin_index <- subjectHits(overlapping_bins)[max_overlap_index]
              
              # Assign the gene symbol to the bin
              bin.coords$label[bin_index] <- mcols(genes)[[id.col]][i]
            }
        }
    }
    
    # Make hover info
    if (!is.null(hover.text.cols)) {
        hover.strings <- lapply(hover.text.cols, function(n) paste0("</br><b>", n, ":</b> ", mcols(bin.coords)[[n]]))
        bin.coords$text <- do.call("paste0", hover.strings)
    }
    
    ## plot bin
    p <- ggplot() + ggtitle(main) 
    p <- p + geom_point(aes(x = bin.coords$bin.x / totlen,
                            y = bin.coords$signal, 
                            color = bin.coords$signal,
                            text = bin.coords$text), 
                        size = point.size, alpha = I(0.8))

    ## plot segment
    seg.beg <- (seqstart[sigs$chrom] + sigs$loc.start) / totlen
    seg.end <- (seqstart[sigs$chrom] + sigs$loc.end) / totlen
    p <- p + geom_segment(aes(x = seg.beg, xend = seg.end,
        y = sigs$seg.mean, yend = sigs$seg.mean), size = 1, color = color.seg)

    ## chromosome boundary
    p <- p + geom_vline(xintercept = seqstart[-1]/totlen,
        linetype = "solid", size = 0.25)
    
    ## chromosome midpoint line
    p <- p + geom_vline(xintercept = seqmids/totlen,
        linetype = "dashed", alpha = 0.5, color = "grey80")

    ## chromosome label
    p <- p + scale_x_continuous(
        labels=seq.names, breaks = (seqmids)/totlen, 
        expand = c(0,0)) +
        theme(axis.text.x = element_text(angle=90, hjust = 0.5, vjust = 0.5))

    p <- p + scale_colour_gradient2(
        limits = c(-0.3,0.3), 
        low = color.low, 
        mid = color.mid, 
        high = color.high,
        oob = squish) + xlab("") + ylab("log2(ratio)") +
        theme(legend.position = "none") 
    
    p <- ggplotly(p, tooltip = "text") %>% 
      config(editable = list(annotationTail = TRUE,
                             axisTitleText = TRUE,
                             titleText = TRUE), 
             displaylogo = FALSE,
             toImageButtonOptions = list(format = "svg")) %>%
      layout(yaxis = list(
              zerolinecolor = toRGB("black"),
              zerolinewidth = 1,
              zeroline = TRUE,
              showgrid = FALSE,
              showline = TRUE,
              mirror = TRUE),
            xaxis = list(
              zeroline = FALSE,
              showgrid = FALSE,
              showline = TRUE,
              mirror = TRUE),
            plot_bgcolor = "white"
          ) %>% toWebGL()
    
    if (!is.null(genes)) {
      binnies <- bin.coords[!is.na(bin.coords$label)]
      uppy <- binnies[which(binnies$signal >= 0),]
      downer <- binnies[which(binnies$signal < 0),]
      
      if (length(uppy$signal) > 0) {
        p <- p %>% add_annotations(
          x = uppy$bin.x / totlen,
          y = uppy$signal, 
          text = uppy$label,
          font = list(size = 10),
          showarrow = TRUE,
          xref = "x",
          yref = "y",
          showarrow = TRUE,
          arrowhead = 4,
          arrowsize = .5,
          ax = 20,
          ay = -40)
      }
      
      if (length(downer$signal) > 0) {
        p <- p %>% add_annotations(
          x = downer$bin.x / totlen,
          y = downer$signal, 
          text = downer$label,
          font = list(size = 10),
          showarrow = TRUE,
          xref = "x",
          yref = "y",
          showarrow = TRUE,
          arrowhead = 4,
          arrowsize = .5,
          ax = 20,
          ay = 40)
      }
    }
    
    p
}

I like your ideograms a lot, a clever idea. I changed the tick locations to the middle of the centromere/acrocentric region and added dash lines there as well, which can be a useful visual guide with looking higher up on the plot. Your gene labels are also more clear, but they will probably get muddied if more than one gene in close bins are labeled.

Regardless, thanks for such a quick response/solution.

I swapped to plotly output as I have Shiny aspirations for these plots. cn

j-andrews7 avatar Jan 11 '24 22:01 j-andrews7

Wow, thank you @j-andrews7 ! I will learn from your code and incorporate it when I have time (if you don't mind). I like ggplotly a lot, but BioC doesn't like too much dependency. I also updated sesameData to have genomeInfo carry centromere in the future (just having been updating the mm39 this week, might as well add it). Thanks again for the suggestions!

zwdzwd avatar Jan 12 '24 01:01 zwdzwd

I don't mind at all if any of the elements are incorporated, feel free to rip whatever's useful.

plotly could be an optional parameter, have to set interactive = TRUE or something. That way it could just go in suggests and you can build in a check to throw an error if not installed. That's what I've done in the past for things like this to avoid the dependency count creeping up. Exposing some of the plot aesthetics parameters is always welcome - we have a color blind person in our department that can't differentiate red/green very well so being able to adjust those is nice.

Thanks for keeping this package up to date to work with the V2 arrays and taking the time to build good documentation! Everything else is lagging behind.

j-andrews7 avatar Jan 12 '24 16:01 j-andrews7