
Implement SBOL sequence feature glyphs

Open wilkox opened this issue 11 months ago • 3 comments

Implement the Synthetic Biology Open Language (SBOL) sequence feature glyphs. The current rough plan is:

  • Add a geom_<glyph>() function for each sequence feature glyph
  • geom_gene_arrow() becomes a wrapper for geom_CDS() and is soft-deprecated
  • geom_subgene_arrow() becomes a wrapper for geom_polypeptide_region() and is soft-deprecated
  • geom_feature() becomes a convenience geom that wraps all the glyph geoms and accepts a type aesthetic. This lets a user draw different types of sequence feature with a single layer, rather than adding a new geom layer for each type, which could get tedious. It will not allow different aesthetic mappings for different glyphs or fine-tuned control of glyph geometry, but it will probably still cover a large proportion of use cases. To maintain backward compatibility, if the type aesthetic is not mapped, it should draw promoters or locations with the current geom_feature() interface, but this functionality will be soft-deprecated
  • Continue the pattern of using the xmin, xmax and forward aesthetics to control the direction of directional glyphs
  • Continue the pattern of geom_gene_arrow()/geom_gene_label() by having a separate geom_<glyph>_label() for each glyph (geom_CDS_label(), geom_intron_label() etc.). This is necessary to preserve the ggplot2 grammar, as a user might want different aesthetic mappings for glyphs and labels. geom_gene_label(), geom_feature_label(), and geom_subgene_label() would become wrappers and be soft-deprecated
  • Each of the geom_<glyph>() layers will also accept a label aesthetic which draws a text label for the feature with sensible defaults
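
As a rough illustration of the xmin/xmax/forward convention mentioned in the list above, here is a minimal base R sketch. The helper name glyph_direction() is hypothetical and is not part of gggenes; it assumes the existing geom_gene_arrow() behaviour, where the direction is implied by the order of xmin and xmax and a forward value of FALSE reverses it. Treating a missing forward value as TRUE is also an assumption of the sketch.

```r
# Hypothetical helper (not part of gggenes): work out which way a
# directional glyph should point. Returns 1 for left-to-right, -1 for
# right-to-left.
glyph_direction <- function(xmin, xmax, forward = TRUE) {
  # Direction implied by the coordinates alone
  implied <- ifelse(xmax >= xmin, 1, -1)
  # Recycle forward to one value per feature, then coerce to logical
  forward <- rep_len(as.logical(forward), length(xmin))
  # Assumption: a missing forward value is treated as TRUE
  forward[is.na(forward)] <- TRUE
  # forward = FALSE reverses the implied direction
  ifelse(forward, implied, -implied)
}

directions <- glyph_direction(
  xmin    = c(10, 50, 10),
  xmax    = c(40, 20, 40),
  forward = c(TRUE, TRUE, FALSE)
)
print(directions)
```

The second feature points right-to-left because its xmin is greater than its xmax; the third is flipped by forward = FALSE even though its coordinates imply left-to-right.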

I've opened an SBOL_glyphs branch to work on this, and added geom_aptamer() and geom_aptamer_label() as a starting point:

library(tidyverse)
library(gggenes)
aptamers <- data.frame(
  molecule = c("Genome1", "Genome1", "Genome1", "Genome2", "Genome2", "Genome2"),
  location = c(50, 71, 13, 8, 12, 91),
  name = paste0("Apt", 1:6)
)
ggplot(aptamers, aes(x = location, y = molecule, label = name)) +
  geom_aptamer(inherit.aes = TRUE, height = grid::unit(10, "mm")) +
  geom_aptamer_label(inherit.aes = TRUE, height = grid::unit(10, "mm"))

Created on 2023-07-11 with reprex v2.0.2

To get the coordinates for the aptamer glyph, I downloaded the SVG files for the glyphs from the latest SBOL release then converted them to grid-compatible coordinates with the svgparser package:

library(svgparser)
aptamer <- read_svg("~/Downloads/glyphs-svg/aptamer.svg", obj_type = "data.frame")

This has the pleasing benefit that paths expressed as Bézier curves in the SVG are automatically converted into a series of short line segments, which sidesteps the trouble of transforming Béziers into polar coordinates. I think extracting the glyph coordinates from the SVG assets in this way should be the rule, though no doubt there will be some exceptions where it is not the best choice.
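
To illustrate why flattened curves are easier to work with, here is a minimal base R sketch of sampling a cubic Bézier curve into a polyline. This is not svgparser's implementation, only the general idea; the function name flatten_cubic_bezier() and the control points are made up for the example.

```r
# Illustrative only: approximate a cubic Bezier curve with n points joined
# by straight segments. p0 and p3 are the end points, p1 and p2 the control
# points, each a length-2 numeric vector c(x, y).
flatten_cubic_bezier <- function(p0, p1, p2, p3, n = 20) {
  t <- seq(0, 1, length.out = n)
  # Bernstein basis weights for a cubic curve
  b0 <- (1 - t)^3
  b1 <- 3 * (1 - t)^2 * t
  b2 <- 3 * (1 - t) * t^2
  b3 <- t^3
  data.frame(
    x = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
    y = b0 * p0[2] + b1 * p1[2] + b2 * p2[2] + b3 * p3[2]
  )
}

# An arch from (0, 0) to (1, 0), sampled at 9 points
pts <- flatten_cubic_bezier(c(0, 0), c(0, 1), c(1, 1), c(1, 0), n = 9)
print(pts)
```

Once the curve is a plain table of x and y vertices, each vertex can be transformed independently (for example into polar coordinates) and the result drawn with something like grid::polylineGrob(), with no need to transform the curve itself.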

wilkox, Jul 11 '23 07:07