ggeasy
A grammar of graphics details?
I was just wondering if it is worth designing and defining a formal or semi-formal grammar for these function names, so that they are readily guessable without having to look them up? In other words, a naming convention such as, based on what has been done so far, [prefix][axis-qualifier][attribute|verb].
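To make the convention concrete, here is a rough sketch that enumerates names of the form [prefix][axis-qualifier][attribute|verb]. The vocabulary is purely hypothetical (it is not the actual ggeasy function list); the point is only that a fixed grammar makes the full set of names predictable.

```r
# Hypothetical vocabulary for illustration; not the real ggeasy API.
prefix    <- "easy"
axis_qual <- c("x", "y", "all")
attribute <- c("rotate", "colour", "size")

# expand.grid() enumerates every axis/attribute combination;
# paste() joins the three parts with underscores.
grid <- expand.grid(axis = axis_qual, attr = attribute,
                    stringsAsFactors = FALSE)
fn_names <- paste(prefix, grid$axis, grid$attr, sep = "_")
fn_names
```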
Or, rather than having to define a zillion easy_ functions, what about one function that uses a little DSL to set the theme attributes? No need to mess with lex and yacc (which are available in R via the rly package btw), it might be enough just to pass commands and scalars as ellipsis arguments eg
ggdetails("x", "axis", "blue")
or, equivalently,
ggdetails("blue", "x", "axis")
or, using a lexer (eg rly) to tokenise a single string argument:
ggdetails("blue x axis")
or
ggdetails("x axis blue")
This also makes it easier for users who are non-native English speakers, whose natural word-ordering assumptions may be different - order would not matter.
The order of the ellipsis arguments or tokens wouldn't matter because the class of the argument or token can be inferred from its value in the very constrained context of the ggdetails() function. "axis", "legend", "text" can only refer to plot elements, "blue", "orange" or a hex RGB value can only refer to colours, "2" or "5" are scalars (for font size or rotation), and "+25%" or "-33%" mean increase or decrease the current size (or whatever is specified) by 25% or 33% respectively. That way argument order doesn't need to be remembered.
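A minimal sketch of that inference, assuming a small hypothetical element vocabulary and using base R's built-in colour names. The names `classify_token()` and `classify_args()` are made up for illustration and are not part of ggeasy.

```r
# Hypothetical element vocabulary; a real one would be derived from theme().
elements <- c("axis", "legend", "text", "x", "y")

# Infer the class of a single token purely from its value.
classify_token <- function(tok) {
  if (tok %in% elements) return("element")
  if (grepl("^[+-][0-9]+(\\.[0-9]+)?%$", tok)) return("relative")
  if (grepl("^[0-9]+(\\.[0-9]+)?$", tok)) return("scalar")
  if (tok %in% grDevices::colours() || grepl("^#[0-9A-Fa-f]{6}$", tok)) {
    return("colour")
  }
  "unknown"
}

# Group the ellipsis arguments by inferred class. Sorting within each
# group makes the result independent of the order the arguments came in.
classify_args <- function(...) {
  toks <- c(...)
  classes <- vapply(toks, classify_token, character(1), USE.NAMES = FALSE)
  lapply(split(toks, classes), sort)
}

identical(classify_args("x", "axis", "blue"),
          classify_args("blue", "x", "axis"))
```

The single-string form could reuse the same function by splitting on whitespace first, e.g. `do.call(classify_args, as.list(strsplit("blue x axis", "\\s+")[[1]]))`.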
Actually, using yacc and lex via rly to build a simple DSL might be the best option, but the utility of the concept could be tested using individual ellipsis arguments to start with.
This is a seriously cool idea. I'd lean towards having the zillion helper functions and the 'ggbot' chat lexer. Looking at https://github.com/systemincloud/rly I don't think I could immediately build that, so if you're more familiar with the idea then by all means please have a crack at it. I think it would be of great benefit!
We could make a fairly simple approximation to this with a heap of if() statements... I started writing this then got carried away trying to see how it might work... now it's here: https://github.com/jonocarroll/ggeasy/pull/8
The text expression is a command, in which the subject is implicit (the subject is "ggbot"), the verb is implicit ("make"), the object of the command is one of the entities listed under Arguments here, and the attribute of the named entity to be modified is inferred from the attribute value ("blue" must be a colour, "2" must be an absolute size/thickness scalar, "2cm" is a scalar with units, "+25%" is a relative scalar, "-45deg" is an angular quantity etc). In some cases the attribute to be modified might need to be named explicitly. One command per string, but a vector of strings or multiple ellipsis string arguments could be passed to one call of ggbot(). We probably want to allow modification of multiple attributes per command string, but strictly only one object entity per command. Thus, "text blue" would make all text blue. "text blue 15" would make all text blue and size 15. But "text line blue" would be illegal because there are two object entities (targets): "text" and "line". (Aside: this restriction of one target entity per command is just to keep it simple to start with).
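As a rough sketch of the one-target rule, assuming a tiny hypothetical target vocabulary and only three value types (colour, absolute size, angle); `parse_command()` is an invented name, not what the PR implements:

```r
# Hypothetical targets; the real list would come from theme()'s arguments.
targets <- c("text", "line", "rect")

parse_command <- function(cmd) {
  toks <- strsplit(trimws(cmd), "\\s+")[[1]]
  target <- toks[toks %in% targets]
  # Enforce exactly one object entity per command string.
  if (length(target) != 1) {
    stop("expected exactly one target, found ", length(target), ": ", cmd)
  }
  out <- list(target = target)
  # Infer which attribute each remaining token sets from its value.
  for (v in setdiff(toks, target)) {
    if (grepl("^[0-9]+(\\.[0-9]+)?$", v)) {
      out$size <- as.numeric(v)
    } else if (grepl("^[+-]?[0-9]+(\\.[0-9]+)?deg$", v)) {
      out$angle <- as.numeric(sub("deg$", "", v))
    } else if (v %in% grDevices::colours()) {
      out$colour <- v
    } else {
      stop("cannot infer attribute from value: ", v)
    }
  }
  out
}

parse_command("text blue 15")    # one target, two attributes
# parse_command("text line blue") would stop(): two targets
```

A vector of command strings could then be handled with `lapply(cmds, parse_command)`.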
Now, a minor complication is how the object entities should be specified. "text" or "line" are easy, and, as per the ggplot2 theme model, these apply to all text or to all lines. But what about, say, the x-axis text? Well, adding "x" or "y" (or "z"?) implies that modification of some axis attribute is being requested, in other words, that "axis" is implied. If modification of both axes is desired, then any or all of the following could be supported: "axis blue", "axes blue", "x y blue". Except we haven't specified which aspect of the axis or axes we want to modify, so we need a qualifier: for axes, the valid ones are "title", "text", "ticks" and "line" (and pluralised forms of those etc).
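One possible way to resolve those tokens to a theme() element name, sketched under the assumption that only the four axis qualifiers above are supported. `resolve_target()` is a hypothetical helper; the element names it returns (axis.text.x, axis.ticks, etc.) are the ones theme() documents.

```r
resolve_target <- function(toks) {
  axes <- intersect(c("x", "y"), toks)
  # Map qualifiers (including plural forms) to the theme() name component.
  qualifiers <- c(title = "title", titles = "title", text = "text",
                  ticks = "ticks", line = "line", lines = "line")
  qual <- unique(unname(qualifiers[intersect(names(qualifiers), toks)]))

  # "x" or "y" imply "axis"; "axis"/"axes" alone mean both axes.
  wants_axis <- length(axes) > 0 || any(c("axis", "axes") %in% toks)
  if (!wants_axis) stop("no axis specified")
  if (length(qual) != 1) stop("need exactly one of: title, text, ticks, line")

  base <- paste0("axis.", qual)
  # Exactly one axis named: use the axis-specific element.
  # Both (or neither) named: use the parent element, which covers both.
  if (length(axes) == 1) paste0(base, ".", axes) else base
}

resolve_target(c("x", "text", "blue"))
resolve_target(c("axes", "ticks", "blue"))
```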
What about suppressing elements? I think the solution is to recognise some special-case attributes, such as "invisible", or imperative forms such as "disappear", "begone" or just "no" or "none" or "zap" or "ditch" etc.
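A small sketch of how those special-case words might be detected before the rest of the command is parsed. The function name is hypothetical; in ggplot2 terms a suppressed element would presumably end up as element_blank(), but that step is left out here.

```r
# Suppression vocabulary, taken from the words suggested above.
suppress_words <- c("invisible", "disappear", "begone",
                    "no", "none", "zap", "ditch")

# Split off any suppression words; what remains should name the target.
split_suppression <- function(toks) {
  toks <- tolower(toks)
  hit <- toks %in% suppress_words
  list(remaining = toks[!hit], suppress = any(hit))
}

split_suppression(c("legend", "begone"))
```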
OK, is this language model adequate? The way to find out is to build a table of all the entity target types which theme() supports, and a separate table for each entity type, enumerating all the attributes that entity type can have set, and specify an example command string and check that it can be unambiguously parsed. Then check that there is no overlap between any of the words used as specifications in both tables. If there is any overlap (i.e. the sets of words are not disjoint) then there will be ambiguity which can only be resolved by word order, which means using a more complex language model.
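The disjointness check itself is cheap to automate once the tables exist. A sketch, assuming the entity words are obtained by splitting theme() element names on "." and using a small hand-picked (hypothetical) set of value words:

```r
# A few theme() element names; the full list would be used in practice.
element_names <- c("axis.title.x.top", "axis.text.y.right", "legend.position")

# Words that can specify an entity: the components of the element names.
entity_words <- unique(unlist(strsplit(element_names, ".", fixed = TRUE)))

# Hypothetical value words, e.g. legend positions plus a colour.
value_words <- c("none", "left", "right", "bottom", "top", "blue")

# Any word in both sets is ambiguous without word order.
overlap <- intersect(entity_words, value_words)
overlap
```

With these inputs, "top" and "right" show up in both sets, which is exactly the kind of clash the tables are meant to surface.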
Creating the tables is a slightly tedious task, but if split up shouldn't take too long. Once the adequacy of the language model is confirmed, or it is tweaked until adequate, then coding should commence. Such a table can also provide the basis for the ggbot() tests, of course.
Obviously, lots of synonyms can be included in the language model. The question is whether an unambiguous model can be constructed with just single-word tokens, or not? On a quick scan, I think it can, but that needs to be thoroughly checked. If not, then a smarter tokeniser may be needed. A lemmatiser could also handle synonyms and alternative spellings etc. But I agree, the aim should be to keep it as lightweight as possible. The aim of designing the language model first, before coding it up, is to check whether a bunch of if/else statements is enough, or whether a proper lexer and lemmatiser is needed or worthwhile. Or whether it is better to build a formal domain-specific language, in which case yacc (via rly) needs to be used to build a parse tree. However, I don't think we want a formal DSL.
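At the lightweight end, synonyms could be handled with a plain lookup table rather than a real lemmatiser. A sketch, with a hypothetical `normalise()` helper and a handful of plural and synonym mappings:

```r
# Synonym -> canonical token. Illustrative entries only.
synonyms <- c(lines = "line", rectangle = "rect", rectangles = "rect",
              titles = "title", headings = "title", axes = "axis")

# Lower-case the tokens and replace any known synonym with its canonical form.
normalise <- function(toks) {
  toks <- tolower(toks)
  hit <- toks %in% names(synonyms)
  toks[hit] <- unname(synonyms[toks[hit]])
  toks
}

normalise(c("Axes", "Titles", "blue"))
```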
entity | element type | specifier(s) | synonyms |
---|---|---|---|
line | element_line | line | lines |
rect | element_rect | rect | rectangle, rectangles |
text | element_text | text | |
title | element_text | title | titles, headings |
aspect.ratio | ? | aspect, ratio | |
axis.title | element_text | axis, title | axes, titles |
axis.title.x | element_text | x, title, axis (implied by x) | |
...and the rest of these, and then we need to consider the settable attributes for each element type, in a separate table. The main thing is to ensure that there is no overlap (i.e. ambiguity) between the way entities are specified and the way attributes and quantities/values are specified.
axis.title.x.top
x axis label on top axis (element_text; inherits from axis.title.x)
axis.title.y
y axis label (element_text; inherits from axis.title)
axis.title.y.right
y axis label on right axis (element_text; inherits from axis.title.y)
axis.text
tick labels along axes (element_text; inherits from text)
axis.text.x
x axis tick labels (element_text; inherits from axis.text)
axis.text.x.top
x axis tick labels on top axis (element_text; inherits from axis.text.x)
axis.text.y
y axis tick labels (element_text; inherits from axis.text)
axis.text.y.right
y axis tick labels on right axis (element_text; inherits from axis.text.y)
axis.ticks
tick marks along axes (element_line; inherits from line)
axis.ticks.x
x axis tick marks (element_line; inherits from axis.ticks)
axis.ticks.y
y axis tick marks (element_line; inherits from axis.ticks)
axis.ticks.length
length of tick marks (unit)
axis.line
lines along axes (element_line; inherits from line)
axis.line.x
line along x axis (element_line; inherits from axis.line)
axis.line.y
line along y axis (element_line; inherits from axis.line)
legend.background
background of legend (element_rect; inherits from rect)
legend.margin
the margin around each legend (margin)
legend.spacing
the spacing between legends (unit)
legend.spacing.x
the horizontal spacing between legends (unit); inherits from legend.spacing
legend.spacing.y
the vertical spacing between legends (unit); inherits from legend.spacing
legend.key
background underneath legend keys (element_rect; inherits from rect)
legend.key.size
size of legend keys (unit)
legend.key.height
key background height (unit; inherits from legend.key.size)
legend.key.width
key background width (unit; inherits from legend.key.size)
legend.text
legend item labels (element_text; inherits from text)
legend.text.align
alignment of legend labels (number from 0 (left) to 1 (right))
legend.title
title of legend (element_text; inherits from title)
legend.title.align
alignment of legend title (number from 0 (left) to 1 (right))
legend.position
the position of legends ("none", "left", "right", "bottom", "top", or two-element numeric vector)
legend.direction
layout of items in legends ("horizontal" or "vertical")
legend.justification
anchor point for positioning legend inside plot ("center" or two-element numeric vector) or the justification according to the plot area when positioned outside the plot
legend.box
arrangement of multiple legends ("horizontal" or "vertical")
legend.box.just
justification of each legend within the overall bounding box, when there are multiple legends ("top", "bottom", "left", or "right")
legend.box.margin
margins around the full legend area, as specified using margin
legend.box.background
background of legend area (element_rect; inherits from rect)
legend.box.spacing
The spacing between the plotting area and the legend box (unit)
panel.background
background of plotting area, drawn underneath plot (element_rect; inherits from rect)
panel.border
border around plotting area, drawn on top of plot so that it covers tick marks and grid lines. This should be used with fill=NA (element_rect; inherits from rect)
panel.spacing
spacing between facet panels (unit)
panel.spacing.x
horizontal spacing between facet panels (unit; inherits from panel.spacing)
panel.spacing.y
vertical spacing between facet panels (unit; inherits from panel.spacing)
panel.grid
grid lines (element_line; inherits from line)
panel.grid.major
major grid lines (element_line; inherits from panel.grid)
panel.grid.minor
minor grid lines (element_line; inherits from panel.grid)
panel.grid.major.x
vertical major grid lines (element_line; inherits from panel.grid.major)
panel.grid.major.y
horizontal major grid lines (element_line; inherits from panel.grid.major)
panel.grid.minor.x
vertical minor grid lines (element_line; inherits from panel.grid.minor)
panel.grid.minor.y
horizontal minor grid lines (element_line; inherits from panel.grid.minor)
panel.ontop
option to place the panel (background, gridlines) over the data layers. Usually used with a transparent or blank panel.background. (logical)
plot.background
background of the entire plot (element_rect; inherits from rect)
plot.title
plot title (text appearance) (element_text; inherits from title) left-aligned by default
plot.subtitle
plot subtitle (text appearance) (element_text; inherits from title) left-aligned by default
plot.caption
caption below the plot (text appearance) (element_text; inherits from title) right-aligned by default
plot.margin
margin around entire plot (unit with the sizes of the top, right, bottom, and left margins)
strip.background
background of facet labels (element_rect; inherits from rect)
strip.placement
placement of strip with respect to axes, either "inside" or "outside". Only important when axes and strips are on the same side of the plot.
strip.text
facet labels (element_text; inherits from text)
strip.text.x
facet labels along horizontal direction (element_text; inherits from strip.text)
strip.text.y
facet labels along vertical direction (element_text; inherits from strip.text)
strip.switch.pad.grid
space between strips and axes when strips are switched (unit)
strip.switch.pad.wrap
space between strips and axes when strips are switched (unit)
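Since the descriptions above follow a fairly regular "(type; inherits from parent)" pattern, the entity table could probably be bootstrapped from them rather than typed by hand. A rough sketch on three of the entries, assuming that pattern holds (irregular entries such as legend.spacing.x would need extra handling):

```r
# Three descriptions copied from the list above, keyed by element name.
desc <- c(
  axis.text.x       = "x axis tick labels (element_text; inherits from axis.text)",
  axis.ticks.length = "length of tick marks (unit)",
  legend.key.height = "key background height (unit; inherits from legend.key.size)"
)

# Type: the first word inside the last parenthesised group.
type <- sub("^.*\\(([a-z_]+)[;)].*$", "\\1", desc)

# Parent: whatever follows "inherits from", or NA when nothing is inherited.
parent <- ifelse(grepl("inherits from", desc),
                 sub("^.*inherits from ([a-z.]+).*$", "\\1", desc),
                 NA_character_)

tbl <- data.frame(element = names(desc), type = unname(type),
                  parent = unname(parent), stringsAsFactors = FALSE)
tbl
```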