FFTrees
plot.FFTrees() is not written in a modular way and should be refactored
Currently, plot.FFTrees() creates a dashboard of multiple plotting elements (i.e., bars, tree, icons, confusion matrix, ROC).
The script that creates this dashboard (https://github.com/ndphillips/FFTrees/blob/master/R/plotFFTrees_function.R) is over 2,000 lines long and consists of one massive function.
Having so much code in one function makes it difficult to debug. It also makes it hard to offer user-facing functions that plot only selected elements, some of which may not even need data to create.
- For example, I have often wanted to quickly plot an example FFT, without any icons, for demonstration purposes and without the need to actually train on data - just to create a diagram representing a tree of interest. Currently the code to do this is in the function but can't easily be separated from training a tree.
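
To make the idea concrete, here is a minimal base-R sketch of drawing an FFT diagram from a plain specification, with no data and no fitted model. The function name `draw_fft_sketch()`, its arguments, and the default exit labels are hypothetical and not part of FFTrees; the sketch only illustrates that the tree layout can be computed from cue names and exit directions alone.

```r
# Hypothetical helper (not part of FFTrees): draw an FFT from a bare specification.
# cues:  character vector of cue names, one per node (top to bottom)
# exits: exit side of each node ("left" or "right"); the entry for the
#        final node is ignored because the final node exits on both sides.
draw_fft_sketch <- function(cues, exits,
                            exit_labels = c(left = "Noise", right = "Signal")) {
  n <- length(cues)
  stopifnot(n >= 1, length(exits) == n, all(exits %in% c("left", "right")))

  # Node positions: the next node sits diagonally below, opposite the exit side.
  x <- numeric(n)
  for (i in seq_len(n)[-1]) {
    step <- if (exits[i - 1] == "left") 1 else -1
    x[i] <- x[i - 1] + step
  }
  y <- rev(seq_len(n))

  # Exit leaves: one per non-final node, two for the final node.
  leaves <- data.frame(
    node = c(seq_len(n - 1), n, n),
    side = c(exits[-n], "left", "right"),
    stringsAsFactors = FALSE
  )
  leaves$x <- x[leaves$node] + ifelse(leaves$side == "left", -1, 1)
  leaves$y <- y[leaves$node] - 1
  leaves$label <- unname(exit_labels[leaves$side])

  # Empty canvas, then links, then nodes and leaves on top of the links.
  plot(NULL, xlim = range(leaves$x, x) + c(-0.5, 0.5), ylim = c(-0.5, n + 0.5),
       axes = FALSE, xlab = "", ylab = "")
  segments(x[leaves$node], y[leaves$node], leaves$x, leaves$y)
  if (n > 1) segments(x[-n], y[-n], x[-1], y[-1])
  points(x, y, pch = 21, bg = "white", cex = 7)
  text(x, y, cues)
  points(leaves$x, leaves$y, pch = 22, bg = "grey90", cex = 7)
  text(leaves$x, leaves$y, leaves$label, cex = 0.7)

  invisible(leaves)  # leaf coordinates, returned for inspection
}

# Illustrative call with made-up cue names
draw_fft_sketch(cues = c("cue A", "cue B", "cue C"),
                exits = c("right", "left", "right"))
```

The function returns the leaf coordinates invisibly, so the layout logic can be checked without looking at the plot.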
Goals
- Refactor plot.FFTrees() to be a wrapper around sub-functions that create each of the elements of the dashboard.
Yes, that's a great idea. And it would be great if each component of the complex panel (e.g., the baseline data info, icon arrays, or 2x2 matrix) could be plotted separately as well (as it currently works for the ROC curve). But given the length of the current function and its complex dependencies (for the x-y-coordinates), I've so far shied away from this challenge — so I'd be delighted if you could take it on!
Thanks, glad to see you agree.
Fixing this is indeed quite a challenge. The code underlying plot.FFTrees() is as poorly designed as the output is beautiful :) -- I'm ok saying that out loud since it's all my fault!
Let me try breaking down the plot into its elements:
Elements
| ID | Title | Type | Example |
|---|---|---|---|
| G1 | Level Bar | Graphic | |
| G2 | Icon Array | Graphic | |
| G3 | Tree | Graphic | |
| G4 | Confusion Matrix | Graphic | |
| G5 | ROC | Graphic | |
| L1 | Section Title | Label | |
| L2 | Icon Label | Label | |
Now, we can represent the sections using these elements:
Header
| Element | Count |
|---|---|
| L1. Section Title | 1 |
| G1. Level Bar | 2 |
| G2. Icon Array | 1 |
Main Plot
| Element | Count |
|---|---|
| L1. Section Title | 1 |
| L2. Icon Label | 2 |
| G2. Icon Array | # Nodes + 1 |
| G3. Tree | 1 |
Footer
| Element | Count |
|---|---|
| L1. Section Title | 1 |
| G1. Level Bar | 6 |
| G4. Confusion Matrix | 1 |
| G5. ROC | 1 |
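
One way to make this breakdown usable in code is to store it as data. The sketch below encodes the three section tables above in a data frame; the object name `fft_layout`, the helper `count_elements()`, and the column names are hypothetical. The number of icon arrays in the main plot depends on the number of nodes, so it is computed rather than stored.

```r
# Hypothetical encoding of the section tables above (names are illustrative).
# NA marks the one count that depends on the tree: icon arrays in the main plot.
fft_layout <- data.frame(
  section = c("header", "header", "header",
              "main", "main", "main", "main",
              "footer", "footer", "footer", "footer"),
  element = c("L1", "G1", "G2",
              "L1", "L2", "G2", "G3",
              "L1", "G1", "G4", "G5"),
  count   = c(1, 2, 1,
              1, 2, NA, 1,
              1, 6, 1, 1),
  stringsAsFactors = FALSE
)

# Total number of times each element is drawn for a tree with n_nodes nodes.
count_elements <- function(layout, n_nodes) {
  # Fill in the tree-dependent count: number of nodes + 1 icon arrays
  dependent <- is.na(layout$count)
  layout$count[dependent] <- n_nodes + 1
  tapply(layout$count, layout$element, sum)
}

count_elements(fft_layout, n_nodes = 3)
```

Keeping the layout as data would let a wrapper loop over sections and elements instead of hard-coding each call.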
From this, I would propose creating the following functions:
Elements
The functions below plot individual plotting elements:
| Function | Element |
|---|---|
| plot_icon_label() | L2 |
| plot_level_bar() | G1 |
| plot_icon_array() | G2 |
| plot_tree() | G3 |
| plot_confusion_matrix() | G4 |
| plot_roc() | G5 |
Sections
The functions below plot sections, where a section is a combination of individual elements plus some labels and arrangement:
| Function | Section |
|---|---|
| plot_section_header() | S1 |
| plot_section_main() | S2 |
| plot_section_footer() | S3 |
For example, plot_section_header() would look something like this:
```r
plot_section_header <- function(...) {
  # Create individual elements
  p1 <- plot_level_bar(...)
  p2 <- plot_icon_array(...)
  p3 <- plot_level_bar(...)

  # Some code to arrange the elements (arrange_fun() is a placeholder)
  p <- arrange_fun(p1, p2, p3)

  return(p)
}
```
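
If the package stays in base R, the element functions would draw as a side effect rather than return plot objects, so the arranging step has to come before the drawing calls. A rough sketch of that variant using `layout()` from the graphics package follows. The stub bodies, argument names, and panel widths are placeholders for illustration, not a proposed API.

```r
# Stub element functions: each draws into the currently active panel.
plot_level_bar <- function(level, label = "") {
  stopifnot(level >= 0, level <= 1)
  plot(NULL, xlim = c(0, 1), ylim = c(0, 1),
       axes = FALSE, xlab = "", ylab = "", main = label)
  rect(0.4, 0, 0.6, 1)                      # bar outline
  rect(0.4, 0, 0.6, level, col = "grey40")  # filled up to `level`
}

plot_icon_array <- function(n_signal, n_noise) {
  n <- n_signal + n_noise
  stopifnot(n >= 1)
  n_col <- ceiling(sqrt(n))
  idx <- seq_len(n) - 1
  # One icon per case on a roughly square grid; shape and fill encode the class
  plot(idx %% n_col, idx %/% n_col,
       pch = rep(c(21, 22), c(n_signal, n_noise)),
       bg = rep(c("grey20", "grey80"), c(n_signal, n_noise)),
       axes = FALSE, xlab = "", ylab = "")
}

plot_section_header <- function(p_signal, n_signal, n_noise) {
  old_par <- par(mar = c(1, 1, 2, 1))
  # Restore the single-panel layout and the margins when done
  on.exit({
    layout(1)
    par(old_par)
  })

  # Arrange first: three panels in one row, wider in the middle
  layout(matrix(1:3, nrow = 1), widths = c(1, 3, 1))

  # Then draw the elements in panel order
  plot_level_bar(1 - p_signal, label = "p(Noise)")
  plot_icon_array(n_signal, n_noise)
  plot_level_bar(p_signal, label = "p(Signal)")

  invisible(NULL)
}

plot_section_header(p_signal = 0.3, n_signal = 12, n_noise = 28)
```

Because each element function only draws into whatever panel is active, the same functions could also be called on their own to plot a single element.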
One big question I do have is whether to transition to ggplot2 or stay in base-R.
I love ggplot2. In my personal and professional work, I have completely transitioned to it.
Let's think through the pros and cons of transitioning to ggplot2:
Pros
- Flexible - ggplot2 is super flexible and extendable.
- Opens up the possibilities of introducing elements such as animations (https://gganimate.com/), themes (https://ggplot2.tidyverse.org/reference/ggtheme.html), and interactivity (https://plotly.com/ggplot2/)
- Allows users the option to take the output ggplot2 object and do something else with it
Cons
- Time - Will basically require re-writing all plotting functionality from scratch
- Dependencies - Will increase number of dependencies in package
Thanks for the breakdown in modular components, which is very helpful. When re-creating the summary plot from these components, it would be much easier to create alternative versions or swap individual components.
For instance, I've been thinking about replacing the icon arrays at the exits with 2x2 matrices that sum up either the cases classified at the current node (i.e., either the top or bottom row of a 2x2 matrix) OR all cases classified so far. Additionally, it would be helpful to visualize how NA values in cues are being handled (e.g., show their number at nodes and how they are eventually classified). Adding such functionality in a more modular framework may involve editing several functions, but seems far easier than fiddling with the complex monolithic function we had before.
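
As a sketch of the input such exit-level summaries would need, the following base-R code tabulates, for each node, the cases classified at that node and the running 2x2 totals of all cases classified so far. The function name `node_confusion()` and the input vectors (`exit_node`, `decision`, `criterion`) are hypothetical; FFTrees may store this information differently, and NA handling is left out.

```r
# Hypothetical helper: per-node and cumulative 2x2 counts.
# exit_node: node at which each case was classified (1 = first node)
# decision:  logical, predicted class (TRUE = signal)
# criterion: logical, true class (TRUE = signal)
node_confusion <- function(exit_node, decision, criterion) {
  stopifnot(length(exit_node) == length(decision),
            length(decision) == length(criterion),
            !anyNA(decision), !anyNA(criterion))
  nodes <- seq_len(max(exit_node))

  # Number of selected cases exiting at each node (zero if none)
  count_at <- function(keep) {
    as.vector(table(factor(exit_node[keep], levels = nodes)))
  }

  at_node <- data.frame(
    node = nodes,
    hi = count_at(decision & criterion),    # hits
    fa = count_at(decision & !criterion),   # false alarms
    mi = count_at(!decision & criterion),   # misses
    cr = count_at(!decision & !criterion)   # correct rejections
  )

  # Running totals over nodes: all cases classified so far
  so_far <- at_node
  so_far[-1] <- lapply(at_node[-1], cumsum)

  list(at_node = at_node, so_far = so_far)
}

# Made-up example: 8 cases, 3 nodes
exit_node <- c(1, 1, 2, 2, 3, 3, 3, 3)
decision  <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
criterion <- c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
node_confusion(exit_node, decision, criterion)
```

The last row of `so_far` equals the overall confusion matrix, so an exit-level element and the footer matrix could share the same counting code.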
With regard to ggplot2: I love and use this package as well, but see more immediate costs than benefits in transitioning to it. If we were to develop a general infrastructure for plotting trees and aspects of their elements, ggplot2 may be an option (especially if we wanted a wider community to contribute). But as FFTs have some special constraints and our present plotting options are still quite sophisticated, I'd opt for a base R solution within FFTrees.
Thanks Hans. I agree that, at least to start, sticking with base-R is a reasonable option. I'm working on this in #224
Excellent! btw: I previously wrote a bunch of plotting macros (e.g., functions for drawing boxes and links between them) for the diagrams in the riskyr package (see plot_util.R). Some of these functions are too specific for FFTrees, but the more basic ones could be helpful when drawing trees.