
plot.FFTrees() is not written in a modular way and should be refactored

Open ndphillips opened this issue 1 year ago • 8 comments

Currently, plot.FFTrees() creates a dashboard of multiple plotting elements (i.e., bars, tree, icons, confusion matrix, ROC).

image

The script that creates this dashboard, https://github.com/ndphillips/FFTrees/blob/master/R/plotFFTrees_function.R, is over 2,000 lines long and consists of one massive function.

Having so much code in one script makes it difficult to debug. It also makes it hard to offer user-facing functions that plot only selected elements, some of which may not even technically need data to create.

  • For example, I have often wanted to quickly plot an example FFT, without any icons, for demonstration purposes and without the need to actually train on data - just to create a diagram representing a tree of interest. Currently the code to do this is in the function but can't easily be separated from training a tree.
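As an illustration of the data-free use case above, a minimal base-R sketch might draw a tree from cue names and exit directions alone. The function name `plot_tree_sketch()`, its arguments, and the cue names are hypothetical and not part of FFTrees; the layout (nodes stacked vertically, signal exits drawn to the right, the final node exiting both ways) is an assumption.

```r
# Hypothetical helper: draw a bare FFT diagram without data or a trained object.
# cues:  character vector of cue names, in the order they are checked
# exits: exit direction of every non-final node (0 = noise, 1 = signal)
plot_tree_sketch <- function(cues, exits, labels = c("Noise", "Signal")) {
  n <- length(cues)
  stopifnot(n >= 1, length(exits) == n - 1, all(exits %in% c(0, 1)))

  # Nodes sit at x = 0 and are stacked from top (first cue) to bottom
  nodes <- data.frame(cue = cues, x = 0, y = rev(seq_len(n)))

  plot.new()
  plot.window(xlim = c(-1.5, 1.5), ylim = c(0, n + 0.5))

  for (i in seq_len(n)) {
    last <- i == n
    # The final node exits both ways; earlier nodes exit one way only
    sides <- if (last) c(0, 1) else exits[i]
    for (s in sides) {
      x_exit <- if (s == 1) 1 else -1
      segments(0, nodes$y[i], x_exit, nodes$y[i] - 0.7)
      text(x_exit, nodes$y[i] - 0.85, labels[s + 1])
    }
    # Link to the next cue node
    if (!last) segments(0, nodes$y[i], 0, nodes$y[i + 1])
    # Draw the node last so that it covers the line ends
    points(0, nodes$y[i], pch = 22, cex = 6, bg = "white")
    text(0, nodes$y[i], cues[i])
  }

  invisible(nodes)
}

# A three-cue tree: the first node exits to signal, the second to noise
plot_tree_sketch(c("cue_a", "cue_b", "cue_c"), exits = c(1, 0))
```

Returning the node coordinates invisibly is a design choice that would let other elements (for example icon arrays at the exits) be positioned relative to the tree.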

Goals

  • Refactor plot.FFTrees() into a wrapper around sub-functions that each create one element of the dashboard.
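One way to picture such a wrapper is a thin dispatcher that only decides which elements to draw and delegates the drawing. The sketch below is an assumption about the shape of the design, not the actual FFTrees code: every function name and the `what` values shown here are hypothetical, and the element plotters are stand-ins that only emit a message.

```r
# Stand-in element plotters (hypothetical names); real versions would each
# draw one dashboard element with base graphics.
draw_tree      <- function(x) message("tree drawn")
draw_confusion <- function(x) message("confusion matrix drawn")
draw_roc       <- function(x) message("ROC drawn")

# Thin wrapper: choose which elements to draw, then delegate.
plot_dashboard_sketch <- function(x, what = c("all", "tree", "confusion", "roc")) {
  what <- match.arg(what)
  elements <- list(tree = draw_tree, confusion = draw_confusion, roc = draw_roc)
  todo <- if (what == "all") names(elements) else what
  for (name in todo) elements[[name]](x)
  invisible(todo)
}

plot_dashboard_sketch(x = NULL, what = "roc")  # draws only the ROC stand-in
plot_dashboard_sketch(x = NULL)                # draws every element in turn
```

Because each element lives in its own function, it can be debugged, tested, and called on its own.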

ndphillips • May 22 '24 20:05

Yes, that's a great idea. And it would be great if each component of the complex panel (e.g., the baseline data info, icon arrays, or 2x2 matrix) could be plotted separately as well (as it currently works for the ROC curve). But given the length of the current function and its complex dependencies (for the x-y-coordinates), I've so far shied away from this challenge — so I'd be delighted if you could take it on!

hneth • May 23 '24 08:05

Thanks, glad to see you agree.

Fixing this is indeed quite a challenge. The code underlying plot.FFTrees() is as poorly designed as the output is beautiful :) -- I'm ok saying that out loud since it's all my fault!

ndphillips • May 23 '24 13:05

Let me try breaking down the plot into its elements:

image

Elements

| Number | Title | Type | Example |
|---|---|---|---|
| G1 | Level Bar | Graphic | image |
| G2 | Icon Array | Graphic | image |
| G3 | Tree | Graphic | image |
| G4 | Confusion Matrix | Graphic | image |
| G5 | ROC | Graphic | image |
| L1 | Section Title | Label | image |
| L2 | Icon Label | Label | image |

Now, we can represent the sections using these elements:

Header

image
| Element | Number |
|---|---|
| L1. Section Title | 1 |
| G1. Level Bar | 2 |
| G2. Icon Array | 1 |

Main Plot

image
| Element | Number |
|---|---|
| L1. Section Title | 1 |
| L2. Icon Label | 2 |
| G2. Icon Array | # Nodes + 1 |
| G3. Tree | 1 |

Footer

image
| Element | Number |
|---|---|
| L1. Section Title | 1 |
| G1. Level Bar | 6 |
| G4. Confusion Matrix | 1 |
| G5. ROC | 1 |
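The three section tables above can also be captured as data, which makes the composition of the dashboard easy to inspect or loop over. This is only a sketch: `dashboard_elements()` and its column names are hypothetical, and "# Nodes + 1" is read here as the number of tree nodes plus one.

```r
# Hypothetical helper: list how many of each element every section contains.
dashboard_elements <- function(n_nodes) {
  stopifnot(is.numeric(n_nodes), length(n_nodes) == 1, n_nodes >= 1)
  data.frame(
    section = rep(c("header", "main", "footer"), times = c(3, 4, 4)),
    element = c("L1", "G1", "G2",        # header
                "L1", "L2", "G2", "G3",  # main plot
                "L1", "G1", "G4", "G5"), # footer
    count   = c(1, 2, 1,
                1, 2, n_nodes + 1, 1,
                1, 6, 1, 1)
  )
}

spec <- dashboard_elements(n_nodes = 3)

# Number of elements to draw per section
tapply(spec$count, spec$section, sum)
```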

ndphillips • May 23 '24 13:05

From this, I would propose creating the following functions:

Elements

The functions below plot individual plotting elements:

| Function | Element |
|---|---|
| plot_icon_label() | L2 |
| plot_level_bar() | G1 |
| plot_icon_array() | G2 |
| plot_tree() | G3 |
| plot_confusion_matrix() | G4 |
| plot_roc() | G5 |
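To make the idea of an element function concrete, here is a minimal base-R sketch of what plot_level_bar() could look like. The arguments (`level`, `label`, `col`), the horizontal orientation, and the coordinates are all assumptions for illustration; the existing code in plot.FFTrees() may draw its bars differently.

```r
# Sketch of an element plotter: a bar filled in proportion to `level` (0 to 1).
plot_level_bar <- function(level, label = "", col = "grey40") {
  stopifnot(is.numeric(level), length(level) == 1, level >= 0, level <= 1)

  # Each element sets up its own unit-square coordinate system
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, 1))

  rect(0, 0.3, 1, 0.7, border = "grey60")           # outline: full scale
  rect(0, 0.3, level, 0.7, col = col, border = NA)  # filled part: the level
  text(0.5, 0.85, label)

  invisible(level)
}

plot_level_bar(0.75, label = "Sensitivity")
```

Because the function owns its coordinate system, it can be called on its own or placed into a panel of a larger layout.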

Sections

The functions below plot sections, where a section is a combination of individual elements plus some labels and arrangement code:

| Function | Section |
|---|---|
| plot_section_header() | S1 |
| plot_section_main() | S2 |
| plot_section_footer() | S3 |

For example, plot_section_header() would look something like this:

    plot_section_header <- function(...) {

      # Create the individual elements (two level bars and one icon array)
      p1 <- plot_level_bar(...)
      p2 <- plot_icon_array(...)
      p3 <- plot_level_bar(...)

      # Arrange the elements; arrange_fun() is a placeholder for layout code
      p <- arrange_fun(p1, p2, p3)

      return(p)

    }
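In base R, plotting functions draw as a side effect rather than returning plot objects, so the arranging step would more likely come first, with each element then drawing into its own panel. A minimal sketch using layout() follows; `panel_stub()` is a hypothetical stand-in for the element functions, and the panel widths are arbitrary.

```r
# Stand-in for an element plotter (hypothetical): an empty framed panel.
panel_stub <- function(label) {
  plot.new()
  box()
  title(main = label)
}

plot_section_header_sketch <- function() {
  # Save graphics settings; restore them and the single-panel layout on exit
  old_par <- par(mar = c(1, 1, 2, 1))
  on.exit({
    layout(1)
    par(old_par)
  })

  # One row, three panels; the middle panel is three times as wide
  n_panels <- layout(matrix(1:3, nrow = 1), widths = c(1, 3, 1))

  panel_stub("Level bar")   # fills panel 1
  panel_stub("Icon array")  # fills panel 2
  panel_stub("Level bar")   # fills panel 3

  invisible(n_panels)
}

plot_section_header_sketch()
```

Restoring the graphics state in on.exit() keeps a section function from changing the layout of whatever the caller plots next.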

ndphillips • May 23 '24 14:05

One big question I do have is whether to transition to ggplot2 or stay in base-R.

I love ggplot2. In my personal and professional work, I have completely transitioned to it.

Let's think through the pros and cons of transitioning to ggplot2:

Pros

  • Flexible - ggplot2 is super flexible and extendable.
    • Opens up the possibilities of introducing elements such as animations (https://gganimate.com/), themes (https://ggplot2.tidyverse.org/reference/ggtheme.html), and interactivity (https://plotly.com/ggplot2/)
    • Allows users the option to take the output ggplot2 object and do something else with it

Cons

  • Time - Will basically require re-writing all plotting functionality from scratch
  • Dependencies - Will increase number of dependencies in package

ndphillips • May 23 '24 15:05

Thanks for the breakdown into modular components, which is very helpful. When re-creating the summary plot from these components, it would be much easier to create alternative versions or swap individual components.

For instance, I've been thinking about replacing the icon arrays at the exits with 2x2 matrices that sum up either the classified cases at the current node (i.e., either the top or bottom row of a 2x2 matrix) OR all classified cases so far. Additionally, it would be helpful to visualize how NA values in cues are being handled (e.g., show their number at nodes and how they are classified eventually). Adding such functionality in a more modular framework may involve editing several functions, but seems far easier than fiddling with the complex monolithic function we had before.
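The two variants of the 2x2 matrix mentioned above (cases classified at the current node versus all cases classified so far) differ only in which cases are counted. A minimal sketch, assuming each case comes with its true criterion value, the tree's decision, and the node at which it exited; `node_confusion()` and its arguments are hypothetical names, and the data are made up for illustration.

```r
# Hypothetical helper: 2x2 counts (decision by criterion) for one node.
# cumulative = FALSE counts cases that exited at `node`;
# cumulative = TRUE  counts all cases that exited at `node` or earlier.
node_confusion <- function(criterion, decision, exit_node, node,
                           cumulative = FALSE) {
  keep <- if (cumulative) exit_node <= node else exit_node == node
  table(
    decision  = factor(decision[keep],  levels = c(TRUE, FALSE)),
    criterion = factor(criterion[keep], levels = c(TRUE, FALSE))
  )
}

# Made-up example: node 1 exits to signal, node 2 to noise, node 3 both ways
criterion <- c(TRUE, TRUE,  FALSE, FALSE, TRUE, FALSE)
decision  <- c(TRUE, FALSE, FALSE, TRUE,  TRUE, FALSE)
exit_node <- c(1,    2,     2,     1,     3,    3)

# Only the "decide TRUE" row is filled, since node 1 exits to signal
node_confusion(criterion, decision, exit_node, node = 1)

# All cases classified at nodes 1 and 2
node_confusion(criterion, decision, exit_node, node = 2, cumulative = TRUE)
```

Fixing the factor levels keeps every matrix 2x2 even when a node classifies in one direction only, which is what makes the non-cumulative version show a single filled row.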

With regard to ggplot2: I love and use this package as well, but see more immediate costs than benefits in transitioning to it. If we were to develop a general infrastructure for plotting trees and aspects of their elements, ggplot2 may be an option (especially if we wanted a wider community to contribute). But as FFTs have some special constraints and our present plotting options are still quite sophisticated, I'd opt for a base R solution within FFTrees.

hneth • May 24 '24 08:05

Thanks, Hans. I agree that, at least to start, sticking with base-R is a reasonable option. I'm working on this in #224

ndphillips • May 24 '24 13:05

Excellent! btw: I previously wrote a bunch of plotting macros (e.g., functions for drawing boxes and links between them) for the diagrams in the riskyr package (see plot_util.R). Some of these functions are too specific for FFTrees, but the more basic ones could be helpful when drawing trees.

hneth • May 24 '24 14:05