Currently, plot.FFTrees() creates a dashboard of multiple plotting elements (i.e., bars, tree, icons, confusion matrix, ROC).

The script creating this dashboard (https://github.com/ndphillips/FFTrees/blob/master/R/plotFFTrees_function.R) is over 2,000 lines long and consists of one massive function.

Having so much code in one script makes it difficult to debug. It also doesn't lend itself to user-facing functions that selectively plot individual elements, some of which may not even need data to create.

For example, I have often wanted to quickly plot an example FFT, without any icons, for demonstration purposes and without the need to actually train on data - just to create a diagram representing a tree of interest. Currently the code to do this is in the function but can't easily be separated from training a tree.
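To make the use case concrete, here is a minimal base-R sketch of a data-free tree diagram. Everything in it is an assumption for illustration: `plot_tree_sketch()`, its arguments, and the "left"/"right" exit coding are hypothetical and not part of FFTrees.

```r
# Hypothetical helper (not part of FFTrees): draw an FFT diagram from a
# plain description, without any training data.
# cues:  cue labels, one per node (top to bottom)
# exits: for every node except the last, the side of its exit branch
#        ("left" or "right"); the last node exits on both sides
plot_tree_sketch <- function(cues, exits,
                             exit_labels = c(left = "Noise", right = "Signal")) {
  n <- length(cues)
  stopifnot(n >= 1, length(exits) == n - 1,
            all(exits %in% c("left", "right")))

  # Each node sits one level lower than the previous one, shifted away
  # from the side on which the previous node exits
  x <- numeric(n)
  for (i in seq_len(n)[-1]) {
    step <- if (exits[i - 1] == "left") 1 else -1
    x[i] <- x[i - 1] + step
  }
  y <- rev(seq_len(n))

  plot.new()
  plot.window(xlim = range(x) + c(-1.5, 1.5), ylim = c(-0.5, n + 0.5))

  for (i in seq_len(n)) {
    sides <- if (i < n) exits[i] else c("left", "right")
    for (side in sides) {
      dx <- if (side == "left") -1 else 1
      segments(x[i], y[i], x[i] + dx, y[i] - 1)            # exit branch
      text(x[i] + dx, y[i] - 1, exit_labels[[side]], pos = 1)
    }
    # Branch that continues to the next cue
    if (i < n) segments(x[i], y[i], x[i + 1], y[i + 1])
  }

  # Draw the cue boxes last so they sit on top of the branches
  w <- strwidth(cues) / 2 + 0.1
  h <- strheight(cues)
  rect(x - w, y - h, x + w, y + h, col = "white")
  text(x, y, cues)

  # Return the node coordinates so other elements could reuse them
  invisible(data.frame(cue = cues, x = x, y = y))
}

plot_tree_sketch(cues = c("cue A", "cue B", "cue C"),
                 exits = c("right", "left"))
```

Returning the node coordinates is a deliberate choice in this sketch: other elements (such as icon arrays at the exits) would need them to line up with the tree.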

Goals

Re-factor plot.FFTrees() to be a wrapper around sub-functions that create each of the elements of the dashboard.

May 22 '24 20:05 ndphillips

Yes, that's a great idea. And it would be great if each component of the complex panel (e.g., the baseline data info, icon arrays, or 2x2 matrix) could be plotted separately as well (as it currently works for the ROC curve). But given the length of the current function and its complex dependencies (for the x-y-coordinates), I've so far shied away from this challenge — so I'd be delighted if you could take it on!

May 23 '24 08:05 hneth

Thanks, glad to see you agree.

Fixing this is indeed quite a challenge. The code underlying plot.FFTrees() is as poorly designed as the output is beautiful :) -- I'm ok saying that out loud since it's all my fault!

May 23 '24 13:05 ndphillips

Let me try breaking down the plot into its elements:

Elements

ID	Title	Type
G1	Level Bar	Graphic
G2	Icon Array	Graphic
G3	Tree	Graphic
G4	Confusion Matrix	Graphic
G5	ROC	Graphic
L1	Section Title	Label
L2	Icon Label	Label

Now, we can represent the sections using these elements:

Header

Element	Count
L1. Section Title	1
G1. Level Bar	2
G2. Icon Array	1

Main Plot

Element	Count
L1. Section Title	1
L2. Icon Label	2
G2. Icon Array	# Nodes + 1
G3. Tree	1

Footer

Element	Count
L1. Section Title	1
G1. Level Bar	6
G4. Confusion Matrix	1
G5. ROC	1
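The three section tables above could also be written down as a small data structure, which makes the dependence on tree size explicit. This is only a sketch: `section_elements()` is a hypothetical helper, and the counts are copied from the tables.

```r
# Hypothetical helper: element counts per dashboard section, taken from
# the three tables above. n_nodes is the number of nodes in the tree.
section_elements <- function(n_nodes) {
  stopifnot(n_nodes >= 1)
  list(
    header = c(L1 = 1, G1 = 2, G2 = 1),
    main   = c(L1 = 1, L2 = 2, G2 = n_nodes + 1, G3 = 1),
    footer = c(L1 = 1, G1 = 6, G4 = 1, G5 = 1)
  )
}

spec <- section_elements(n_nodes = 3)

# Total number of each element across the whole dashboard
counts <- unlist(unname(spec))
totals <- tapply(counts, names(counts), sum)
print(totals)
```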

May 23 '24 13:05 ndphillips

From this, I would propose creating the following functions:

Elements

The functions below plot individual plotting elements:

Function	Element
`plot_icon_label()`	L2
`plot_level_bar()`	G1
`plot_icon_array()`	G2
`plot_tree()`	G3
`plot_confusion_matrix()`	G4
`plot_roc()`	G5

Sections

The functions below plot sections, where a section is a combination of individual elements plus some labels and arrangement:

Function	Section
`plot_section_header()`	S1
`plot_section_main()`	S2
`plot_section_footer()`	S3

For example, plot_section_header() would look something like this:

plot_section_header <- function(...) {

  # Create the individual elements (two level bars and one icon array)
  p1 <- plot_level_bar(...)
  p2 <- plot_icon_array(...)
  p3 <- plot_level_bar(...)

  # Arrange the elements (arrange_fun() is a placeholder, not an existing function)
  p <- arrange_fun(p1, p2, p3)

  return(p)

}
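One caveat if this stays in base R: base plotting functions draw as a side effect rather than returning plot objects, so the arranging step would probably happen before the elements are drawn. A minimal sketch of that idea using `layout()` follows; `plot_section_stub()` and `plot_dashboard_sketch()` are hypothetical stand-ins, and the panel heights are arbitrary.

```r
# Stand-in for a section function (hypothetical): draws a labelled frame
# into the current panel. Real versions would draw level bars, the tree, etc.
plot_section_stub <- function(label) {
  plot.new()
  box()
  text(0.5, 0.5, label)
}

# Hypothetical wrapper: stack header, main plot, and footer with layout()
plot_dashboard_sketch <- function(heights = c(1, 3, 2)) {
  old_par <- par(mar = c(1, 1, 1, 1))  # small margins so the panels fit
  on.exit({
    par(old_par)
    layout(1)                          # reset to a single panel
  })

  # One column with three rows; each plot.new() moves to the next panel
  layout(matrix(1:3, ncol = 1), heights = heights)

  sections <- c("Header", "Main Plot", "Footer")
  for (s in sections) plot_section_stub(s)

  invisible(sections)
}

plot_dashboard_sketch()
```

With this pattern each section function only needs to know how to fill its own panel, which is what would make plotting a single section on its own straightforward.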

May 23 '24 14:05 ndphillips

One big question I do have is whether to transition to ggplot2 or stay in base-R.

I love ggplot2. In my personal and professional work, I have completely transitioned to it.

Let's think through the pros and cons of transitioning to ggplot2:

Pros

Flexible - ggplot2 is super flexible and extendable.
- Opens up the possibilities of introducing elements such as animations (https://gganimate.com/), themes (https://ggplot2.tidyverse.org/reference/ggtheme.html), and interactivity (https://plotly.com/ggplot2/)
- Allows users the option to take the output ggplot2 object and do something else with it

Cons

Time - Will basically require rewriting all plotting functionality from scratch
Dependencies - Will increase the number of dependencies in the package

May 23 '24 15:05 ndphillips

Thanks for the breakdown into modular components, which is very helpful. If the summary plot were re-created from these components, it would be much easier to create alternative versions or swap individual components.

For instance, I've been thinking about replacing the icon arrays at the exits with 2x2 matrices that sum up either the cases classified at the current node (i.e., either the top or bottom row of a 2x2 matrix) OR all cases classified so far. Additionally, it would be helpful to visualize how NA values in cues are being handled (e.g., show their number at nodes and how they are eventually classified). Adding such functionality in a more modular framework may involve editing several functions, but seems far easier than fiddling with the complex monolithic function we had before.
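To illustrate the two variants (per-node vs. cumulative counts), here is a small sketch on made-up data. The `cases` data frame, its column names, and `node_counts()` are all hypothetical, and NA handling is left out.

```r
# Hypothetical toy data: for each case, the node at which it exited the
# tree, the tree's decision, and the true criterion value
cases <- data.frame(
  node      = c(1, 1, 1, 2, 2, 3, 3, 3),
  decision  = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  criterion = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Count the four 2x2 cells for the cases exiting at each node, and
# accumulate them over nodes (all cases classified so far)
node_counts <- function(cases) {
  cells <- with(cases, cbind(
    hi = decision & criterion,     # hits
    fa = decision & !criterion,    # false alarms
    mi = !decision & criterion,    # misses
    cr = !decision & !criterion    # correct rejections
  ))
  per_node <- rowsum(cells * 1, group = cases$node)
  list(per_node   = per_node,
       cumulative = apply(per_node, 2, cumsum))
}

counts <- node_counts(cases)
counts$per_node    # one row per node: only the cases exiting there
counts$cumulative  # one row per node: all cases classified up to that node
```

A non-final node only fills one row of the 2x2 matrix (here node 1 only produces hits and false alarms), whereas the cumulative version fills in gradually and ends with the full confusion matrix at the last node.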

With regard to ggplot2: I love and use this package as well, but see more immediate costs than benefits in transitioning to it. If we were to develop a general infrastructure for plotting trees and aspects of their elements, ggplot2 may be an option (especially if we wanted a wider community to contribute). But as FFTs have some special constraints and our present plotting options are still quite sophisticated, I'd opt for a base R solution within FFTrees.

May 24 '24 08:05 hneth

Thanks Hans. I agree that, at least to start, sticking with base-R is a reasonable option. I'm working on this in #224

May 24 '24 13:05 ndphillips

Excellent! btw: I previously wrote a bunch of plotting macros (e.g., functions for drawing boxes and links between them) for the diagrams in the riskyr package (see plot_util.R). Some of these functions are too specific for FFTrees, but the more basic ones could be helpful when drawing trees.
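For readers who haven't seen such macros, here is a rough sketch of what basic box and link helpers could look like in base R. These are NOT the riskyr functions: `draw_box()` and `draw_link()` are hypothetical names with assumed arguments.

```r
# Hypothetical helper: draw a labelled box centred on (x, y) and return
# its geometry so that links can attach to its edges
draw_box <- function(x, y, label, w = 0.2, h = 0.1, col = "grey90") {
  rect(x - w / 2, y - h / 2, x + w / 2, y + h / 2, col = col)
  text(x, y, label)
  invisible(list(x = x, y = y, w = w, h = h))
}

# Hypothetical helper: arrow from the bottom edge of `from` to the top
# edge of `to`; further arguments are passed on to arrows()
draw_link <- function(from, to, ...) {
  arrows(from$x, from$y - from$h / 2,
         to$x,   to$y + to$h / 2,
         length = 0.1, ...)
}

plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))
top   <- draw_box(0.50, 0.8, "cue A")
left  <- draw_box(0.25, 0.4, "Noise")
right <- draw_box(0.75, 0.4, "cue B")
draw_link(top, left)
draw_link(top, right)
```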

May 24 '24 14:05 hneth

plot.FFTrees() is not written in a modular way and should be refactored
