teal
teal copied to clipboard
[Research]: Universal way of specifying modules encoding
Feature description
We need an abstract shiny modules which can be universal for any data configuration. It's meant by this, that module has some standard visualization (for example table or plot) which can utilise any data input.
Please consider following example data kept in a list:
get_data <- function() {
parent <- data.frame(
id = letters[1:10],
age = sample(1:65, 10, replace = TRUE),
sex = sample(c("m", "f"), 10, replace = TRUE),
arm = sample(c("a", "b", "c"), 10, replace = TRUE),
value = runif(10)
)
child <- expand.grid(id = letters[1:10], group = LETTERS[1:3])
child$value <- runif(30)
child$value2 <- runif(30)
data <- list(parent = parent, child = child)
}
Consider a function which produces some output based on some data.frame and its columnames specified in x, y, z arguments.
foo_generic <- function(data, x, y, z) {
ggplot(data, aes(x = .data[[x]], y = .data[[y]], colour = .data[[z]])) +
geom_point()
}
foo_generic(iris, x = "Sepal.Length", y = "Sepal.Width", z = "Species")
Please imagine that foo_generic is executed by some other function and that you don't have access to the data object.
You are free to modify foo but data can't be an input of the function, and foo_generic must be called.
foo <- function(...) {
data <- get_data()
foo_generic
}
one dataset scenario
Let's consider simple scenario that we know which object from data list is needed and which variables are needed. We can simply create a foo1 as follows
foo1 <- function(dataname, x, y, z) {
data <- get_data()
foo_generic(data[[dataname]], x = x, y = y, z = z)
}
foo1(dataname = "child", x = "value", y = "value2", z = "group")
We can even make it more sophisticated by assigning data to environment where foo_generic will be evaluated
call_in_data <- function(expr) {
data <- get_data()
env <- new.env()
for (i in names(data)) {
env[[i]] <- data[[i]]
}
eval(expr, env)
}
foo2 <- function(dataname, x, y, z) {
expr <- substitute(
foo_generic(what, x, y, z),
list(
what = as.name(dataname),
x = x, y = y, z = z
)
)
print(expr)
call_in_data(expr)
}
foo2(dataname = "child", x = "value", y = "value2", z = "group")
multiple datasets scenario
Our use case includes situations when it's possible to select x, y, z from multiple datasets. Please consider that:
x = "value"fromchildy = "value2"fromchildz = "sex"fromparent
foo3 <- function(x, y, z) {
data <- get_data()
foo_generic(??whaaaat??, x = x, y = y, z = z)
}
??whaaat?? one of the most painful dilemmas we have. In this situation we used to
- select variables from appropriate datasets
- merge datasets into one data.frame
- build visualization on merged object with columns x, y, z.
Code looked like this.
data <- get_data()
child <- data$child |> select(id, value, value2)
parent <- data$parent |> select(id, sex)
data <- left_join(parent, child, by = "id")
foo_generic(data, x = "value", y = "value2", z = "sex")
In my opinion merge is inevitable and the remaining question is rather:
- where merge should happen
- who should be responsible for merge (function user or function developer)
With @pawelru we imagined couple solutions.
1. passing and resolving metadata
x, y and z could be represented as objects containing column name and data name. Combining x, y and z gives all informations about which columns and datasets are used, so the merge can be performed. In this situation merge call is generated inside off the function and we can view this as auto_merge based on some additional information.
- ✅ an advantage is that merge is automated and x, y, z are simple
- ❌ disadvantage of this solution is lack of control on merge
x <- c(dataname = "child", col = "value")
y <- c(dataname = "child", col = "value2")
z <- c(dataname = "parent", col = "sex")
make_merge_call <- function(x, y, z, join_keys) {
# lines below would be automatised on x, y, z and join_keys
list(
quote(child <- child[, c("id", "value", "value2")]),
quote(parent <- parent[, c("id", "sex")]),
quote(data <- dplyr::inner_join(child, parent), by = "id")
)
}
foo4 <- function(x, y, z) {
# hardcoded auto_merge ;)
calls <- make_merge_call(x, y, z, join_keys = "id")
calls <- c(
calls,
substitute(
foo_generic(data, x = x, y = y, z = z),
list(x = x[["col"]], y = y[["col"]], z = z[["col"]])
)
)
print(calls)
call_in_data(as.expression(calls))
}
foo4(x = x, y = y, z = z)
2. passing function to transform data and to get x, y, z
This solution shifts responsibility to merge on the function user. Instead of transforming (merging) data automatically, user will be responsible for this. Here user will be able to specify a function which will be evaluated in foo5.
- ✅ advantage of this is full control over data transformation
- ❌ disadvantage of this is pretty big burden on function user
transform_fun <- function(data) {
data_merged <- dplyr::left_join(
data$parent[, c("id", "sex")],
data$child[, c("id", "value", "value2")],
by = c("id" = "id"),
multiple = "all"
)
list(data = data_merged, x = "value", y = "value2", z = "sex")
}
foo5 <- function(data, fun) {
data <- get_data()
info <- fun(data)
foo_generic(info$data, x = info$x, y = info$y, z = info$z)
}
foo5(data, transform_fun)
3. Using dm
There is a interesting package which can control merge process automatically so we can combine solution 1 with dm to avoid our own auto_merge function.
library(dm)
x <- list(dataname = "child", col = "value")
y <- list(dataname = "child", col = "value2")
z <- list(dataname = "parent", col = "sex")
foo6 <- function(x, y, z) {
data <- new_dm(tables = get_data()) |>
dm_add_fk(table = "child", ref_table = "parent", columns = "id", ref_columns = "id") |>
dm_select(child, id, value, value2) |>
dm_select(parent, id, sex) |>
dm_select_tbl(child, parent) |>
dm_flatten_to_tbl(.start = child, .join = left_join)
foo_generic(data, x = "value", y = "value2", z = "sex")
}
foo6()
1. Exposing arguments to specify metadata
All inputs and possible data transformation will be hardcoded in the teal_module. UI could have select inputs to select variable and server could have a data validation and/or data transformation sub-module (or reactive).
| Advantages | Disadvantages |
| App developer will have to provide a metadata which is then consumed by the module to obtain column names. | limited possibility to set desired inputs configuration. |
| + Hardcoded shiny inputs lower risk for module to fail for example due to irrelevant input values. | not possible to change input type. |
| not possible to control merge process. |
In the code below teal_module is written in a way which allows app developer to specify values for x, y z.
#' @param x, y, z (`choices_selected`) object containing:
#' - `dataname` name of the dataset to choose from.
#' - `choices` names of the column choices in dataset.
#' - `selected` names of the selected columns in dataset.
module_ui <- function(id) {
ns <- NS(id)
fluidPage(
titlePanel("any table"),
sidebarLayout(
sidebarPanel(
tags$div(
selectInput(ns("x"), label = "select x variable", choices = NULL),
selectInput(ns("y"), label = "select y variable", choices = NULL),
selectInput(ns("z"), label = "select z variable", choices = NULL)
),
numericInput(ns("obs"), label = "number of observation", min = 1, max = 10, value = 5)
),
mainPanel(tableOutput(ns("table")))
)
)
}
module_srv <- function(id, data, x, y, z) {
moduleServer(id, function(input, output, session) {
# initialize inputs on init
isolate({
updateSelectInput(inputId = "x", choices = x$choices, selected = x$selected)
updateSelectInput(inputId = "y", choices = y$choices, selected = y$selected)
updateSelectInput(inputId = "z", choices = z$choices, selected = z$selected)
})
# merge data and select
data_out <- reactive({
datanames <- unique(c(x$dataname, y$dataname, z$dataname))
d <- if (length(datanames) == 1) {
data[[datanames]]
} else {
dplyr::left_join(data$parent, data$child, by = c(id = "parent_id"))
} |>
dplyr::select(c(input$x, input$y, input$z)) |>
head(input$obs)
})
output$table <- renderTable(data_out())
})
}
library(shiny)
parent <- data.frame(
id = letters[1:10],
age = sample(1:65, 10, replace = TRUE),
sex = sample(c("m", "f"), 10, replace = TRUE),
arm = sample(c("a", "b", "c"), 10, replace = TRUE)
)
child <- expand.grid(parent_id = letters[1:10], id = LETTERS[1:3])
child$value <- runif(30)
child$value2 <- runif(30)
data <- list(parent = parent, child = child)
shinyApp(
ui = fluidPage(module_ui("module")),
server = function(input, output, session) {
module_srv(
id = "module",
data = data,
x = list(dataname = "parent", choices = colnames(parent), selected = "age"),
y = list(dataname = "parent", choices = colnames(parent), selected = "sex"),
z = list(dataname = "child", choices = colnames(child), selected = "value")
)
}
)
2. Injecting data-transform modules into teal_module
Data transformation srv and ui module could be injected into teal_module. Such transformation module could be wrapped in a class object which could be passed as an argument of teal_module$srv and teal_module$ui. Module developer will have to create a default object which app developer will have to overwrite if custom setting are needed.
| Advantages | Disadvantages |
| App developer have an entire control over inputs. | Might not be easy for app developer to create valid object as its output will have to be exactly as `teal_module` needs |
| `teal_module` will have a limited code. Transformation and inputs will be moved away | Module developer will have to create as universal transformation module as possible. |
module_ui <- function(id, custom_ui = function(id) NULL) {
ns <- NS(id)
fluidPage(
titlePanel("any table"),
sidebarLayout(
sidebarPanel(
custom_ui(id),
numericInput(ns("obs"), label = "number of observation", min = 1, max = 10, value = 5)
),
mainPanel(tableOutput(ns("table")))
)
)
}
module_srv <- function(id, data, custom_srv = function(data) reactive(data())) {
moduleServer(id, function(input, output, session) {
# initialize inputs on init
data_out_custom <- custom_srv(data, input, output, session)
data_out <- reactive(data_out_custom() |> head(input$obs))
output$table <- renderTable(data_out())
})
}
App
library(shiny)
parent <- data.frame(
id = letters[1:10],
age = sample(1:65, 10, replace = TRUE),
sex = sample(c("m", "f"), 10, replace = TRUE),
arm = sample(c("a", "b", "c"), 10, replace = TRUE)
)
child <- expand.grid(parent_id = letters[1:10], id = LETTERS[1:3])
child$value <- runif(30)
child$value2 <- runif(30)
data <- list(parent = parent, child = child)
inputs <- list(
ui = function(id) {
ns <- NS(id)
tags$div(
selectInput(ns("x"), label = "select x variable", choices = c("age", "sex", "arm"), selected = "age"),
selectInput(ns("y"), label = "select y variable", choices = c("age", "sex", "arm"), selected = "sex"),
selectInput(ns("z"), label = "select z variable", choices = c("value", "value2"), selected = "arm")
)
},
server = function(data, input, output, session) {
reactive({
dplyr::left_join(data$parent, data$child, by = c(id = "parent_id")) |>
dplyr::select(input$x, input$y, input$z)
})
}
)
shinyApp(
ui = fluidPage(module_ui("module", custom_ui = inputs$ui)),
server = function(input, output, session) {
module_srv(id = "module", data = data, custom_srv = inputs$server)
}
)
3. Wrapping teal_module by data-transform module
Transformation module would be outside of the teal_module and could be executed by external process (teal). Such teal_module requires only resolved arguments without knowing anything about "parent" process. Such transformation layer would have to call teal_module$server and teal_module$ui.
| Advantages | Disadvantages |
| Module developer completely ignores creating and controlling inputs and expects correct data and inputs to be passed via arguments | `teal_module` have a very limited functionality outside `teal` |
I have questions:
- Is the
parent/childdichotomy a given or should we handle an arbitrary number of data sets within the one list? - Is the
idcolumn mandatory and can we trust that it is always sufficient for a merging operation? - What is the advantage of having an external auto-merge function over having our own?
- Should
foo5not befunction(fun)? It seemsdatais obtained fromget_dataduring execution.
foo5 <- function(data, fun) {
data <- get_data()
info <- fun(data)
foo_generic(info$data, x = info$x, y = info$y, z = info$z)
}
foo5(data, transform_fun)
My first thought is I do not like
some_fun(x = c(dataname = "child", col = "value"), y = c(dataname = "child", col = "value2"), z = c(dataname = "parent", col = "sex"))
I would rather have some_fun(x = child$value, y = child$value, z = parent$sex).
closing, it is not really an issue but thoughts