Feature description

We need an abstract shiny modules which can be universal for any data configuration. It's meant by this, that module has some standard visualization (for example table or plot) which can utilise any data input.

Please consider following example data kept in a list:

get_data <- function() {
  parent <- data.frame(
    id = letters[1:10],
    age = sample(1:65, 10, replace = TRUE),
    sex = sample(c("m", "f"), 10, replace = TRUE),
    arm = sample(c("a", "b", "c"), 10, replace = TRUE),
    value = runif(10)
  )
  child <- expand.grid(id = letters[1:10], group = LETTERS[1:3])
  child$value <- runif(30)
  child$value2 <- runif(30)
  data <- list(parent = parent, child = child)
}

Consider a function which produces some output based on some data.frame and its columnames specified in x, y, z arguments.

foo_generic <- function(data, x, y, z) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]], colour = .data[[z]])) +
    geom_point()
}

foo_generic(iris, x = "Sepal.Length", y = "Sepal.Width", z = "Species")

Please imagine that foo_generic is executed by some other function and that you don't have access to the data object. You are free to modify foo but data can't be an input of the function, and foo_generic must be called.

foo <- function(...) {
  data <- get_data()
  foo_generic
}

one dataset scenario

Let's consider simple scenario that we know which object from data list is needed and which variables are needed. We can simply create a foo1 as follows

foo1 <- function(dataname, x, y, z) {
  data <- get_data()
  foo_generic(data[[dataname]], x = x, y = y, z = z)
}
foo1(dataname = "child", x = "value", y = "value2", z = "group")

We can even make it more sophisticated by assigning data to environment where foo_generic will be evaluated

call_in_data <- function(expr) {
  data <- get_data()
  env <- new.env()
  for (i in names(data)) {
    env[[i]] <- data[[i]]
  }
  eval(expr, env)
}
foo2 <- function(dataname, x, y, z) {
  expr <- substitute(
    foo_generic(what, x, y, z),
    list(
      what = as.name(dataname),
      x = x, y = y, z = z
    )
  )
  print(expr)
  call_in_data(expr)
}
foo2(dataname = "child", x = "value", y = "value2", z = "group")

multiple datasets scenario

Our use case includes situations when it's possible to select x, y, z from multiple datasets. Please consider that:

x = "value" from child
y = "value2" from child
z = "sex" from parent

foo3 <- function(x, y, z) {
  data <- get_data()
  foo_generic(??whaaaat??, x = x, y = y, z = z)
}

??whaaat?? one of the most painful dilemmas we have. In this situation we used to

select variables from appropriate datasets
merge datasets into one data.frame
build visualization on merged object with columns x, y, z.

Code looked like this.

data <- get_data()
child <- data$child |> select(id, value, value2)
parent <- data$parent |> select(id, sex)
data <- left_join(parent, child, by = "id")
foo_generic(data, x = "value", y = "value2", z = "sex")

In my opinion merge is inevitable and the remaining question is rather:

where merge should happen
who should be responsible for merge (function user or function developer)

With @pawelru we imagined couple solutions.

1. passing and resolving metadata

x, y and z could be represented as objects containing column name and data name. Combining x, y and z gives all informations about which columns and datasets are used, so the merge can be performed. In this situation merge call is generated inside off the function and we can view this as auto_merge based on some additional information.

✅ an advantage is that merge is automated and x, y, z are simple
❌ disadvantage of this solution is lack of control on merge

x <- c(dataname = "child", col = "value")
y <- c(dataname = "child", col = "value2")
z <- c(dataname = "parent", col = "sex")

make_merge_call <- function(x, y, z, join_keys) {
  # lines below would be automatised on x, y, z and join_keys
  list(
    quote(child <- child[, c("id", "value", "value2")]),
    quote(parent <- parent[, c("id", "sex")]),
    quote(data <- dplyr::inner_join(child, parent), by = "id")
  )
}

foo4 <- function(x, y, z) {
  # hardcoded auto_merge ;)
  calls <- make_merge_call(x, y, z, join_keys = "id")
  calls <- c(
    calls,
    substitute(
      foo_generic(data, x = x, y = y, z = z),
      list(x = x[["col"]], y = y[["col"]], z = z[["col"]])
    )
  )
  print(calls)
  call_in_data(as.expression(calls))
}
foo4(x = x, y = y, z = z)

2. passing function to transform data and to get x, y, z

This solution shifts responsibility to merge on the function user. Instead of transforming (merging) data automatically, user will be responsible for this. Here user will be able to specify a function which will be evaluated in foo5.

✅ advantage of this is full control over data transformation
❌ disadvantage of this is pretty big burden on function user

transform_fun <- function(data) {
  data_merged <- dplyr::left_join(
    data$parent[, c("id", "sex")],
    data$child[, c("id", "value", "value2")],
    by = c("id" = "id"),
    multiple = "all"
  )
  list(data = data_merged, x = "value", y = "value2", z = "sex")
}

foo5 <- function(data, fun) {
  data <- get_data()
  info <- fun(data)
  foo_generic(info$data, x = info$x, y = info$y, z = info$z)
}

foo5(data, transform_fun)

3. Using `dm`

There is a interesting package which can control merge process automatically so we can combine solution 1 with dm to avoid our own auto_merge function.

library(dm)
x <- list(dataname = "child", col = "value")
y <- list(dataname = "child", col = "value2")
z <- list(dataname = "parent", col = "sex")
foo6 <- function(x, y, z) {
  data <- new_dm(tables = get_data()) |>
    dm_add_fk(table = "child", ref_table = "parent", columns = "id", ref_columns = "id") |>
    dm_select(child, id, value, value2) |>
    dm_select(parent, id, sex) |>
    dm_select_tbl(child, parent) |>
    dm_flatten_to_tbl(.start = child, .join = left_join)

  foo_generic(data, x = "value", y = "value2", z = "sex")
}

foo6()

Jun 12 '23 17:06 gogonzo

1. Exposing arguments to specify metadata

Untitled Diagram-model-1 drawio

All inputs and possible data transformation will be hardcoded in the teal_module. UI could have select inputs to select variable and server could have a data validation and/or data transformation sub-module (or reactive).

Advantages	Disadvantages
App developer will have to provide a metadata which is then consumed by the module to obtain column names.	limited possibility to set desired inputs configuration.
+ Hardcoded shiny inputs lower risk for module to fail for example due to irrelevant input values.	not possible to change input type.
	not possible to control merge process.

In the code below teal_module is written in a way which allows app developer to specify values for x, y z.

#' @param x, y, z (`choices_selected`) object containing:
#' - `dataname` name of the dataset to choose from.
#' - `choices` names of the column choices in dataset.
#' - `selected` names of the selected columns in dataset.
module_ui <- function(id) {
  ns <- NS(id)
  fluidPage(
    titlePanel("any table"),
    sidebarLayout(
      sidebarPanel(
        tags$div(
          selectInput(ns("x"), label = "select x variable", choices = NULL),
          selectInput(ns("y"), label = "select y variable", choices = NULL),
          selectInput(ns("z"), label = "select z variable", choices = NULL)
        ),
        numericInput(ns("obs"), label = "number of observation", min = 1, max = 10, value = 5)
      ),
      mainPanel(tableOutput(ns("table")))
    )
  )
}

module_srv <- function(id, data, x, y, z) {
  moduleServer(id, function(input, output, session) {
    # initialize inputs on init
    isolate({
      updateSelectInput(inputId = "x", choices = x$choices, selected = x$selected)
      updateSelectInput(inputId = "y", choices = y$choices, selected = y$selected)
      updateSelectInput(inputId = "z", choices = z$choices, selected = z$selected)
    })

    # merge data and select
    data_out <- reactive({
      datanames <- unique(c(x$dataname, y$dataname, z$dataname))
      d <- if (length(datanames) == 1) {
        data[[datanames]]
      } else {
        dplyr::left_join(data$parent, data$child, by = c(id = "parent_id"))
      } |>
        dplyr::select(c(input$x, input$y, input$z)) |>
        head(input$obs)
    })

    output$table <- renderTable(data_out())
  })
}

library(shiny)

parent <- data.frame(
  id = letters[1:10],
  age = sample(1:65, 10, replace = TRUE),
  sex = sample(c("m", "f"), 10, replace = TRUE),
  arm = sample(c("a", "b", "c"), 10, replace = TRUE)
)
child <- expand.grid(parent_id = letters[1:10], id = LETTERS[1:3])
child$value <- runif(30)
child$value2 <- runif(30)

data <- list(parent = parent, child = child)

shinyApp(
  ui = fluidPage(module_ui("module")),
  server = function(input, output, session) {
    module_srv(
      id = "module",
      data = data,
      x = list(dataname = "parent", choices = colnames(parent), selected = "age"),
      y = list(dataname = "parent", choices = colnames(parent), selected = "sex"),
      z = list(dataname = "child", choices = colnames(child), selected = "value")
    )
  }
)

Jun 21 '23 21:06 gogonzo

2. Injecting data-transform modules into teal_module

Untitled Diagram-model-2 drawio

Data transformation srv and ui module could be injected into teal_module. Such transformation module could be wrapped in a class object which could be passed as an argument of teal_module$srv and teal_module$ui. Module developer will have to create a default object which app developer will have to overwrite if custom setting are needed.

Advantages	Disadvantages
App developer have an entire control over inputs.	Might not be easy for app developer to create valid object as its output will have to be exactly as `teal_module` needs
`teal_module` will have a limited code. Transformation and inputs will be moved away	Module developer will have to create as universal transformation module as possible.

module_ui <- function(id, custom_ui = function(id) NULL) {
  ns <- NS(id)
  fluidPage(
    titlePanel("any table"),
    sidebarLayout(
      sidebarPanel(
        custom_ui(id),
        numericInput(ns("obs"), label = "number of observation", min = 1, max = 10, value = 5)
      ),
      mainPanel(tableOutput(ns("table")))
    )
  )
}

module_srv <- function(id, data, custom_srv = function(data) reactive(data())) {
  moduleServer(id, function(input, output, session) {
    # initialize inputs on init
    data_out_custom <- custom_srv(data, input, output, session)
    data_out <- reactive(data_out_custom() |> head(input$obs))
    output$table <- renderTable(data_out())
  })
}

App

library(shiny)

parent <- data.frame(
  id = letters[1:10],
  age = sample(1:65, 10, replace = TRUE),
  sex = sample(c("m", "f"), 10, replace = TRUE),
  arm = sample(c("a", "b", "c"), 10, replace = TRUE)
)
child <- expand.grid(parent_id = letters[1:10], id = LETTERS[1:3])
child$value <- runif(30)
child$value2 <- runif(30)

data <- list(parent = parent, child = child)

inputs <- list(
  ui = function(id) {
    ns <- NS(id)
    tags$div(
      selectInput(ns("x"), label = "select x variable", choices = c("age", "sex", "arm"), selected = "age"),
      selectInput(ns("y"), label = "select y variable", choices = c("age", "sex", "arm"), selected = "sex"),
      selectInput(ns("z"), label = "select z variable", choices = c("value", "value2"), selected = "arm")
    )
  },
  server = function(data, input, output, session) {
    reactive({
      dplyr::left_join(data$parent, data$child, by = c(id = "parent_id")) |>
        dplyr::select(input$x, input$y, input$z)
    })
  }
)

shinyApp(
  ui = fluidPage(module_ui("module", custom_ui = inputs$ui)),
  server = function(input, output, session) {
    module_srv(id = "module", data = data, custom_srv = inputs$server)
  }
)

Jun 21 '23 21:06 gogonzo

3. Wrapping `teal_module` by data-transform module

Untitled Diagram-model-3 drawio

Transformation module would be outside of the teal_module and could be executed by external process (teal). Such teal_module requires only resolved arguments without knowing anything about "parent" process. Such transformation layer would have to call teal_module$server and teal_module$ui.

Advantages	Disadvantages
Module developer completely ignores creating and controlling inputs and expects correct data and inputs to be passed via arguments	`teal_module` have a very limited functionality outside `teal`

Jun 21 '23 21:06 gogonzo

I have questions:

Is the parent/child dichotomy a given or should we handle an arbitrary number of data sets within the one list?
Is the id column mandatory and can we trust that it is always sufficient for a merging operation?
What is the advantage of having an external auto-merge function over having our own?
Should foo5 not be function(fun)? It seems data is obtained from get_data during execution.

foo5 <- function(data, fun) {
  data <- get_data()
  info <- fun(data)
  foo_generic(info$data, x = info$x, y = info$y, z = info$z)
}

foo5(data, transform_fun)

My first thought is I do not like some_fun(x = c(dataname = "child", col = "value"), y = c(dataname = "child", col = "value2"), z = c(dataname = "parent", col = "sex")) I would rather have some_fun(x = child$value, y = child$value, z = parent$sex).

Aug 07 '23 11:08 chlebowa

closing, it is not really an issue but thoughts

May 24 '24 07:05 gogonzo

teal
teal copied to clipboard

[Research]: Universal way of specifying modules encoding

Feature description

one dataset scenario

multiple datasets scenario

1. passing and resolving metadata

2. passing function to transform data and to get x, y, z

3. Using `dm`

1. Exposing arguments to specify metadata

2. Injecting data-transform modules into teal_module

3. Wrapping `teal_module` by data-transform module

teal teal copied to clipboard

[Research]: Universal way of specifying modules encoding

Feature description

one dataset scenario

multiple datasets scenario

1. passing and resolving metadata

2. passing function to transform data and to get x, y, z

3. Using dm

1. Exposing arguments to specify metadata

2. Injecting data-transform modules into teal_module

3. Wrapping teal_module by data-transform module

teal
teal copied to clipboard

3. Using `dm`

3. Wrapping `teal_module` by data-transform module