Group-wise color in scatterplot

Open araikes opened this issue 6 years ago • 4 comments

Hello. First, thank you for your awesome product. I really appreciate the data exploration tool you've put together.

I'm trying to figure out whether certain functionality exists. I'm running DataExplorer v0.8.0.

I have the following dataframe:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	32 obs. of  7 variables:
 $ group          : Factor w/ 2 levels "amber","blue": 1 2 2 1 2 1 1 2 2 2 ...
 $ ess_score      : num  13 14 9 12 1 3 11 8 5 8 ...
 $ rpcsq_rpq3     : num  2 4 2 3 5 0 6 4 0 0 ...
 $ rpcsq_rpq13    : num  4 16 19 2 16 4 15 23 12 16 ...
 $ rpcsq_cognitive: num  0 4 8 0 6 2 2 10 6 7 ...
 $ rpcsq_somatic  : num  6 14 13 3 15 2 12 10 6 6 ...
 $ rpcsq_emotional: num  0 2 0 2 0 0 7 7 0 3 ...

I'd like to produce scatterplots of ess_score against each of the five rpcsq_* variables (5 scatterplots), with the points colored by group. I've tried the following:

> plot_scatterplot(tmp, by = "ess_score")

This works, but it doesn't color the points by group. The following attempts, however, all fail to produce the plots:

> plot_scatterplot(tmp, by = "ess_score", geom_point_args = list(col = "group"))
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'group'
> plot_scatterplot(tmp, by = "ess_score", geom_point_args = list(col = group))
Error in do.call("geom_point", geom_point_args) : 
  object 'group' not found
> plot_scatterplot(tmp, by = "ess_score", geom_point_args = list(group = "group", col = "group"))
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'group'

Does the functionality I'm looking for exist in the current iteration of DataExplorer? Thanks for any help you can offer.

araikes avatar Apr 19 '19 17:04 araikes

Thanks for using DataExplorer. To do what you need, you will have to tweak the source code a little. Copy and paste the following function, and you should be able to pass a group column.

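# Packages the function body relies on
library(data.table)    # is.data.table(), data.table(), melt.data.table()
library(ggplot2)       # ggplot(), aes_string(), geom_point(), theme_gray(), ...
library(DataExplorer)  # plotDataExplorer(), plus unexported helpers reached via :::

plot_scatterplot2 <- function(data, by, group, sampled_rows = nrow(data), geom_point_args = list(), title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 3L, ncol = 3L, parallel = FALSE) {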
  variable <- NULL
  if (!is.data.table(data)) data <- data.table(data)
  if (sampled_rows < nrow(data)) data <- data[sample.int(nrow(data), sampled_rows)]
  dt <- suppressWarnings(melt.data.table(data, id.vars = c(by, group), variable.factor = FALSE))
  feature_names <- unique(dt[["variable"]])
  layout <- DataExplorer:::.getPageLayout(nrow, ncol, length(feature_names))
  plot_list <- DataExplorer:::.lapply(
    parallel = parallel,
    X = layout,
    FUN = function(x) {
      ggplot(dt[variable %in% feature_names[x]], aes_string(x = by, y = "value", color = group)) +
        do.call("geom_point", geom_point_args) +
        coord_flip() +
        xlab(by)
    }
  )
  class(plot_list) <- c("multiple", class(plot_list))
  plotDataExplorer(
    plot_obj = plot_list,
    page_layout = layout,
    title = title,
    ggtheme = ggtheme,
    theme_config = theme_config,
    facet_wrap_args = list(
      "facet" = ~ variable,
      "nrow" = nrow,
      "ncol" = ncol,
      "scales" = "free_x",
      "shrink" = FALSE
    )
  )
}

Then just do:

plot_scatterplot2(tmp, by = "ess_score", group = "group")

I tested it on iris and it works fine:

plot_scatterplot2(iris, by = "Sepal.Length", group = "Species")
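
As a side note on the errors in the original post: geom_point_args is handed to geom_point() through do.call(), so whatever it contains is treated as a fixed setting for every point, and "group" is therefore read as a colour name. Colouring by a column needs an aesthetic mapping instead, which is what the color = group entry in aes_string() above provides. A minimal sketch of the difference in plain ggplot2, using iris purely for illustration:

```r
library(ggplot2)

# Fixed colour: every point gets the same valid colour name.
# This is the kind of argument geom_point_args forwards to geom_point().
p_const <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  do.call("geom_point", list(colour = "steelblue"))

# Mapped colour: the column is named inside aes(), so ggplot2
# assigns one colour per level of Species.
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()
```

Printing p_const and p_mapped side by side shows the distinction; passing a column name where p_const has "steelblue" is the pattern that produced the col2rgb error.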

boxuancui avatar Apr 19 '19 18:04 boxuancui

Please also keep this issue open. I might be able to add this in future versions, but I can't promise which one.
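
In the meantime, if you would rather not depend on unexported DataExplorer internals, roughly the same picture can be drawn with data.table and ggplot2 alone. The sketch below is only an approximation: scatter_by_group is a hypothetical helper name, it is not part of DataExplorer, and it uses a single facet_wrap() instead of the paginated layout.

```r
library(data.table)
library(ggplot2)

# Hypothetical helper, not part of DataExplorer.
scatter_by_group <- function(data, by, group) {
  dt <- as.data.table(data)
  # Use every numeric column except `by` and `group` as a y-variable
  numeric_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
  measure <- setdiff(numeric_cols, c(by, group))
  # Reshape to long format: one row per (observation, variable) pair
  long <- melt(dt, id.vars = c(by, group), measure.vars = measure,
               variable.name = "variable", value.name = "value")
  ggplot(long, aes(x = .data[[by]], y = .data[["value"]], colour = .data[[group]])) +
    geom_point() +
    facet_wrap(~ variable, scales = "free_y") +
    labs(x = by, colour = group)
}

p <- scatter_by_group(iris, by = "Sepal.Length", group = "Species")
```

For the data in this issue the call would be scatter_by_group(tmp, by = "ess_score", group = "group").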

boxuancui avatar Apr 19 '19 18:04 boxuancui

Thanks @boxuancui. I'll give it a shot.

araikes avatar Apr 19 '19 20:04 araikes

In general, having the ability to color (or assign other ggplot2 aesthetics) based on groups defined in a column would be quite useful for many of the plotting functions (plot_density, etc.).
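
To illustrate what that could look like for densities, here is a minimal sketch in plain ggplot2. It is not a DataExplorer API, and iris is used only as stand-in data; the grouping column is mapped to both the colour and fill aesthetics.

```r
library(ggplot2)

# One density curve per level of Species; alpha keeps overlaps visible.
p_density <- ggplot(iris, aes(x = Sepal.Length, colour = Species, fill = Species)) +
  geom_density(alpha = 0.3)
```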

khughitt avatar Jun 01 '19 22:06 khughitt