[Feature request] Yeo-Johnson transformation for numeric variables

Open lucazav opened this issue 4 years ago • 3 comments

It'd be really useful to be able to apply the Yeo-Johnson transformation to numeric variables, in addition to the existing "log10" option. That would also make it possible to handle left-skewed distributions.
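
For reference, here is a minimal base R sketch of the transformation for a single, fixed lambda. The function name `yeo_johnson` and the `eps` tolerance are only illustrative, not something that exists in ggquickeda, and the sketch assumes the input has no missing values. In practice lambda would be estimated from the data rather than picked by hand.

```r
# Yeo-Johnson transform of a numeric vector for a fixed lambda.
# Assumes x contains no NA values; `eps` decides when lambda counts as 0 or 2.
yeo_johnson <- function(x, lambda, eps = 1e-8) {
  out <- numeric(length(x))
  pos <- x >= 0

  # Non-negative values: Box-Cox-like branch applied to x + 1
  if (abs(lambda) < eps) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }

  # Negative values: mirrored branch with exponent 2 - lambda
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

x <- c(-2, -0.5, 0, 0.5, 2)
yeo_johnson(x, lambda = 1)    # lambda = 1 leaves the values unchanged
yeo_johnson(x, lambda = 0.5)  # defined for zero and negative values too
log10(x)                      # NaN (with a warning) for negatives, -Inf for zero
```

Unlike log10, the transform is defined on the whole real line, and values of lambda above 1 stretch the upper end of the range, which is what helps with left-skewed variables.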

lucazav, Aug 21 '21 09:08

This one could be the function to implement:

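library(dplyr)
library(recipes)

# Center, scale, and Yeo-Johnson-transform every numeric column of `data`.
# Returns a list with the transformed data frame and the estimated lambdas.
yeo_johnson_transf <- function(data) {
  # Use all columns as predictors; non-numeric columns pass through unchanged
  rec <- recipe(~ ., data = data) %>%
    step_center(all_numeric()) %>%
    step_scale(all_numeric()) %>%
    step_YeoJohnson(all_numeric())

  # Estimate means, standard deviations, and lambdas from the data
  prep_rec <- prep(rec, training = data)

  # Return the list visibly so the result also prints when called directly
  list(
    df_yeojohnson = bake(prep_rec, data),
    # Step 3 of the recipe is step_YeoJohnson
    lambdas = prep_rec$steps[[3]][["lambdas"]]
  )
}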


yeo_johnson_list <- iris %>% 
  yeo_johnson_transf()

transf_iris <- yeo_johnson_list$df_yeojohnson
transf_iris

lambdas_iris <- yeo_johnson_list$lambdas
lambdas_iris

lucazav, Aug 21 '21 10:08

Thanks for your interest in ggquickeda. It seems like a useful feature. Do you have an idea of where in the workflow you would like me to implement it: as a transformation of the variable itself, or of the ggplot scale? I think I have a menu that allows dividing a numeric variable by a constant or by another column, so I could fit it in there.

smouksassi, Aug 23 '21 09:08

I think it'd be great to have it as a transformation of the variable itself.
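
To make the difference between the two options concrete, here is a rough sketch using `yj_trans()` from the scales package. The fixed `lambda` and the column name `Sepal.Length.yj` are made up for the example, and this is not how ggquickeda does anything internally.

```r
library(ggplot2)
library(scales)

lambda <- 0.5  # hypothetical fixed value, only for illustration
yj <- yj_trans(lambda)

# Option 1: transform the variable itself.
# A new column holds the transformed values; the axis shows transformed units.
iris_yj <- transform(iris, Sepal.Length.yj = yj$transform(Sepal.Length))
p_variable <- ggplot(iris_yj, aes(Sepal.Length.yj)) +
  geom_histogram(bins = 20)

# Option 2: transform the ggplot scale.
# The data stay untouched; the axis is warped but labelled in original units.
p_scale <- ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(bins = 20) +
  scale_x_continuous(trans = yj_trans(lambda))
```

With the first option the transformed column is also available to everything downstream (summaries, other plots), which is why I'd prefer it.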

lucazav, Aug 23 '21 09:08