Composable transformers or allow for one-hotting of CBN components?

Open HarrisonWilde opened this issue 2 years ago • 0 comments

Problem Description

Specifically, my question stems from wanting to one-hot encode the component part of the output of ClusterBasedNormalizer. My custom generator doesn't deal well with categoricals that are not one-hot encoded, and I'd like to change the HyperTransformer (or the underlying one in an SDV Synthesizer) as little as possible. Currently I define a second HyperTransformer with a OneHotEncoder for every column that uses ClusterBasedNormalizer and None for all other columns, but this is a little tedious and feels suboptimal as a workflow. Looking at the code for ClusterBasedNormalizer, it could make sense to let the user choose how the component is encoded. Alternatively, making transformers composable through some framework could be a nice addition to the package.

Expected behavior

One of:

  • cbn = ClusterBasedNormalizer(..., onehot = True, ...)
  • ht = HyperTransformer(); ht.update_transformers({"colname": [ClusterBasedNormalizer(...), OneHotEncoder()]}), or equivalent
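
To make the second option concrete, here is a minimal pure-Python sketch of what composition could look like. Nothing in it is RDT API: `Chain`, `ToyClusterNormalizer` and `ToyOneHot` are hypothetical stand-ins, the toy normalizer uses fixed centers instead of fitting a mixture model, and the output column names are only illustrative.

```python
class ToyClusterNormalizer:
    """Hypothetical stand-in: splits a column into an offset and a component index."""

    def __init__(self, column, centers):
        self.column = column
        self.centers = list(centers)  # fixed centers (assumption: no model fitting)

    def fit_transform(self, table):
        values = table[self.column]
        # Assign each value to the index of its nearest center.
        comps = [
            min(range(len(self.centers)), key=lambda i: abs(x - self.centers[i]))
            for x in values
        ]
        out = {k: v for k, v in table.items() if k != self.column}
        out[f"{self.column}.normalized"] = [x - self.centers[c] for x, c in zip(values, comps)]
        out[f"{self.column}.component"] = comps
        return out

    def reverse_transform(self, table):
        norm, comp = f"{self.column}.normalized", f"{self.column}.component"
        out = {k: v for k, v in table.items() if k not in (norm, comp)}
        out[self.column] = [n + self.centers[c] for n, c in zip(table[norm], table[comp])]
        return out


class ToyOneHot:
    """Hypothetical stand-in: one-hot encodes a single column."""

    def __init__(self, column):
        self.column = column
        self.categories = []

    def fit_transform(self, table):
        self.categories = sorted(set(table[self.column]))
        out = {k: v for k, v in table.items() if k != self.column}
        for i, cat in enumerate(self.categories):
            out[f"{self.column}.value{i}"] = [int(v == cat) for v in table[self.column]]
        return out

    def reverse_transform(self, table):
        names = [f"{self.column}.value{i}" for i in range(len(self.categories))]
        out = {k: v for k, v in table.items() if k not in names}
        # Argmax over the indicator columns recovers the original category.
        rows = zip(*(table[n] for n in names))
        out[self.column] = [
            self.categories[max(range(len(row)), key=row.__getitem__)] for row in rows
        ]
        return out


class Chain:
    """Hypothetical composite: applies steps in order, reverses them in the opposite order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit_transform(self, table):
        for step in self.steps:
            table = step.fit_transform(table)
        return table

    def reverse_transform(self, table):
        for step in reversed(self.steps):
            table = step.reverse_transform(table)
        return table


table = {"amount": [1.0, 9.5, 10.5, 0.5]}
chain = Chain([
    ToyClusterNormalizer("amount", centers=[0.0, 10.0]),
    ToyOneHot("amount.component"),  # encodes only the component output of the first step
])
encoded = chain.fit_transform(table)
restored = chain.reverse_transform(encoded)
```

The point of the sketch is that the second step addresses an output column of the first by name, so the user describes the whole pipeline for "amount" in one place instead of maintaining a second HyperTransformer.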

HarrisonWilde · Apr 11 '23 08:04