Avoid generating the conditional column

Open saart opened this issue 1 year ago • 0 comments

Environment details

  • CTGAN version: 0.7.1 (latest)
  • Python version: 3.10.11
  • Operating System: Mac/Unix

Problem description

I want to generate data conditionally, but I don't want to include the conditioned column in the output of the generator.

What I already tried

Currently, I just drop this column from the generated output. This seems wasteful: the network still has to model and generate the column, so the network is bigger (and therefore slower) and the saved model is larger.
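For reference, the workaround looks roughly like the sketch below. It assumes a fitted `ctgan.CTGAN` model and uses its `sample(n, condition_column=..., condition_value=...)` method; the helper name `sample_without_condition` is hypothetical and not part of the library.

```python
import pandas as pd


def sample_without_condition(model, n, column, value) -> pd.DataFrame:
    """Sample n rows conditioned on `column == value`, then drop `column`.

    `model` is assumed to be a fitted ctgan.CTGAN instance (or any object
    with a compatible `sample` method). This helper is hypothetical.
    """
    rows = model.sample(n, condition_column=column, condition_value=value)
    # The conditioned column is still produced by the generator; it is only
    # removed afterwards, so no network capacity or model size is saved.
    return rows.drop(columns=[column])
```

Usage would be something like `sample_without_condition(model, 1000, "hospital", "Hospital A")`, where the column name and value are placeholders for the example further down.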

Example:

Consider a dataset with two columns: hospital name and patient's age. Assume there are 100 different hospitals, and my sole use of the generative model is to generate new rows for a given hospital. Currently, the model will work with 101 features: 100 one-hot features (for the hospital names) and one continuous feature (for age), even though the hospital is always known in advance.
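The feature count in this example can be spelled out as a small calculation. This follows the simplified counting used above (one slot per category, one value per continuous column); the library's real encoding of continuous columns may use more than one feature, so treat the numbers as illustrative. The function name `encoded_width` is made up for this sketch.

```python
def encoded_width(categories_per_discrete_column, n_continuous_columns):
    """Count encoded features under the simplified scheme from the example:
    one one-hot slot per category, one value per continuous column."""
    return sum(categories_per_discrete_column) + n_continuous_columns


# Hospital name (100 categories) plus age (continuous).
with_hospital = encoded_width([100], 1)
# Only age, if the conditioned column did not have to be generated.
without_hospital = encoded_width([], 1)

print(with_hospital, without_hospital)  # 101 1
```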

saart, May 08 '23 22:05