sas-studio-custom-steps
Feature Request - Imputation Using GAN
Synthetic Data Generation can be used for more than just generating completely new samples of data. Another use case I'd like to see is imputing missing values in a row, given the fields that are already populated.
The current SDG - Generate Synthetic Data through GANs step is configured to generate only completely new rows of data. It would be great to extend it so that it accepts an additional input table (the table containing missing values) and runs that table through a pre-trained GAN to help with imputation.
@JinnyBoy94, thank you for initiating this discussion. I’m the contributor for this custom step. Happy to think this over and get back with a plan.
To provide some additional context, this step wraps a procedure that uses a specific implementation (the CPCTGAN), so we are working within a closed (or you might say bounded) set of architecture patterns.
My initial thought is to treat imputation either as a separate application with its own step (one that more than one tool could address) or as a new option on the current step. We’ll continue this conversation. Thank you.
@JinnyBoy94, if you happen to be a SAS employee, please let us know as well, for ease of communication.
Thanks for responding, @SundareshSankaran. My initial understanding from this documentation was that completely new rows are generated when you run PROC ASTORE with the trained GAN against a dataset containing only a RowID (or similar) column.
I figured that if the input table has fields that match the GAN output, the scoring process would impute the missing values (or, more specifically, generate new values for them). Admittedly, I haven't looked into the code behind this custom step, so I might be misunderstanding the nuances of implementing this within the custom step.
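To make the requested behaviour concrete, here is a minimal Python sketch of the merge I have in mind: populated fields are kept and only missing fields are filled from generated values. This is not how the custom step or PROC ASTORE works. `impute_rows`, `toy_generator`, and the field names are hypothetical, and the toy generator ignores the populated fields, whereas a real conditional model would presumably condition on them.

```python
import random
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def impute_rows(rows: List[Row], generate_row: Callable[[Row], Row]) -> List[Row]:
    """Return copies of rows where only missing (None) fields are filled.

    generate_row is a hypothetical stand-in for scoring one row with a
    trained generator; it must return a value for every field in the row.
    """
    imputed = []
    for row in rows:
        generated = generate_row(row)
        imputed.append({
            # Keep observed values; take the generated value only where missing.
            field: generated[field] if value is None else value
            for field, value in row.items()
        })
    return imputed


def toy_generator(row: Row) -> Row:
    # Assumption: placeholder for a trained GAN. It ignores the populated
    # fields and just draws a number for every field in the row.
    rng = random.Random(0)
    return {field: rng.uniform(0, 100) for field in row}


if __name__ == "__main__":
    table = [
        {"age": 42.0, "income": None},
        {"age": None, "income": 55000.0},
    ]
    for filled in impute_rows(table, toy_generator):
        print(filled)
```

The point of the sketch is only the contract: the input table keeps its observed values, and the generator's output is used solely for the gaps.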