Add reproducibility when fitting a synthesizer
Problem Description
I want to improve my ability to evaluate synthesizers with different parameters, in different environments, and against each other.
Expected behavior
As a user, I'd like Synthesizer models to be fit deterministically, so that I can generate the same synthetic data every time.
Potential API
There are situations where you want a slightly different model to be trained, so reproducibility could be made opt-in through an optional parameter:
synthesizer.fit(original_data, random_state=1)
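One way such a parameter could behave is sketched below with a hypothetical `ToySynthesizer` (a stand-in, not the SDV or CTGAN API). The assumption is that `fit()` derives all of its randomness from a single NumPy `Generator` built from `random_state`. Two fits with the same seed then yield identical models and samples, and `random_state=None` keeps today's non-deterministic behavior.

```python
import numpy as np


class ToySynthesizer:
    """Hypothetical stand-in for a synthesizer; not the SDV/CTGAN API."""

    def fit(self, data, random_state=None):
        # Every random draw during fit and sampling comes from this generator,
        # so the same random_state always produces the same model.
        self._rng = np.random.default_rng(random_state)
        data = np.asarray(data, dtype=float)
        # Stand-in for random weight initialization and noisy training.
        noise = self._rng.normal(scale=0.01, size=data.shape[1])
        self.mean_ = data.mean(axis=0) + noise
        self.std_ = data.std(axis=0)
        return self

    def sample(self, num_rows):
        # Draw synthetic rows from the fitted per-column normal distributions.
        return self._rng.normal(self.mean_, self.std_, size=(num_rows, len(self.mean_)))


data = np.arange(20.0).reshape(10, 2)
a = ToySynthesizer().fit(data, random_state=1).sample(5)
b = ToySynthesizer().fit(data, random_state=1).sample(5)
c = ToySynthesizer().fit(data, random_state=2).sample(5)
print(np.array_equal(a, b))  # True: same seed, identical synthetic data
print(np.array_equal(a, c))  # False: different seed, different model
```

Passing a seed into a dedicated generator, rather than reseeding global state, keeps the synthesizer's randomness isolated from anything else running in the same process.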
Additional context
Originally raised here: https://github.com/sdv-dev/CTGAN/issues/380#issuecomment-2109042846
Potential Workaround
A potential workaround for controlling the seed state during fit() was mentioned in this comment. We've copied it here for easy reference:
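import numpy as np
import torch

SEED = 42
np.random.seed(SEED)     # seed NumPy's global random number generator
torch.manual_seed(SEED)  # seed PyTorch's RNG on CPU and all CUDA devices
# Run these lines immediately before calling synthesizer.fit(...)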
While we don't guarantee this will work, we hope some people might find it useful!
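The workaround can be wrapped in a small helper so the same seeding is applied right before every `fit()` call. This is a sketch under assumptions: `seed_everything` is a hypothetical name, seeding Python's built-in `random` module goes beyond the original comment (in case some code path uses it), and PyTorch is seeded only if it is installed.

```python
import random

import numpy as np


def seed_everything(seed):
    """Hypothetical helper: seed the global RNGs that fitting may draw from."""
    random.seed(seed)      # Python's built-in RNG (extra assumption)
    np.random.seed(seed)   # NumPy's legacy global RNG
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; nothing more to seed
    torch.manual_seed(seed)  # PyTorch RNG on CPU and all CUDA devices
```

Usage would be `seed_everything(42)` followed by `synthesizer.fit(original_data)`. Global seeding only affects code that draws from these global generators; a component that creates its own unseeded generator (for example `np.random.default_rng()` with no argument) will still vary between runs, which is one reason the workaround cannot be guaranteed. On GPU, bit-for-bit reproducibility may also require `torch.use_deterministic_algorithms(True)`.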
This would be really great! I hope it gets implemented.
I second this. It would be very helpful to me.