Add reproducibility when fitting a synthesizer

Open srinify opened this issue 1 year ago • 3 comments

Problem Description

I want to improve my ability to evaluate synthesizers with different parameters, in different environments, and against each other.

Expected behavior

As a user, I'd like the Synthesizer models to be fit in the same way so I can generate the same synthetic data every time.

Potential API

There are situations where you want a slightly different model to be trained each time, so reproducibility should probably be opt-in. We could incorporate it with a parameter:

synthesizer.fit(original_data, random_state=1)
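Until such a parameter exists, its intended behavior can be approximated from outside the library. The sketch below is only an illustration: `seed_global_rngs` and `fit_with_random_state` are hypothetical helpers, not part of SDV, and the approach assumes the synthesizer draws its randomness from the global `random`, NumPy, and PyTorch generators. Passing `random_state=None` leaves the generators untouched, so a slightly different model is trained on each call.

```python
import random


def seed_global_rngs(seed):
    """Seed whichever global RNGs are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is not installed, nothing to seed


def fit_with_random_state(synthesizer, data, random_state=None):
    """Hypothetical helper: seed the global RNGs, then call fit().

    Works with any object that exposes fit(data). With random_state=None
    no seeding happens and the fit stays non-deterministic.
    """
    if random_state is not None:
        seed_global_rngs(random_state)
    synthesizer.fit(data)
    return synthesizer
```

A call such as `fit_with_random_state(synthesizer, original_data, random_state=1)` would then stand in for the proposed `fit` signature. Because it mutates global state, other code that relies on the same generators is affected as well, which is one reason a built-in parameter would be cleaner.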

Additional context

Originally raised here: https://github.com/sdv-dev/CTGAN/issues/380#issuecomment-2109042846

srinify, May 21 '24 13:05

Potential Workaround

A potential workaround for controlling the seed state during fit() was mentioned in this comment. We've copied it here for easy reference:

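import numpy as np
import torch

# Run this before creating and fitting the synthesizer.
SEED = 42
np.random.seed(SEED)     # seed NumPy's global random generator
torch.manual_seed(SEED)  # seed PyTorch's random generators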

While we don't guarantee this will work, we hope some people might find it useful!
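Since the workaround is not guaranteed, it may be worth checking it in your own environment by fitting twice with the same seed and comparing the sampled output. The sketch below shows the shape of such a check. `ToySynthesizer` and `run_once` are hypothetical stand-ins that use only the standard library so the example runs anywhere; with a real synthesizer, the `np.random.seed` and `torch.manual_seed` calls would go where `random.seed` is called.

```python
import random


class ToySynthesizer:
    """Stand-in for a real synthesizer; it only mimics fit() and sample()."""

    def fit(self, data):
        # "Training" consumes randomness, much like weight initialization would.
        self._noise = random.random()
        self._mean = sum(data) / len(data)

    def sample(self, num_rows):
        return [self._mean + self._noise + random.gauss(0, 1) for _ in range(num_rows)]


def run_once(seed, data, num_rows=5):
    """Seed, build, fit, and sample in one pass so every step sees the same RNG state."""
    random.seed(seed)  # with a real synthesizer: np.random.seed / torch.manual_seed
    synthesizer = ToySynthesizer()
    synthesizer.fit(data)
    return synthesizer.sample(num_rows)


data = [1.0, 2.0, 3.0]
same = run_once(42, data) == run_once(42, data)       # same seed, same output
different = run_once(42, data) != run_once(43, data)  # new seed, new output
print(same, different)
```

Seeding happens before the synthesizer is constructed, because some models may consume random numbers at construction time as well as during `fit()`. If the same-seed comparison fails with a real synthesizer, the randomness is likely coming from a source the seeds do not cover.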

srinify, Mar 17 '25 13:03

This would be really great! I hope it gets implemented.

FerdinandoR, May 18 '25 16:05

I second this. It would be very helpful to me.

jcatlos, Aug 28 '25 11:08