SDV
`Unique` constraint is not respected when using a small `batch_size` parameter
Environment Details
- SDV version: 1.0.0b0 (beta version)
- Python version: 3.8
- Operating System: Linux (Google Colab)
Error Description
If I create a synthesizer with a `Unique` constraint and then sample from it with a `batch_size` smaller than `num_rows`, the constraint is not enforced. The sampled data contains multiple rows with the same value, even though I specified that the column should be unique.
Steps to reproduce
from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_companies'
)

# create a unique index column
data['index'] = list(range(len(data)))
metadata.add_column(
    column_name='index',
    sdtype='numerical',
)

synth = GaussianCopulaSynthesizer(metadata)
synth.add_constraints(constraints=[{
    'constraint_class': 'Unique',
    'constraint_parameters': {
        'column_names': ['index']
    }
}])
synth.fit(data)

# request 10 rows in batches of 2 (batch_size < num_rows)
synth.sample(10, batch_size=2)
Observe that there are repeated values in the index column, even though I have specified they should be unique.
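To make the repeats easy to spot, here is a minimal sketch that lists every value occurring more than once. It uses only the standard library. `find_repeated_values` is a hypothetical helper name (not part of SDV), and the input list is made-up stand-in data, not actual SDV output.

```python
from collections import Counter


def find_repeated_values(values):
    """Return {value: count} for every value that appears more than once."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical usage with the reproduction above (assumes `synth` is fitted):
#   synthetic = synth.sample(10, batch_size=2)
#   repeated = find_repeated_values(synthetic['index'].tolist())
#   assert not repeated, f"Unique constraint violated: {repeated}"

# Stand-in values, invented for illustration only
print(find_repeated_values([3, 7, 3, 12, 7, 3]))  # prints {3: 3, 7: 2}
```

An empty dict means every value in the column is distinct, which is what I would expect when the `Unique` constraint is applied.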