`Unique` constraint is not respected when using a small `batch_size` parameter

Open npatki opened this issue 3 years ago • 0 comments

Environment Details

  • SDV version: 1.0.0b0 (beta version)
  • Python version: 3.8
  • Operating System: Linux (Google Colab)

Error Description

If I create a synthesizer with a `Unique` constraint and then sample from it with a `batch_size` smaller than `num_rows`, the constraint is not enforced. The sampled data contains multiple rows with the same value, even though I specified that the values should be unique.

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_companies'
)

# create a unique index column
data['index'] = range(len(data))
metadata.add_column(
    column_name='index',
    sdtype='numerical',
)

synth = GaussianCopulaSynthesizer(metadata)
synth.add_constraints(constraints=[{
    'constraint_class': 'Unique',
    'constraint_parameters': {
        'column_names': ['index']
    }
}])

synth.fit(data)
synth.sample(10, batch_size=2)

Observe that the `index` column of the sampled data contains repeated values, even though the constraint specifies that they should be unique.
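To make the repeated values easy to confirm, here is a minimal sketch of a duplicate check. The helper name `find_duplicates` and the `example_index` values are hypothetical and only illustrate the shape of the problem; they are not output captured from SDV.

```python
from collections import Counter


def find_duplicates(values):
    """Return a dict mapping each repeated value to how often it occurs."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical stand-in for the sampled 'index' column (not real SDV output).
example_index = [3, 7, 3, 12, 7, 3, 40, 41, 42, 43]

duplicates = find_duplicates(example_index)
print(duplicates)  # an empty dict would mean every value is unique
```

With the reproduction above, the same check could be run as `find_duplicates(synth.sample(10, batch_size=2)['index'])`; a non-empty result means the `Unique` constraint was violated. In pandas, `sampled['index'].duplicated().any()` gives an equivalent yes/no answer.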

npatki, Feb 27 '23 19:02