
`ScalarInequality` and `ScalarRange` Can’t Use `datetime64` as datatype

Status: Open. pvk-developer opened this issue 1 year ago • 2 comments

Description

Applying a ScalarInequality or ScalarRange constraint to a datetime64 column of a pd.DataFrame raises an InvalidConstraintsError. The error message states that 'value' must be a datetime string of the right format, even when a valid datetime string such as '2023-01-01' is provided. As a result, datetime64 columns cannot be used directly with either constraint.

Steps to reproduce

```python
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

import pandas as pd

data = {
    'object': pd.Series(['a', 'b', 'c'], dtype='object'),
    'string': pd.Series(['a', 'b', 'c'], dtype='string'),
    'datetime64': pd.Series(pd.date_range('2023-01-01', periods=3), dtype='datetime64[ns]'),
}
df = pd.DataFrame(data)

# Describe the table; the datetime column is declared without a datetime_format.
metadata = SingleTableMetadata()
metadata.add_column('object', sdtype='categorical')
metadata.add_column('string', sdtype='categorical')
metadata.add_column('datetime64', sdtype='datetime')

gcs = GaussianCopulaSynthesizer(metadata)

# Require every value in 'datetime64' to be on or after 2023-01-01.
my_constraints = [
    {
        'constraint_class': 'ScalarInequality',
        'constraint_parameters': {
            'column_name': 'datetime64',
            'relation': '>=',
            'value': '2023-01-01',
        },
    }
]

# This call raises InvalidConstraintsError.
gcs.add_constraints(my_constraints)
```

Running this raises:

```
File ~/Projects/sdv-dev/SDV/sdv/single_table/base.py:296, in BaseSynthesizer.add_constraints(self, constraints)
    291 if self._fitted:
    292     warnings.warn(
    293         "For these constraints to take effect, please refit the synthesizer using 'fit'."
    294     )
--> 296 self._data_processor.add_constraints(constraints)

File ~/Projects/sdv-dev/SDV/sdv/data_processing/data_processor.py:327, in DataProcessor.add_constraints(self, constraints)
    324         errors.append(reformated_errors)
    326 if errors:
--> 327     raise InvalidConstraintsError(errors)
    329 self._constraints_list.extend(validated_constraints)
    330 self._prepared_for_fitting = False

InvalidConstraintsError: The provided constraint is invalid:
'value' must be a datetime string of the right format.
```
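A minimal sketch of how a valid-looking string can still be rejected. It assumes (this has not been confirmed against the SDV source) that the validator parses 'value' using the column's `datetime_format` from the metadata. In the reproduction, the datetime64 column is added with `sdtype='datetime'` and no `datetime_format`, so under that assumption there is no format to parse against. `matches_format` is a hypothetical helper, not an SDV function.

```python
from datetime import datetime
from typing import Optional


def matches_format(value: object, datetime_format: Optional[str]) -> bool:
    """Hypothetical check: True only if `value` is a string that parses with `datetime_format`."""
    if not isinstance(value, str):
        return False
    try:
        # strptime raises TypeError when the format is None and
        # ValueError when the string does not match the format.
        datetime.strptime(value, datetime_format)
    except (TypeError, ValueError):
        return False
    return True


# Column declared with an explicit format: the string validates.
print(matches_format('2023-01-01', '%Y-%m-%d'))  # True
# Column declared without a format (as in the reproduction): nothing to parse against.
print(matches_format('2023-01-01', None))  # False
```

If this is the mechanism, declaring the format when the column is added (`SingleTableMetadata.add_column` accepts a `datetime_format` keyword, e.g. `datetime_format='%Y-%m-%d'`) may avoid the error. This workaround is untested here.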

Posted by pvk-developer on Sep 10 '24 15:09

This issue is still not resolved. I am getting the same error with this constraint:

[{'table_name': 'XXX', 'constraint_class': 'ScalarRange', 'constraint_parameters': {'column_name': 'CreatedDate', 'low_value': '2007-10-01 00:00:00.000', 'high_value': '2009-10-01 00:00:00.000'}}]

Validation fails with: "The provided constraint is invalid: Both 'high_value' and 'low_value' must be a datetime string of the right format"
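A sketch of the same check applied to both ScalarRange bounds. It assumes, as for ScalarInequality, that each bound is parsed with the column's `datetime_format` (unconfirmed). Under that assumption, the metadata format for CreatedDate would need to match strings such as '2007-10-01 00:00:00.000', for example '%Y-%m-%d %H:%M:%S.%f'. `validate_range_bounds` and `CREATED_DATE_FORMAT` are hypothetical names. The low-before-high rule is also an assumption about what a range check would reasonably enforce.

```python
from datetime import datetime

# Hypothetical format matching strings such as '2007-10-01 00:00:00.000'.
# %f accepts 1 to 6 fractional digits, so '.000' parses.
CREATED_DATE_FORMAT = '%Y-%m-%d %H:%M:%S.%f'


def validate_range_bounds(low_value, high_value, datetime_format):
    """Hypothetical check: both bounds parse with the format, and low < high."""
    try:
        low = datetime.strptime(low_value, datetime_format)
        high = datetime.strptime(high_value, datetime_format)
    except (TypeError, ValueError) as exc:
        # TypeError: the format is None; ValueError: a string does not match the format.
        raise ValueError(
            "Both 'high_value' and 'low_value' must be a datetime string "
            "of the right format"
        ) from exc
    # Assumption: a range only makes sense when low is strictly earlier than high.
    if not low < high:
        raise ValueError("'low_value' must be earlier than 'high_value'")
    return low, high


low, high = validate_range_bounds(
    '2007-10-01 00:00:00.000', '2009-10-01 00:00:00.000', CREATED_DATE_FORMAT
)
print(low, high)  # 2007-10-01 00:00:00 2009-10-01 00:00:00
```

A date-only format such as '%Y-%m-%d' would reject these bounds, because strptime raises ValueError when the time portion is left unparsed.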

Posted by CShah-CitiusTech on Apr 17 '25 10:04

Hi @CShah-CitiusTech, once an issue is resolved and the fix is available in a release, it is marked as "closed" on GitHub. This one is still labeled "open", so it seems the team has not had a chance to look at it yet.

I'm curious what outcome you are hoping to achieve by adding a ScalarRange or ScalarInequality constraint to your dataset. There are other ways to enforce the min/max values of your real data, so these constraints are somewhat redundant in that regard.
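The redundancy point can be illustrated with a library-agnostic sketch. If every synthetic value is clamped into the range observed in the real column, then a ScalarRange covering that same range adds nothing. This is not SDV's implementation, and `clip_to_observed_range` is a hypothetical helper. To our understanding, SDV synthesizers control min/max enforcement through an `enforce_min_max_values` parameter. Whether that parameter covers datetime columns is not confirmed here.

```python
from datetime import datetime
from typing import List


def clip_to_observed_range(real: List[datetime], synthetic: List[datetime]) -> List[datetime]:
    """Clamp each synthetic value into the [min, max] range observed in the real column."""
    low, high = min(real), max(real)
    return [min(max(value, low), high) for value in synthetic]


real = [datetime(2023, 1, 1), datetime(2023, 1, 2), datetime(2023, 1, 3)]
synthetic = [datetime(2022, 12, 25), datetime(2023, 1, 2), datetime(2023, 2, 1)]
clipped = clip_to_observed_range(real, synthetic)
print(clipped)
```

Clamping is one design choice. Rejecting out-of-range rows and resampling them is another, and it avoids piling extra mass at the boundaries.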

If your intended usage is to expand the allowable ranges in the synthetic data (aka "out-of-range sampling"), that is not something these constraints are designed to do. I noticed that you opened issue #2469. I've replied there with some additional comments about out-of-range sampling. Thanks.

Posted by npatki on Apr 17 '25 14:04