`ScalarInequality` and `ScalarRange` Can’t Use `datetime64` as datatype
Description
When applying the `ScalarInequality` or `ScalarRange` constraint to a `datetime64` column in a `pd.DataFrame`, `add_constraints` raises an `InvalidConstraintsError`. The error message says the value must be "a datetime string of the right format", even when a valid datetime string such as `'2023-01-01'` is provided. This prevents using `datetime64` columns with either constraint.
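For reference, `ScalarInequality` with `relation='>='` and `value='2023-01-01'` asks that every value in the column be on or after that date. The snippet below expresses that relation in plain pandas on the same column used in the reproduction, showing that the string parses to a timestamp that compares cleanly with a `datetime64[ns]` column. It is an illustration only and is not how SDV validates or enforces the constraint.

```python
import pandas as pd

# Same datetime64[ns] column as in the reproduction.
df = pd.DataFrame({
    "datetime64": pd.Series(pd.date_range("2023-01-01", periods=3), dtype="datetime64[ns]"),
})

# Parse the constraint's 'value' string into a Timestamp so it can be
# compared against the datetime64[ns] column.
threshold = pd.Timestamp("2023-01-01")

# Elementwise check of the intended relation: datetime64 >= 2023-01-01.
satisfied = df["datetime64"] >= threshold
print(satisfied.all())  # True
```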
Steps to reproduce
```python
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
import pandas as pd
import numpy as np

data = {
    'object': pd.Series(['a', 'b', 'c'], dtype='object'),
    'string': pd.Series(['a', 'b', 'c'], dtype='string'),
    'datetime64': pd.Series(pd.date_range('2023-01-01', periods=3), dtype='datetime64[ns]'),
}
df = pd.DataFrame(data)

metadata = SingleTableMetadata()
metadata.add_column('object', sdtype='categorical')
metadata.add_column('string', sdtype='categorical')
metadata.add_column('datetime64', sdtype='datetime')

gcs = GaussianCopulaSynthesizer(metadata)
my_constraints = [
    {
        'constraint_class': 'ScalarInequality',
        'constraint_parameters': {
            'column_name': 'datetime64',
            'relation': '>=',
            'value': '2023-01-01'
        }
    }
]
gcs.add_constraints(my_constraints)
```
```
File ~/Projects/sdv-dev/SDV/sdv/single_table/base.py:296, in BaseSynthesizer.add_constraints(self, constraints)
    291 if self._fitted:
    292     warnings.warn(
    293         "For these constraints to take effect, please refit the synthesizer using 'fit'."
    294     )
--> 296 self._data_processor.add_constraints(constraints)

File ~/Projects/sdv-dev/SDV/sdv/data_processing/data_processor.py:327, in DataProcessor.add_constraints(self, constraints)
    324         errors.append(reformated_errors)
    326 if errors:
--> 327     raise InvalidConstraintsError(errors)
    329 self._constraints_list.extend(validated_constraints)
    330 self._prepared_for_fitting = False

InvalidConstraintsError: The provided constraint is invalid:
'value' must be a datetime string of the right format.
```
This issue is still not resolved. I am getting the same error with a `ScalarRange` constraint:

```python
[{'table_name': 'XXX', 'constraint_class': 'ScalarRange', 'constraint_parameters': {'column_name': 'CreatedDate', 'low_value': '2007-10-01 00:00:00.000', 'high_value': '2009-10-01 00:00:00.000'}}]
```

Error after validating:

```
The provided constraint is invalid: Both 'high_value' and 'low_value' must be a datetime string of the right format
```
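The error text refers to "the right format" without naming one. One plausible reading, assumed here and not confirmed in this thread, is that the bounds are checked against a specific `strptime`-style format string. The sketch below uses a hypothetical `matches_format` helper (not SDV's code) to show that the reported bounds parse under `%Y-%m-%d %H:%M:%S.%f` but not under a date-only `%Y-%m-%d`, so whether a string counts as "of the right format" depends entirely on which format it is compared with.

```python
from datetime import datetime


def matches_format(value: str, fmt: str) -> bool:
    """Hypothetical helper: return True if `value` parses exactly under `fmt`."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        # Raised on a mismatch, including leftover unparsed characters.
        return False
    return True


low_value = "2007-10-01 00:00:00.000"
high_value = "2009-10-01 00:00:00.000"

# Assumed format for 'CreatedDate'; '%f' accepts the '.000' fractional part.
with_millis = "%Y-%m-%d %H:%M:%S.%f"
# A date-only format, which these bounds do not match.
date_only = "%Y-%m-%d"

print(matches_format(low_value, with_millis), matches_format(high_value, with_millis))  # True True
print(matches_format(low_value, date_only))  # False
```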
Hi @CShah-CitiusTech, whenever an issue is resolved and available in a release, it will appear as "closed" on GitHub. This one is still labeled as "open", so it seems the team has not had a chance to look at it yet.
I'm curious what outcome you are hoping to achieve by adding a `ScalarRange` or `ScalarInequality` constraint to your datasets. There are other ways to enforce the min/max values from your data, so these constraints are somewhat redundant in that regard.
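The comment does not name the alternative mechanism. SDV's synthesizers accept an `enforce_min_max_values` parameter, which is likely what is meant, though that is an assumption here. To illustrate the idea of keeping synthetic values inside the range observed in the real data, the sketch below clips a synthetic datetime column to the real column's min and max with pandas. `clip_to_real_range` is a hypothetical helper and is not how SDV implements this.

```python
import pandas as pd


def clip_to_real_range(real: pd.Series, synthetic: pd.Series) -> pd.Series:
    """Hypothetical helper: clip synthetic values to the real data's observed min/max."""
    return synthetic.clip(lower=real.min(), upper=real.max())


# Real data spans 2023-01-01 through 2023-01-03.
real = pd.Series(pd.date_range("2023-01-01", periods=3))
# Made-up synthetic values, two of which fall outside that span.
synthetic = pd.Series(pd.to_datetime(["2022-12-30", "2023-01-02", "2023-01-05"]))

clipped = clip_to_real_range(real, synthetic)
print(clipped.dt.strftime("%Y-%m-%d").tolist())
# ['2023-01-01', '2023-01-02', '2023-01-03']
```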
If your intended usage is to expand the allowable ranges in the synthetic data (aka "out-of-range sampling"), that is not something these constraints are designed to do. I noticed that you opened issue #2469. I've replied there with some additional comments about out-of-range sampling. Thanks.