Applying a ScalarRange constraint to data that violates it raises InvalidDataError instead of ConstraintsNotMetError

Open srinify opened this issue 1 year ago • 0 comments

Environment Details

  • SDV version: 0.10.0
  • Python version: 3.11.x

Error Description

If your data contains values outside the specified range and you apply a ScalarRange constraint anyway, a ConstraintsNotMetError should be raised. Instead, an InvalidDataError is raised.

Originally identified here: https://github.com/sdv-dev/SDV/issues/1833
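
Until the exception type is corrected, one way to avoid running into the misleading error is to check the column against the intended bounds before calling fit. The sketch below is plain Python and not part of SDV: `find_out_of_range` is a hypothetical helper, and it assumes that `strict_boundaries=True` means both bounds are exclusive. The sample values are the first five ages listed in the error output of this report.

```python
def find_out_of_range(values, low, high, strict=True):
    """Return (position, value) pairs that violate the range.

    Hypothetical helper, not an SDV API. With strict=True the bounds are
    treated as exclusive (low < value < high); otherwise inclusive.
    """
    violations = []
    for position, value in enumerate(values):
        if strict:
            in_range = low < value < high
        else:
            in_range = low <= value <= high
        if not in_range:
            violations.append((position, value))
    return violations


# First five 'age' values from the error output in this report.
ages = [39, 50, 38, 53, 28]
bad = find_out_of_range(ages, low=5, high=10, strict=True)
print(f"{len(bad)} of {len(ages)} values fall outside the range")
```

If the returned list is non-empty, the constraint cannot be satisfied by the data as given, so either the bounds or the data need adjusting before the constraint is added.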

Steps to reproduce

Quick code snippet to reproduce in sdv 0.10:

from sdv.datasets.demo import get_available_demos, download_demo
from sdv.single_table import CTGANSynthesizer

# List the available single-table demos (not used further below).
demos_df = get_available_demos(modality='single_table')
data, metadata_obj = download_demo('single_table', 'census_extended')

# The 'age' values in this dataset fall outside the 5 to 10 range,
# so the data cannot satisfy this constraint.
constraint = {
    'constraint_class': 'ScalarRange',
    'constraint_parameters': {
        'column_name': 'age',
        'low_value': 5,
        'high_value': 10,
        'strict_boundaries': True
    }
}

synthesizer = CTGANSynthesizer(metadata_obj, epochs=500, verbose=True)
synthesizer.add_constraints(constraints=[constraint])
# Raises InvalidDataError; ConstraintsNotMetError is expected.
synthesizer.fit(data)

This is the resulting error:

---------------------------------------------------------------------------
InvalidDataError                          Traceback (most recent call last)
<ipython-input-4-463f9f3b4286> in <cell line: 16>()
     14 synthesizer = CTGANSynthesizer(metadata_obj, epochs=500, verbose=True)
     15 synthesizer.add_constraints(constraints=[constraint])
---> 16 synthesizer.fit(data)

2 frames
/usr/local/lib/python3.10/dist-packages/sdv/single_table/base.py in fit(self, data)
    393         self._data_processor.reset_sampling()
    394         self._random_state_set = False
--> 395         processed_data = self._preprocess(data)
    396         self.fit_processed_data(processed_data)
    397 

/usr/local/lib/python3.10/dist-packages/sdv/single_table/ctgan.py in _preprocess(self, data)
    211 
    212     def _preprocess(self, data):
--> 213         self.validate(data)
    214         self._data_processor.fit(data)
    215         self._print_warning(data)

/usr/local/lib/python3.10/dist-packages/sdv/single_table/base.py in validate(self, data)
    162 
    163         if errors:
--> 164             raise InvalidDataError(errors)
    165 
    166     def _validate_transformers(self, column_name_to_transformer):

InvalidDataError: The provided data does not match the metadata:

Data is not valid for the 'ScalarRange' constraint:
   age
0   39
1   50
2   38
3   53
4   28
+32556 more

srinify, Mar 06 '24 16:03