SDV
When inappropriately applying ScalarRange constraint, InvalidDataError is being returned instead of ConstraintsNotMetError
Environment Details
- SDV version: 1.10.0
- Python version: 3.10.x
Error Description
If the data contains values outside the given range and you apply a ScalarRange constraint anyway, a ConstraintsNotMetError should be raised. Instead, an InvalidDataError is raised.
Originally identified here: https://github.com/sdv-dev/SDV/issues/1833
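To make the expected behavior concrete, here is a minimal sketch of a range check that raises a dedicated constraint error. The two exception classes and `check_scalar_range` are hypothetical stand-ins defined locally so the snippet runs without SDV installed; they are not SDV's implementation. The sample ages are the first five values shown in the error output of this issue.

```python
# Hypothetical stand-ins for the two exception types named in this issue.
# They are defined here only so the sketch is self-contained.
class InvalidDataError(Exception):
    """Stand-in for the error that is currently raised."""


class ConstraintsNotMetError(Exception):
    """Stand-in for the error that is expected instead."""


def check_scalar_range(values, low_value, high_value, strict_boundaries=True):
    """Raise ConstraintsNotMetError if any value falls outside the range.

    Illustrative helper (an assumption for this sketch), not SDV code.
    """
    if strict_boundaries:
        invalid = [v for v in values if not (low_value < v < high_value)]
    else:
        invalid = [v for v in values if not (low_value <= v <= high_value)]
    if invalid:
        raise ConstraintsNotMetError(
            f"Data is not valid for the 'ScalarRange' constraint: {invalid[:5]}"
        )


# First five 'age' values from the error output, checked against (5, 10).
try:
    check_scalar_range([39, 50, 38, 53, 28], low_value=5, high_value=10)
except ConstraintsNotMetError as error:
    raised = type(error).__name__
    print(raised)  # ConstraintsNotMetError
```

The point of the sketch is only the exception type: data that violates a user-supplied constraint is reported through the constraint-specific error rather than the generic data-validation one.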
Steps to reproduce
Quick code snippet to reproduce with the SDV version listed above:
from sdv.datasets.demo import get_available_demos, download_demo
from sdv.single_table import CTGANSynthesizer
demos_df = get_available_demos(modality='single_table')
data, metadata_obj = download_demo('single_table', 'census_extended')
constraint = {
    'constraint_class': 'ScalarRange',
    'constraint_parameters': {
        'column_name': 'age',
        'low_value': 5,
        'high_value': 10,
        'strict_boundaries': True
    }
}
synthesizer = CTGANSynthesizer(metadata_obj, epochs=500, verbose=True)
synthesizer.add_constraints(constraints=[constraint])
synthesizer.fit(data)
This is the resulting error:
---------------------------------------------------------------------------
InvalidDataError                          Traceback (most recent call last)
<ipython-input-4-463f9f3b4286> in <cell line: 16>()
     14 synthesizer = CTGANSynthesizer(metadata_obj, epochs=500, verbose=True)
     15 synthesizer.add_constraints(constraints=[constraint])
---> 16 synthesizer.fit(data)

/usr/local/lib/python3.10/dist-packages/sdv/single_table/base.py in fit(self, data)
    393         self._data_processor.reset_sampling()
    394         self._random_state_set = False
--> 395         processed_data = self._preprocess(data)
    396         self.fit_processed_data(processed_data)
    397

/usr/local/lib/python3.10/dist-packages/sdv/single_table/ctgan.py in _preprocess(self, data)
    211
    212     def _preprocess(self, data):
--> 213         self.validate(data)
    214         self._data_processor.fit(data)
    215         self._print_warning(data)

/usr/local/lib/python3.10/dist-packages/sdv/single_table/base.py in validate(self, data)
    162
    163         if errors:
--> 164             raise InvalidDataError(errors)
    165
    166     def _validate_transformers(self, column_name_to_transformer):

InvalidDataError: The provided data does not match the metadata:
Data is not valid for the 'ScalarRange' constraint:
   age
0   39
1   50
2   38
3   53
4   28
+32556 more
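For anyone hitting this, the rows that violate the range can be listed before calling fit. The sketch below does that in plain Python and returns a short preview plus a count of the remaining rows, similar in spirit to the message above. `summarize_violations` is a hypothetical helper written for this issue, not part of SDV, and the value 7 in the sample list is made up to show that in-range rows are not reported.

```python
def summarize_violations(values, low_value, high_value,
                         strict_boundaries=True, preview=5):
    """Return (preview_rows, remaining_count) for values outside the range.

    Hypothetical helper for illustration; it is not part of SDV.
    """
    def in_range(value):
        if strict_boundaries:
            return low_value < value < high_value
        return low_value <= value <= high_value

    # Keep the row index next to each offending value.
    violations = [
        (index, value)
        for index, value in enumerate(values)
        if not in_range(value)
    ]
    return violations[:preview], max(len(violations) - preview, 0)


# First five 'age' values from the error output, plus one made-up
# in-range value (7).
ages = [39, 50, 38, 53, 28, 7]
preview_rows, remaining = summarize_violations(ages, low_value=5, high_value=10)
print(preview_rows)  # [(0, 39), (1, 50), (2, 38), (3, 53), (4, 28)]
print(remaining)     # 0
```

With the real data, the message reports 5 rows plus 32556 more, i.e. 32561 violating rows, so the (5, 10) range is clearly a poor fit for the age column.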