synthcity
How is train-test-splitting beneficial?
Question
How is splitting the real input dataset into train and test parts beneficial for model evaluation?
Further Information
Based on line 130 and lines 232-273 in file eval.py, all evaluation metrics with the exception of DOMIAS are passed the real test data and the generated data, but not the train dataset. I was wondering whether this hinders effective evaluation with respect to the metrics' ability to detect privacy violations and generalization, and what benefit this decision has for statistical fidelity metrics. The only metric for which I think splitting off a test dataset that the generator is not allowed to see makes sense is TSTR. More specifically, I'm unsure about:
- How does a privacy metric check privacy violations if it does not know the training dataset it is supposed to protect?
- How is a metric supposed to detect the generator's ability to generalize if the training dataset is unknown?
- What problems do you see when statistical fidelity metrics or the discriminative score, for instance, compare the train dataset with the generated dataset? (In my opinion, this would let us make the test dataset for TSTR and others smaller and the train dataset bigger.)
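To make the TSTR point above concrete, here is a minimal sketch of a Train-on-Synthetic-Test-on-Real evaluation. It is not synthcity's actual implementation; `generate_synthetic` is a hypothetical placeholder (here it just jitters the real training data) standing in for any fitted generator, and the key detail is that the real test split is held out from the generator entirely:

```python
# Minimal TSTR (Train on Synthetic, Test on Real) sketch.
# NOTE: `generate_synthetic` is a hypothetical stand-in for a real
# generator; synthcity's actual API and eval.py logic differ.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def generate_synthetic(X, y, rng):
    # Placeholder "generator": jitter the real training features.
    # A real generator (GAN, VAE, diffusion, ...) would be fitted on (X, y)
    # and sampled from instead.
    return X + rng.normal(scale=0.05, size=X.shape), y

def tstr_auc(X_real, y_real, seed=0):
    rng = np.random.default_rng(seed)
    # Hold out real test data that the generator never sees.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_real, y_real, test_size=0.3, random_state=seed)
    # Fit the downstream model on synthetic data only...
    X_syn, y_syn = generate_synthetic(X_tr, y_tr, rng)
    clf = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
    # ...and score it on the held-out real test set.
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
print(f"TSTR AUC: {tstr_auc(X, y):.3f}")
```

Under this framing, shrinking the test split (as suggested above) trades a more capable generator for a noisier TSTR estimate, since the held-out set is the only data the downstream score is computed on.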
Screenshots
Not applicable
System Information
Not applicable
Additional Context
Not applicable