SDMetrics
Detection test doesn't look at metadata when determining which columns to use
Environment details
If you are already running SDMetrics, please indicate the following details about the environment in which you are running it:
- SDMetrics version: 2.4.2-dev0
- Python version: 3.9
- Operating System: Ubuntu 20.04
Problem description
When the primary_key is set, the generated data index restarts from zero. As a consequence, detection metrics can trivially detect generated instances by setting a threshold on the primary_key.
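To illustrate the problem, here is a minimal sketch on toy data. The column names, the key ranges, and the threshold are all hypothetical, and the example assumes the real keys do not overlap with the synthetic keys that restart from zero. It does not call SDMetrics; it only shows why a single cut on the key column is enough to separate the two tables.

```python
import pandas as pd

# Toy data (hypothetical): real keys continue from an existing table,
# while synthetic keys restart from zero. The other column is identical.
real = pd.DataFrame({"id": range(1000, 1100), "value": [1.0] * 100})
synthetic = pd.DataFrame({"id": range(0, 100), "value": [1.0] * 100})

# Label each row with its origin and stack the two tables.
combined = pd.concat(
    [real.assign(is_synthetic=False), synthetic.assign(is_synthetic=True)],
    ignore_index=True,
)

# A one-line "detector": any key below the threshold is called synthetic.
threshold = 1000
predicted = combined["id"] < threshold
accuracy = (predicted == combined["is_synthetic"]).mean()
print(accuracy)
```

Under these assumptions the key column alone identifies every synthetic row, even though the remaining column is indistinguishable between the two tables.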
What I already tried
I will propose a patch to remove the primary_key columns, if set, from these tests.
Thanks for filing this issue, @TanguyUrvoy. There are a few upcoming changes to the interactions between SDMetrics, RDTs and metadata. I suggest we hold off until the library gets to a more stable place.
As a temporary workaround, you can drop the primary_key columns from the real and synthetic data so that the test will ignore them.
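A minimal sketch of this workaround, using only pandas. The helper name `drop_primary_key` and the column name `id` are hypothetical and not part of SDMetrics; the example assumes both tables are pandas DataFrames.

```python
import pandas as pd


def drop_primary_key(real_data, synthetic_data, primary_key):
    """Return copies of both tables without the primary key column.

    Hypothetical helper, not an SDMetrics function. errors="ignore"
    keeps the call safe if one table lacks the column.
    """
    real = real_data.drop(columns=[primary_key], errors="ignore")
    synthetic = synthetic_data.drop(columns=[primary_key], errors="ignore")
    return real, synthetic


# Toy tables (hypothetical column names and values).
real_data = pd.DataFrame({"id": [1000, 1001, 1002], "age": [31, 45, 27]})
synthetic_data = pd.DataFrame({"id": [0, 1, 2], "age": [33, 41, 29]})

real_no_key, synthetic_no_key = drop_primary_key(real_data, synthetic_data, "id")
print(list(real_no_key.columns), list(synthetic_no_key.columns))
```

The returned frames can then be passed to the detection test in place of the originals. `DataFrame.drop` returns new objects, so the original tables keep their key column for any other use.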
[Update] Seems as if the detection test is actually ignoring the metadata in general. This doesn't just affect primary keys but also any other columns where the type info might be useful for detection. I'll update the bug to make it broader.