SDMetrics
Metrics to evaluate quality and efficacy of synthetic datasets.
`get_column_plot` produces histograms that take considerable liberties when representing the data, especially at the edges. The real data and the matplotlib plot represent the same data (ignore the...
### Problem Description The detection metrics for [single table data](https://docs.sdv.dev/sdmetrics/metrics/metrics-in-beta/detection-single-table) and [sequential data](https://docs.sdv.dev/sdmetrics/metrics/metrics-in-beta/detection-sequential) both compute the `AUC (ROC)` and return `1-AUC` as the final score. The score is hard to...
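To make the computation described above concrete, here is a minimal numpy-only sketch that computes ROC AUC with the Mann-Whitney rank formula and returns `1 - AUC`. It is not SDMetrics code: the function names `roc_auc` and `one_minus_auc` are hypothetical, and treating synthetic rows as the positive class (label 1) is an assumption.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic; tied scores share their average rank."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    n = len(s)
    order = np.argsort(s, kind="mergesort")
    sorted_s = s[order]
    ranks = np.empty(n, dtype=float)
    i = 0
    while i < n:
        # Extend j over the run of scores tied with sorted_s[i].
        j = i
        while j + 1 < n and sorted_s[j + 1] == sorted_s[i]:
            j += 1
        # Assign the 1-based average rank of positions i..j to every tied entry.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = int(y.sum())
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one real and one synthetic row")
    # U statistic of the positive class, normalized to [0, 1].
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def one_minus_auc(y_true, scores):
    """Hypothetical detection-style score, as described in the issue: 1 - AUC."""
    return 1.0 - roc_auc(y_true, scores)


# Assumed labels: 1 = synthetic row, 0 = real row; scores are a classifier's P(synthetic).
labels = [0, 0, 1, 1]
print(one_minus_auc(labels, [0.1, 0.2, 0.8, 0.9]))  # perfectly separable → 0.0
print(one_minus_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # indistinguishable → 0.5
```

Note that under this formula a classifier that cannot tell real from synthetic rows (AUC = 0.5) yields a score of 0.5, not 1, and a classifier that does worse than chance yields a score above 0.5.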
### Problem Description What metrics can I use to check the quality of the PII produced? `report.get_diagnostics()` checks the coverage and range of numerical/categorical data. But is there...
The goal is to propose a new column-pair metric between one numerical and one categorical column. ### Current behavior The Quality report has to discretize the numerical column and do...
The goal here is to make the `NewRowSynthesis` metric more fault-tolerant and to make it faster and more efficient to run.
### Problem Description Some metrics such as [StatisticSimilarity](https://docs.sdv.dev/sdmetrics/metrics/metrics-glossary/statisticsimilarity) are defined at the per-column level. If I want to apply one to several columns of several tables at once, I have...
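A user-side workaround for the situation above is a small loop that applies any per-column metric to a chosen set of `(table, column)` pairs. This is only a sketch: `apply_per_column` and `mean_closeness` are hypothetical helpers, and `mean_closeness` is a toy stand-in, not SDMetrics' `StatisticSimilarity`. A real metric would be passed in by wrapping it in a callable with the same `(real_column, synthetic_column) -> float` shape.

```python
from typing import Callable, Dict, List, Tuple

import pandas as pd

# Any per-column metric: (real column, synthetic column) -> score.
ColumnMetric = Callable[[pd.Series, pd.Series], float]


def apply_per_column(
    real_tables: Dict[str, pd.DataFrame],
    synthetic_tables: Dict[str, pd.DataFrame],
    columns: Dict[str, List[str]],
    metric: ColumnMetric,
) -> Dict[Tuple[str, str], float]:
    """Run `metric` on every (table, column) pair listed in `columns` (hypothetical helper)."""
    results = {}
    for table, cols in columns.items():
        for col in cols:
            results[(table, col)] = metric(real_tables[table][col], synthetic_tables[table][col])
    return results


def mean_closeness(real: pd.Series, synthetic: pd.Series) -> float:
    """Toy stand-in metric: 1 - |difference of means| / range of the real column, floored at 0."""
    spread = real.max() - real.min()
    if spread == 0:
        # Constant real column: score 1 only if the means match exactly.
        return float(real.mean() == synthetic.mean())
    return max(0.0, 1.0 - abs(real.mean() - synthetic.mean()) / spread)


real = {"users": pd.DataFrame({"age": [0, 10]}), "orders": pd.DataFrame({"amount": [1.0, 3.0]})}
synthetic = {"users": pd.DataFrame({"age": [5, 15]}), "orders": pd.DataFrame({"amount": [1.0, 3.0]})}
scores = apply_per_column(real, synthetic, {"users": ["age"], "orders": ["amount"]}, mean_closeness)
```

Keying the results by `(table, column)` keeps scores from different tables apart even when column names repeat across tables.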
### Problem Description I am working with a home-grown synthesizer that can synthesize relatively rare categorical values (i.e., values that occur maybe 3 or 4 times in a...
### Problem Description I cannot override the synthetic sample size that the diagnostic report uses for the `NewRowSynthesis` metric, in either the single-table or the multiple-table diagnostic report. Currently, I am doing...
### Environment Details
Please indicate the following details about the environment in which you found the bug:
* SDMetrics version:
* Python version:
* Operating System:
### Error Description
metadata1={'fields':...
The line linked below should read something like `data[field] = pd.Series(integers, data.index)`. https://github.com/sdv-dev/SDMetrics/blob/c9967494126e6273d3d97ebf8c1b045861a3f126/sdmetrics/utils.py#L199 As currently implemented, the transformed values will be mapped to the wrong rows if the index...
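The pitfall can be reproduced in plain pandas. This sketch assumes the current code builds the Series without passing `data.index`; the column name, index, and values are illustrative and are not taken from `sdmetrics/utils.py`.

```python
import pandas as pd

# Rows whose index is not 0..n-1, e.g. after shuffling or filtering.
data = pd.DataFrame({"field": ["c", "a", "b"]}, index=[2, 0, 1])
integers = [0, 1, 2]  # transformed values, in the DataFrame's row order

# Without an explicit index, the new Series gets a RangeIndex (0, 1, 2), and
# pandas aligns it to `data` by index label, not by position.
misaligned = data.copy()
misaligned["field"] = pd.Series(integers)

# Passing data.index keeps each value on the row it was computed for.
aligned = data.copy()
aligned["field"] = pd.Series(integers, data.index)

print(misaligned["field"].tolist())  # → [2, 0, 1]
print(aligned["field"].tolist())     # → [0, 1, 2]
```

If the index labels do not overlap with `0..n-1` at all, the same label alignment fills the column with `NaN` instead of silently reordering it.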