Implement MSAS

Open LiFaytheGoblin opened this issue 2 years ago • 2 comments

Problem Description

The metrics currently implemented in SDV do not specifically measure the quality of sequences generated with CPAR.

Expected behavior

MSAS is a metric for sequential data quality, detailed in http://arxiv.org/abs/2207.14406. It should be implemented in SDV.

LiFaytheGoblin, Aug 26 '22 06:08

Thanks for filing @LiFaytheGoblin. We'll keep this open to track as we make progress on it.

Just a note that MSAS refers to our overall algorithm for computing sequential data quality. It works in the following steps:

  1. Compute a metric for every sequence in the real data to get a distribution X
  2. Compute the same metric for every sequence in the synthetic data to get a distribution X'
  3. Use the KSComplement test to compare the distributions X and X'
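
The three steps above can be sketched in plain Python as follows. This is a minimal, standard-library-only illustration, not the SDMetrics implementation: `msas_score` and `ks_complement` are hypothetical names, sequences are assumed to be non-empty lists of numbers, and the KS complement is assumed to be 1 minus the two-sample Kolmogorov-Smirnov statistic.

```python
from bisect import bisect_right
from statistics import mean


def ks_complement(real_values, synthetic_values):
    """Return 1 minus the two-sample KS statistic (assumed definition)."""
    real_sorted = sorted(real_values)
    synthetic_sorted = sorted(synthetic_values)

    def ecdf(sorted_values, x):
        # Fraction of values less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    # The largest gap between two empirical CDFs occurs at an observed value.
    ks_statistic = max(
        abs(ecdf(real_sorted, x) - ecdf(synthetic_sorted, x))
        for x in real_sorted + synthetic_sorted
    )
    return 1.0 - ks_statistic


def msas_score(real_sequences, synthetic_sequences, statistic=mean):
    """Hypothetical helper: compare per-sequence statistics of two datasets."""
    # Step 1: one value per real sequence gives the distribution X.
    real_distribution = [statistic(seq) for seq in real_sequences]
    # Step 2: the same statistic per synthetic sequence gives X'.
    synthetic_distribution = [statistic(seq) for seq in synthetic_sequences]
    # Step 3: compare X and X'.
    return ks_complement(real_distribution, synthetic_distribution)


real = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
synthetic = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(msas_score(real, synthetic))                 # per-sequence mean
print(msas_score(real, synthetic, statistic=len))  # per-sequence length
```

A score near 1 means the per-sequence statistic is distributed similarly in the real and synthetic data, and a score near 0 means the two distributions barely overlap.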

Various metrics can be used in step 1. In the paper we used: length, mean, median, standard deviation, and the difference between a row n and a later row n+t.
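
As a rough illustration, the per-sequence statistics listed above could be written like this in Python. The names `SEQUENCE_STATISTICS` and `lagged_mean_difference` are hypothetical, and two choices here are assumptions rather than anything stated in this thread: the lagged differences are reduced to one value per sequence by averaging them, and the standard deviation is the population form.

```python
from statistics import mean, median, pstdev


def lagged_mean_difference(sequence, lag=1):
    """Average of (row n+lag minus row n) over the sequence (assumed reduction)."""
    if lag < 1 or len(sequence) <= lag:
        raise ValueError("sequence must be longer than the lag, and lag must be >= 1")
    # Pair each row with the row `lag` steps later.
    differences = [later - earlier for earlier, later in zip(sequence, sequence[lag:])]
    return mean(differences)


# Each entry maps one sequence (a list of numbers) to a single value.
SEQUENCE_STATISTICS = {
    "length": len,
    "mean": mean,
    "median": median,
    "std": pstdev,  # population standard deviation (an assumption)
    "lagged_difference": lagged_mean_difference,
}

sequence = [1, 3, 6, 10]
for name, statistic in SEQUENCE_STATISTICS.items():
    print(name, statistic(sequence))
```

Any of these callables can serve as the per-sequence metric in step 1, as long as the same one is applied to both the real and the synthetic sequences.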

Are there any particular metrics that are more or less important to your use case?

npatki, Aug 29 '22 16:08

FYI, some metrics that will use MSAS are actively being discussed in #198.

npatki, Aug 31 '22 16:08