Support distributions in GaussianCopulaSynthesizer that better capture extreme values
Problem Description
If a column contains a few extreme values, we don't have a clear distribution in GaussianCopulaSynthesizer to recommend. One example is a 'horseshoe distribution' (image borrowed from this blog).
Originally suggested here: https://github.com/sdv-dev/SDV/issues/2240
We currently support the norm, beta, truncnorm, uniform, gamma, and gaussian_kde distributions.
Just a note that the beta distribution can take on a "horseshoe-like" (U-shaped) form when the parameters alpha and beta are both less than 1. For an example, see the Wikipedia article.
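To make the shape concrete, here is a minimal sketch that evaluates the beta density at a few points using only the Python standard library. The helper name `beta_pdf` and the parameter values alpha = beta = 0.5 are chosen purely for illustration and are not taken from SDV. `scipy.stats.beta.pdf` should give the same values.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    # Log of the normalising constant 1 / B(a, b), computed with lgamma
    # for numerical stability.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
    )


# Illustrative parameters: both below 1, which gives the U-shaped density.
alpha, beta = 0.5, 0.5

for x in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"x={x:.2f}  pdf={beta_pdf(x, alpha, beta):.3f}")
```

The density is lowest in the middle of the interval and rises steeply toward 0 and 1, so most of the mass sits near the two extremes.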
SDV is designed to estimate parameters based on the shape of the real data itself. If the desire is to artificially synthesize extreme values (diverging from the real data), then conditional sampling is the recommended approach.
I'm closing this issue off, as we have recently added support for 150+ distributions in the XGCSynthesizer. This synthesizer is only compatible with distributions from scipy.
I see that the Horseshoe distribution is available through TensorFlow, which we can track as an additional request -- but the current expansion should at least help improve data quality in the meantime.