
Data not transformed

Open · Nikos-T opened this issue 4 years ago · 4 comments

I tried to transform my data to a Gaussian distribution, but the output of the transform function is the same as the original data. I wrote this script to run some tests. Thank you in advance. Test script:

import gaussianize as g
import numpy as np

x_uni = np.random.uniform(size=(1000,1))
out_uni = g.Gaussianize()
out_uni.fit(x_uni)
print("out_uni coefficients:", out_uni.coefs_, "\n")
y_uni = out_uni.transform(x_uni)
print("sum of transform minus original uniform:", sum(y_uni-x_uni), "\n")

x_norm = np.random.normal(size=(1000,1))
out_norm = g.Gaussianize()
out_norm.fit(x_norm)
print("out_norm coefficients:", out_norm.coefs_, "\n")
y_norm = out_norm.transform(x_norm)
print("sum of transform minus original normal:", sum(y_norm-x_norm), "\n")

import scipy.io as scio
import os

midep1 = scio.loadmat(os.path.join("D:\\", "Shared", "Data - sLorTimeseries", "LorTimeSeries_2_EC_MID_001_ep1.mat"))["LoretaTimeSeries"]
x = midep1[0, :]
x.shape = (4001, 1)
out= g.Gaussianize()
out.fit(x)
print("my data coefficients:", out.coefs_, "\n")
y = out.transform(x)
print("tranform minus original of my data:")
print(y-x, "\n")
print("sum of transform minus original:", sum(y-x), "\n")

Output:

out_uni coefficients: [(0.4964280888006252, 0.2880773918099591, 0.0)] 
sum of transform minus original uniform: [-8.04911693e-16] 
out_norm coefficients: [(-0.013567057474285795, 0.9626024748274751, 0.013488)] 
sum of transform minus original normal: [-0.15229739] 
my data coefficients: [(2.8030224, 1.3002187, 0.0)] 
tranform minus original of my data:
[[ 0.0000000e+00]
 [ 0.0000000e+00]
 [-1.3737008e-08]
 ...
 [ 0.0000000e+00]
 [ 0.0000000e+00]
 [ 0.0000000e+00]] 
sum of transform minus original: [1.0423828e-06]

The fit of x_uni has different coefficients from x_norm, but y_uni and y_norm are essentially equal to x_uni and x_norm, respectively. My understanding is that the transformed signal should be rescaled to resemble a Gaussian distribution. Am I missing something, or am I not running the transforms correctly?

Nikos-T · Mar 24 '20 11:03

Hmm, I'm worried that it's gaussianizing each row separately (i.e., each row is considered a single sample of some 1-d data). Can you try transposing the data matrix and see if it works?

gregversteeg · Mar 25 '20 18:03
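A note on the layout question: scikit-learn-style transformers treat columns as variables and rows as samples, so a (4001, 1) array is one variable with 4001 samples, while its transpose is 4001 variables with one sample each. Assuming gaussianize follows that convention (the 4001 coefficient tuples reported after transposing are consistent with it), the original (4001, 1) layout was already the right one. A minimal NumPy sketch of the shape convention, using random data as a hypothetical stand-in for the poster's series:

```python
import numpy as np

# Hypothetical stand-in for the 4001-point series (random data, not the poster's .mat file).
rng = np.random.default_rng(0)
series = rng.normal(size=4001)

col = series.reshape(-1, 1)  # (4001, 1): one variable (column) with 4001 samples (rows)
row = series.reshape(1, -1)  # (1, 4001): 4001 variables with one sample each

# Column-wise statistics, the way a per-column transformer would estimate them.
col_sigma = col.std(axis=0)
row_sigma = row.std(axis=0)

print(col_sigma.shape)          # (1,): a single fit for the whole series
print(row_sigma.shape)          # (4001,): one fit per column
print(np.all(row_sigma == 0))   # True: a lone sample has no spread, so nothing can be learned
```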

The result is the same, but now the coefficients in object out (out.coefs_) are a list of 4001 tuples of size 3. What I keep seeing is that the third value in each tuple is always zero. Could this be the problem?

As you can see in the picture below, my data (x) are somewhat close to Gaussian, but getting zero change (in y) seems strange to me.

[Image: histogram of the data x]

Maybe I am missing something, so let me be clearer: I have a time series of 4001 points and I want to Gaussianize it. I expect the values close to 6 and 7 to contract and the values around 0 and 1 to dilate, so that the histogram above becomes more Gaussian-like. Is this the intent of your class?

Nikos-T · Mar 26 '20 10:03
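What the poster describes (compress the upper values, stretch the lower ones until the histogram looks Gaussian) is a general marginal Gaussianization. One common technique for that goal, swapped in here and not what gaussianize implements, is the rank-based inverse normal transform: replace each value by the standard normal quantile of its rank. A minimal sketch, assuming a 1-D series and SciPy; the function name rank_gaussianize is hypothetical:

```python
import numpy as np
from scipy import stats

def rank_gaussianize(x):
    """Rank-based inverse normal transform of a 1-D sample.

    A generic marginal Gaussianization, not the Lambert W method in gaussianize.
    Output is on the standard normal scale, not in the input's units.
    """
    x = np.ravel(np.asarray(x, dtype=float))
    ranks = stats.rankdata(x)        # 1..n, ties get their average rank
    p = (ranks - 0.5) / x.size       # plotting positions strictly inside (0, 1)
    return stats.norm.ppf(p)         # map each probability to a normal quantile

# Right-skewed, hypothetical stand-in for a 4001-point series.
rng = np.random.default_rng(0)
x = rng.exponential(size=4001)
z = rank_gaussianize(x)

print(f"skew before: {stats.skew(x):+.2f}, after: {stats.skew(z):+.2f}")
# Order is preserved; for right-skewed data like this the long upper tail is compressed.
print(np.all(np.diff(z[np.argsort(x)]) > 0))
```

Unlike a parametric fit, this always produces Gaussian-looking marginals, but it discards the original scale and cannot be applied to new points without keeping the original sample.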

Hi Nikos, you are correct in interpreting the intent of the class. I'm afraid I haven't used it in a long time and don't remember much about the details. Let me make a few observations though.

  • One kind of behavior you're concerned with is skew, i.e., the distribution skews to the right. Georg's paper (linked in the readme) talks about transformations that make both the kurtosis and the skew more Gaussian. I'm not positive, but I think my package only implements the correction for kurtosis. You could try Georg's R package for the more general transformation, including skew.
  • I recommend explicitly checking skew and kurtosis before and after the transformation to see whether the transform is having the desired effect.
  • The third parameter in coefs_ is the important one (delta in Georg's paper; the first two are just the mean and standard deviation). The fact that delta is zero for your fitted model is a bad sign: it means the transform is not even reducing the kurtosis (heavy tails). The algorithm that finds delta is sensitive to initial conditions, so you could try giving your data a mean of zero first and then see whether it learns a better (nonzero) delta.

Sorry I can't be of more help, this was kind of a weekend project from a few years ago and I haven't really used it much myself since then. If you do figure anything out, please comment back here in case it's useful for others trying to do this type of Gaussianization.

gregversteeg · Apr 03 '20 16:04
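The back-transform in Goerg's heavy-tail Lambert W model helps explain the outputs in the original post. Assuming gaussianize applies it per column as mu + sigma * W_delta((y - mu) / sigma), with (mu, sigma, delta) taken from coefs_ (this matches the maintainer's description of the three coefficients, but check the package source), the result stays in the input's units and is exactly the identity when delta = 0. That is consistent with the uniform fit (delta = 0.0, difference about 1e-16) and with the nearly unchanged normal fit (delta about 0.013). The sketch below reimplements the formula with SciPy's lambertw; the function names and parameter values are illustrative, not the package's API:

```python
import numpy as np
from scipy.special import lambertw

def w_delta(z, delta):
    """Goerg's heavy-tail inverse W_delta(z) = sign(z) * sqrt(W(delta * z**2) / delta)."""
    if delta == 0:
        return z  # limit as delta -> 0: no tail correction at all
    return np.sign(z) * np.sqrt(np.real(lambertw(delta * z**2)) / delta)

def back_transform(y, mu, sigma, delta):
    """Remove heavy tails but return to the original location mu and scale sigma."""
    return mu + sigma * w_delta((y - mu) / sigma, delta)

rng = np.random.default_rng(0)
y = 0.5 + 2.0 * rng.standard_t(df=3, size=1000)  # heavy-tailed stand-in data

unchanged = back_transform(y, mu=0.5, sigma=2.0, delta=0.0)
corrected = back_transform(y, mu=0.5, sigma=2.0, delta=0.3)
print(np.allclose(unchanged, y))  # True: delta = 0 gives back the input

z = (y - 0.5) / 2.0  # standardized with the (assumed known) mu and sigma
print(np.max(np.abs(w_delta(z, 0.3))) < np.max(np.abs(z)))  # True: tails pulled in
```

So "output equals input" is what this form predicts whenever the fitted delta is zero; it is not by itself evidence that transform was called incorrectly.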

Hi @Nikos-T, FWIW, looking at your histogram it does not seem like your data even has heavy tails (it's a truncated distribution on the left, and on the right it's barely heavy-tailed). I wouldn't be surprised if 'delta' is estimated at 0. What's the kurtosis / skewness of your data in the histogram?

gmgeorg · Jul 25 '20 03:07
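The check both replies point to (measure skewness and kurtosis before and after transforming) is easy to script with scipy.stats. A small sketch on hypothetical synthetic samples; the helper name shape_report is illustrative. The Lambert W heavy-tail correction only has something to remove when excess kurtosis is clearly positive:

```python
import numpy as np
from scipy import stats

def shape_report(x):
    """Return (skewness, excess kurtosis); both are about 0 for Gaussian data."""
    x = np.ravel(x)
    return stats.skew(x), stats.kurtosis(x)  # Fisher definition: normal -> 0

# Hypothetical stand-in samples with 4001 points each.
rng = np.random.default_rng(0)
samples = {
    "uniform": rng.uniform(size=4001),              # light tails: excess kurtosis near -1.2
    "normal": rng.normal(size=4001),                # near 0 for both statistics
    "student_t3": rng.standard_t(df=3, size=4001),  # heavy tails: large positive kurtosis
}

report = {name: shape_report(x) for name, x in samples.items()}
for name, (s, k) in report.items():
    print(f"{name:>10}: skew={s:+.2f}  excess kurtosis={k:+.2f}")
```

Running shape_report on x and on the transformed y shows whether anything changed. If excess kurtosis is near zero or negative to begin with, as suspected for the histogram above, a fitted delta of 0 is the expected outcome rather than a failure.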

Hello, unfortunately I remember neither my use case nor its data. However, rereading my problem, I suppose you are correct: I probably started with a non-skewed distribution, or I misinterpreted the scope of this "Gaussianization". I am closing the issue; thanks for the help.

Nikos-T · Sep 04 '23 18:09