
`pseudolog` transformer

Open ParadaCarleton opened this issue 2 years ago • 10 comments

Request for an inverse hyperbolic sine (a.k.a. asinh or pseudolog) transformer. x -> arcsinh(x/2) behaves like ln(x) for large values of x, but is approximately linear near zero (roughly x -> x/2); this behavior is very useful for variables that are almost lognormal but take on both positive and negative values (e.g. net worth).

Ideally, this should provide location and scale parameters that can either be tuned or fixed at loc = 0 and scale = 1, making the transformation x -> arcsinh((x + loc) / (2 * scale)).

ParadaCarleton avatar Oct 31 '23 16:10 ParadaCarleton

Hi Carlos,

Thanks for the suggestion. I am not familiar with this transformation, so I don't understand what you mean by location and scale parameters, or by tuning them versus setting them to 0/1.

Do you have a resource with more details about this transformation that you could share? For example, when is it used, who developed it, or whatever else you have at hand? We would need that in any case to write the documentation.

Thank you!

solegalli avatar Nov 01 '23 09:11 solegalli

> Thanks for the suggestion. I am not familiar with this transformation, so I don't understand what you mean by location and scale parameters, or by tuning them versus setting them to 0/1.

You can find more information here or here.

By location and scale parameters, I just mean that the transformation is of the form:

x -> asinh((x + loc) / (2 * scale))

Which has 2 parameters, loc and scale, which need to be estimated (usually by maximum likelihood).

However, people will sometimes set loc to 0, giving a simplified transform of the form:

x -> asinh(x / (2 * scale))

Which only has one estimated parameter (scale).

Some people will even set scale to 1, just giving asinh(x/2).
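As a minimal sketch of the three variants described above: the names `pseudolog` and `inverse_pseudolog` and their keyword defaults are illustrative assumptions, not an existing feature_engine API. Only NumPy's `arcsinh` and `sinh` are relied on.

```python
import numpy as np


def pseudolog(x, loc=0.0, scale=1.0):
    """Hypothetical helper: x -> asinh((x + loc) / (2 * scale))."""
    x = np.asarray(x, dtype=float)
    return np.arcsinh((x + loc) / (2.0 * scale))


def inverse_pseudolog(y, loc=0.0, scale=1.0):
    """Inverse of the sketch above: y -> 2 * scale * sinh(y) - loc."""
    y = np.asarray(y, dtype=float)
    return 2.0 * scale * np.sinh(y) - loc


values = np.array([-1e6, -0.01, 0.0, 0.01, 1e6])
transformed = pseudolog(values)             # loc=0, scale=1 gives asinh(x / 2)
recovered = inverse_pseudolog(transformed)  # round-trips back to `values`
```

With the defaults, the output is odd-symmetric, close to x/2 near zero, and close to sign(x) * ln|x| for large |x|. Passing `scale` alone gives the one-parameter form, and passing both `loc` and `scale` gives the full form.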

ParadaCarleton avatar Nov 01 '23 21:11 ParadaCarleton

This one is interesting, but is it numerically stable?

glevv avatar Nov 04 '23 11:11 glevv

> This one is interesting, but is it numerically stable?

Yes, there shouldn't be any problems with it. The only likely numerical issue is that fitting loc and scale may be difficult if the data aren't scaled and mean-centered. This should probably be mentioned in the docs.
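For evaluating the function itself, a quick sketch of why a library routine is preferable to hand-coding it: the textbook identity asinh(z) = ln(z + sqrt(z^2 + 1)) suffers catastrophic cancellation for large negative inputs when written naively, whereas `numpy.arcsinh` stays finite. The input value below is arbitrary.

```python
import numpy as np

z = -1e10

# Naive formula: z + sqrt(z*z + 1) cancels to exactly 0.0 in float64,
# so the logarithm diverges. The divide-by-zero warning is silenced here.
with np.errstate(divide="ignore"):
    naive = np.log(z + np.sqrt(z * z + 1.0))

# The library implementation handles the same input without cancellation.
stable = np.arcsinh(z)
```

Here `naive` comes out as negative infinity, while `stable` matches -ln(2e10), roughly -23.7.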

ParadaCarleton avatar Nov 05 '23 00:11 ParadaCarleton

Hey guys! Thank you for the links and discussion. It looks good to me. Would you like to have a go at drafting a class?

solegalli avatar Nov 10 '23 13:11 solegalli

> Hey guys! Thank you for the links and discussion. It looks good to me. Would you like to have a go at drafting a class?

I think so, but I'm a bit stuck on how to do fitting, in that there are two approaches:

  1. Choose the fit that maximizes the normality of the transformed predictor variable. (Easy, but not as accurate.)
  2. Maximum likelihood/minimum loss estimation, where we estimate the scale parameter by minimizing the loss in the predictions. (More principled and more accurate.)

I think I've worked out how to do 1, but not how to do 2, or whether it's even possible to do using the sklearn API.
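One possible reading of approach 1, sketched under stated assumptions: pick `scale` per column by maximizing a Gaussian profile log-likelihood of the transformed values, including the Jacobian of the transform, in the same spirit as the usual way the Box-Cox lambda is fitted. The class name, the fixed `loc`, and the search bounds are all hypothetical choices made for illustration. The class is plain Python following the fit/transform convention, not a feature_engine or scikit-learn subclass, and it assumes non-constant numeric columns.

```python
import numpy as np
from scipy.optimize import minimize_scalar


class PseudoLogScaleSketch:
    """Hypothetical sketch: asinh transform with a fitted scale per column."""

    def __init__(self, loc=0.0):
        self.loc = loc  # held fixed; only scale is estimated

    @staticmethod
    def _neg_profile_loglik(log_scale, x):
        scale = np.exp(log_scale)
        z = x / (2.0 * scale)
        y = np.arcsinh(z)
        # log |dy/dx| = -log(2 * scale) - 0.5 * log(1 + z**2)
        log_jac = -np.log(2.0 * scale) - 0.5 * np.log1p(z**2)
        # Negative Gaussian profile log-likelihood, up to an additive constant.
        return 0.5 * x.size * np.log(y.var()) - log_jac.sum()

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.scale_ = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            x = X[:, j] + self.loc
            center = np.log(np.std(x))  # search around the column's spread
            res = minimize_scalar(
                self._neg_profile_loglik,
                bounds=(center - 10.0, center + 10.0),
                args=(x,),
                method="bounded",
            )
            self.scale_[j] = np.exp(res.x)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.arcsinh((X + self.loc) / (2.0 * self.scale_))


# Synthetic heavy-tailed, signed data purely for demonstration.
rng = np.random.default_rng(0)
X = 6.0 * np.sinh(1.5 * rng.standard_normal((2000, 1)))
fitted = PseudoLogScaleSketch().fit(X)
Xt = fitted.transform(X)
```

Optimizing over log(scale) keeps the scale positive without needing a constrained solver. Approach 2 would instead need the target `y` inside `fit`, which this sketch deliberately ignores.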

ParadaCarleton avatar Nov 13 '23 20:11 ParadaCarleton

@solegalli do you know how I can add a new transformer to the existing tests? I'm not sure where I can find the tests.

ParadaCarleton avatar Nov 22 '23 20:11 ParadaCarleton

You'd probably create a new .py with your transformer within the transformation folder.

Then, you need to create another script within this folder where you'd add the tests.

You'd also need to add your transformer to this file for the generic tests. Those may fail at first, but I can help you troubleshoot.

solegalli avatar Nov 24 '23 08:11 solegalli

Hey @ParadaCarleton! Do you have a template of this function / class that we could use as a starting point for this transformer? Did you do any work on this?

solegalli avatar Aug 25 '24 16:08 solegalli