feature_engine icon indicating copy to clipboard operation
feature_engine copied to clipboard

[New Transformer] ChiMergeDiscretisor

Open Morgan-Sell opened this issue 2 years ago • 7 comments

Closes #450.

Notes from #450:

Existing implementations:

https://github.com/lisette-espin/pychimerge https://github.com/night18/ChiMerge https://github.com/raiyan1102006/ChiMerge https://gist.github.com/alanzchen/17d0c4a45d59b79052b1cd07f531689e?short_path=f2e54c6

Reference to the original article can be found in the first link

Morgan-Sell avatar May 22 '22 04:05 Morgan-Sell

Hi @Morgan-Sell

Thanks for kicking this off.

I reckon this one is not ready for review, right?

It would be great to have some tests with the expected result for the transformer.

Thank you!

solegalli avatar Jun 08 '22 17:06 solegalli

hola @solegalli,

Correct, it's not ready for review. I'm still working on it. And, yes, I'll create tests.

Morgan-Sell avatar Jun 08 '22 22:06 Morgan-Sell

@solegalli,

Should we avoid using dataframes and instead use dictionaries and numpy arrays? I suspect iterating through dataframes increases computational costs.

I'm going to keep the question, but I think the answer is "yes". Dictionaries and numpy arrays simplify the merging of frequency distributions ;)

Morgan-Sell avatar Jun 14 '22 00:06 Morgan-Sell

hola @solegalli,

I think I need an extra set of eyes. I'm struggling to identify what is causing the error for test_chi_merge(). I believe the root cause is in _calc_chi_square(); however, I cannot identify where.

In test_chi_merge(), the expected results are [4.1, 2.4, 8.6, 2.9, 1.7, 1.8, 2.2, 4.8, 4.1, 3.2, 1.5, 3.6]. These are the results stated in the Chi-Merge paper.

The transformer returns the following results: [5.9, 5.2, 8.6, 2.9, 1.8, 1.8, 3.1, 4.8, 4.1, 3.2, 1.8, 13.0].

The above values are the results from the chi-square tests of the consecutive distributions. 5 of the 12 expected results are incorrect.

Indices of the values that don't reconcile: 0, 1, 6, 10, and 11.

Do you see the bug?

Morgan-Sell avatar Jun 19 '22 16:06 Morgan-Sell

hola @solegalli,

Did you have a chance to look at this bug? I'm stumped.

Morgan-Sell avatar Jul 06 '22 22:07 Morgan-Sell

hi @solegalli,

Are you still getting around to reviewing this discretizer? I think it's super cool!

I know you're quite busy. I'm trying to organize myself.

Morgan-Sell avatar Aug 20 '22 16:08 Morgan-Sell

Still pending. I send you an email? would that work?

solegalli avatar Aug 22 '22 08:08 solegalli