feature_engine
Information Value for nominal variables
Presently, there are no packages in Python to calculate the Information Value (IV) using the Weight of Evidence (WoE) for nominal/categorical variables. Since the WoE Encoder is already available in Feature Engine, I am requesting a feature to obtain the information value for nominal variables.
Could you add a few links with information about WoE and IV?
https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html
https://www.listendata.com/2019/08/WOE-IV-Continuous-Dependent.html
http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/
Hi Sole, I am interested in working on this, but I don't have much experience with open source contributions. I recently watched all your ML courses on Udemy and really liked them, which is why I'd like to take a shot at this. Could you guide me a little here?
Welcome @saurabhgoel1985
Absolutely! I look forward to working with you.
First of all, it's been a while since I read those articles on IV, so I guess the first thing would be to go over them and maybe add a few bullet points below on what the new transformer should do.
Off the top of my head, the IV is calculated from the WoE, and with the IV we can select features.
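To make the WoE-to-IV step concrete, here is a minimal sketch in plain pandas. It is not Feature Engine code, and the function name `information_value` is hypothetical. It assumes a binary target coded 0/1 and that every category contains at least one event and one non-event, since otherwise the WoE is infinite. For each category, WoE = ln(% of all events in the category / % of all non-events in the category), and IV = Σ (% of events − % of non-events) × WoE over the categories. Some sources, such as the listendata articles, put non-events in the numerator instead. That flips the sign of both factors, so the IV comes out the same.

```python
import numpy as np
import pandas as pd


def information_value(x: pd.Series, y: pd.Series):
    """Return the per-category WoE and the total IV of a nominal variable.

    Assumes a binary target coded 0/1, and that every category has at least
    one event (y == 1) and one non-event (y == 0); otherwise the WoE is infinite.
    """
    # Count events and non-events per category (rows: categories, columns: 0 and 1)
    counts = pd.crosstab(x, y)
    # Share of all events, and of all non-events, that falls in each category
    pct_event = counts[1] / counts[1].sum()
    pct_non_event = counts[0] / counts[0].sum()
    # WoE per category: natural log of the ratio of the two shares
    woe = np.log(pct_event / pct_non_event)
    # IV: sum over categories of (difference in shares) * WoE
    iv = float(((pct_event - pct_non_event) * woe).sum())
    return woe, iv


# Toy data, made up for illustration
df = pd.DataFrame({
    "colour": ["red", "red", "red", "blue", "blue", "blue", "green", "green"],
    "target": [1, 1, 0, 1, 0, 0, 0, 1],
})
woe, iv = information_value(df["colour"], df["target"])
print(woe)
print(iv)  # 0.5 * ln(2), about 0.347
```

In this toy data, "green" has the same share of events and non-events, so its WoE is 0 and it adds nothing to the IV.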
The new transformer should go in a new Python script with a meaningful name, inside the folder feature_engine/selection.
I suggest you take a look at the WoEEncoder class that lives in the feature_engine/encoding folder, and also at another class from the selection folder, for example SelectByShuffling.
The best approach would be to copy one of those classes into the new script and edit the content as needed. Because the new class needs the WoE to calculate the IV, it might be a good idea for it to inherit from the WoE encoder, but I am not sure. Feel free to explore whether there is a better solution.
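The question of whether to inherit from the WoE encoder is left open above. As one possible direction, here is a minimal sketch that keeps the WoE/IV calculation inside the selector (composition rather than inheritance) and follows the scikit-learn fit/transform convention. The class name `InformationValueSelectorSketch`, its parameters `variables` and `threshold`, and the fitted attributes `information_values_` and `features_to_drop_` are all hypothetical. They are not Feature Engine's API, and the default threshold of 0.02 is only an illustrative value. The sketch assumes a 0/1 target and that every category contains both classes.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class InformationValueSelectorSketch(BaseEstimator, TransformerMixin):
    """Hypothetical selector that drops nominal variables with a low IV.

    Illustrative only: the class name, parameters and attributes are not
    Feature Engine's API. Assumes y is coded 0/1 and every category has at
    least one event and one non-event.
    """

    def __init__(self, variables, threshold=0.02):
        self.variables = variables  # nominal columns to evaluate
        self.threshold = threshold  # illustrative cut-off, not a recommendation

    @staticmethod
    def _information_value(x, y):
        # Same WoE/IV calculation a WoE encoder needs, kept local (composition)
        counts = pd.crosstab(x, y)
        pct_event = counts[1] / counts[1].sum()
        pct_non_event = counts[0] / counts[0].sum()
        woe = np.log(pct_event / pct_non_event)
        return float(((pct_event - pct_non_event) * woe).sum())

    def fit(self, X, y):
        # Learn the IV of each variable from the training data
        self.information_values_ = {
            var: self._information_value(X[var], y) for var in self.variables
        }
        # Variables whose IV falls below the threshold will be removed
        self.features_to_drop_ = [
            var for var, iv in self.information_values_.items() if iv < self.threshold
        ]
        return self

    def transform(self, X):
        # Drop the low-IV variables found during fit; other columns pass through
        return X.drop(columns=self.features_to_drop_)


# Toy data, made up for illustration: "city" carries no information about y
X = pd.DataFrame({
    "colour": ["red", "red", "red", "blue", "blue", "blue", "green", "green"],
    "city": ["a", "a", "a", "b", "a", "b", "b", "b"],
})
y = pd.Series([1, 1, 0, 1, 0, 0, 0, 1])

selector = InformationValueSelectorSketch(variables=["colour", "city"], threshold=0.1)
X_selected = selector.fit_transform(X, y)
print(selector.information_values_)
print(X_selected.columns.tolist())  # ['colour']
```

Inheriting from the encoder would reuse its fitting logic. However, the selector would also inherit an encoder's `transform`, which replaces categories with numbers instead of dropping columns. That trade-off is one thing to weigh when deciding between the two designs.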
Have fun! And let me know if you need anything!
Correction: the transformer is for feature selection, so I edited my previous comment.
Hey everyone, apologies, I forgot to write that PR #488 closes this issue. The PR is mentioned above.
I’ve made some decent progress, but I’m on vacation. I will continue to work on the PR when I return.
I believe that @solegalli has provided feedback that I need to incorporate.
Thanks for the update, Morgan. I hope the PR gets merged soon.
Sorry @Morgan-Sell, I haven't reviewed that PR yet. You make so many PRs that I can't keep up :p
I'll review the 2 remaining PRs on Wednesday.
Cheers