Updates to datasets II code

Open greglandrum opened this issue 3 years ago • 1 comments

Things in this PR:

Get the scripts which operate on datasets II working in Python 3
Add additional scorers for: XGB, balanced random forests, LMNB
Swap the RF scorer to use vanilla scikit-learn RFs instead of our monkey-patched implementation of balanced random forests.
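
For readers unfamiliar with the distinction in the last item, here is a minimal sketch of what scoring with a vanilla scikit-learn random forest can look like. It is not the code in this PR: the function name `score_with_rf`, the hyperparameters, and the synthetic fingerprint data are all assumptions made for illustration, and scikit-learn's built-in `class_weight="balanced_subsample"` option is used only as a stand-in for a balanced variant, not as the balanced random forest scorer added here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def score_with_rf(train_fps, train_labels, test_fps, balanced=False, seed=42):
    """Hypothetical helper: fit a random forest on the training fingerprints
    and return the predicted probability of the active class (label 1) for
    each test row."""
    clf = RandomForestClassifier(
        n_estimators=100,  # assumed value, not taken from the PR
        # Stand-in for a balanced variant: reweight classes per bootstrap sample.
        class_weight="balanced_subsample" if balanced else None,
        random_state=seed,
    )
    clf.fit(train_fps, train_labels)
    # Look up which predict_proba column corresponds to the active class.
    active_col = list(clf.classes_).index(1)
    return clf.predict_proba(test_fps)[:, active_col]


# Synthetic, imbalanced toy data: 20 actives and 180 inactives, 64-bit fingerprints.
rng = np.random.default_rng(0)
train_fps = rng.integers(0, 2, size=(200, 64))
train_labels = np.array([1] * 20 + [0] * 180)
test_fps = rng.integers(0, 2, size=(10, 64))

vanilla_scores = score_with_rf(train_fps, train_labels, test_fps)
balanced_scores = score_with_rf(train_fps, train_labels, test_fps, balanced=True)

# Rank test molecules from most to least likely active.
ranking = np.argsort(-vanilla_scores)
```

The returned probabilities can be sorted to produce a ranked list of test molecules, which is the kind of output a virtual-screening scorer typically hands to the evaluation step.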

Notes:

I have not done as much work with the datasets I scripts. Those datasets are, with some years of perspective, less interesting and useful, so I'm not feeling strongly compelled to spend time working on them.
There's significant room for refactoring and removing duplicate code in the scoring scripts. I'll think about doing this.
The scoring scripts are quite verbose in their output (generating huge amounts of data). I think it wouldn't be terrible to make the output more compact, but that's a longer term project.

Dec 19 '22 16:12 greglandrum

@sriniker: if you have time and inclination to look at this, I'd love your comments. I have a bit more work to do before marking it as "done", but I wanted to give you a heads up.

Dec 19 '22 16:12 greglandrum