benchmarking_platform
Updates to datasets II code
Things in this PR:
- Get the scripts which operate on datasets II working in Python 3
- Add additional scorers for: XGB, balanced random forests, LMNB
- Swap the RF scorer to use vanilla scikit-learn RFs instead of our monkey-patched implementation of balanced random forests.
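
For context on the difference between the two RF variants: a balanced random forest usually trains each tree on a bootstrap sample that contains the same number of examples from every class, whereas a vanilla RF bootstraps from the full, imbalanced training set. The sketch below shows only that sampling step, using the standard library. The function name `balanced_bootstrap_indices` and the toy labels are hypothetical and are not taken from the code in this PR.

```python
import random
from collections import defaultdict


def balanced_bootstrap_indices(labels, rng):
    """Return indices for one tree's training sample with equal class counts.

    Hypothetical helper for illustration; not part of the PR's scorers.
    """
    # Group the example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class contributes as many examples as the smallest class has.
    n_per_class = min(len(idxs) for idxs in by_class.values())

    sample = []
    for label in sorted(by_class):
        # Draw with replacement, as in an ordinary bootstrap.
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    return sample


# Toy imbalanced data: 5 actives (1) and 95 inactives (0).
labels = [1] * 5 + [0] * 95
sample = balanced_bootstrap_indices(labels, random.Random(42))
n_actives = sum(labels[i] for i in sample)
n_inactives = len(sample) - n_actives
```

Each tree in a balanced forest would get its own sample drawn this way, so the majority class is undersampled per tree rather than once for the whole forest.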
Notes:
- I have not done as much work with the datasets I scripts. Those datasets are, with some years of perspective, less interesting and useful, so I don't feel strongly compelled to spend time on them.
- There's significant room for refactoring and removing duplicate code in the scoring scripts. I'll think about doing this.
- The scoring scripts are quite verbose in their output (generating huge amounts of data). I think it wouldn't be terrible to make the output more compact, but that's a longer term project.
@sriniker: if you have the time and inclination to look at this, I'd love your comments. I have a bit more work to do before marking it as "done", but I wanted to give you a heads-up.