benchmarking_platform

Updates to datasets II code

Open • greglandrum opened this issue 3 years ago • 1 comment

Things in this PR:

  • Get the scripts that operate on datasets II working in Python 3
  • Add additional scorers for XGB, balanced random forests, and LMNB
  • Swap the RF scorer to use vanilla scikit-learn RFs instead of our monkey-patched implementation of balanced random forests.
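To make the difference between the two RF flavors concrete: a vanilla random forest trains each tree on a plain bootstrap sample, while a balanced random forest draws the same number of samples from every class for each tree. The snippet below is a minimal sketch of just that per-tree sampling step, assuming the usual definition of a balanced random forest. It is not the code in this PR; the function names and the 10 active / 990 decoy split are hypothetical, and tree fitting is omitted.

```python
import random
from collections import Counter


def vanilla_bootstrap(labels, rng):
    """Hypothetical helper: indices for one tree in a standard RF (n draws with replacement)."""
    n = len(labels)
    return [rng.randrange(n) for _ in range(n)]


def balanced_bootstrap(labels, rng):
    """Hypothetical helper: indices for one tree in a balanced RF.

    Each class contributes the same number of draws (with replacement),
    equal to the size of the smallest class, so the majority class is undersampled.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in range(n_min))
    return sample


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical imbalanced data set: 10 actives (label 1), 990 decoys (label 0)
    labels = [1] * 10 + [0] * 990
    print("vanilla: ", Counter(labels[i] for i in vanilla_bootstrap(labels, rng)))
    print("balanced:", Counter(labels[i] for i in balanced_bootstrap(labels, rng)))
```

With a 1:99 imbalance, a vanilla bootstrap sample is dominated by decoys, whereas the balanced sample always has exactly 10 draws of each class.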

Notes:

  • I have not done as much work on the datasets I scripts. With some years of perspective, those datasets are less interesting and useful, so I don't feel strongly compelled to spend time on them.
  • There's significant room for refactoring and removing duplicate code in the scoring scripts. I'll think about doing this.
  • The scoring scripts are quite verbose in their output (they generate huge amounts of data). It would probably be worthwhile to make the output more compact, but that's a longer-term project.

greglandrum • Dec 19 '22 16:12

@sriniker: if you have the time and inclination to look at this, I'd love your comments. I have a bit more work to do before marking it as "done", but I wanted to give you a heads-up.
