rsmtool
A Python package to facilitate research on building and evaluating automated scoring models.
Now that we are Python 3.6+ only, we should switch to using f-strings (formatted string literals).
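As a rough illustration of what this switch would look like, the sketch below builds the same message with the older formatting styles and with a Python 3.6+ formatted string literal. The variable names and message text are hypothetical and are not taken from the rsmtool code base.

```python
# Hypothetical values, for illustration only (not from rsmtool)
experiment_id = "exp_01"
num_features = 12

# Older styles: %-formatting and str.format()
old_percent = "Experiment %s uses %d features" % (experiment_id, num_features)
old_format = "Experiment {} uses {} features".format(experiment_id, num_features)

# Python 3.6+ formatted string literal: names appear inline in the string
new_style = f"Experiment {experiment_id} uses {num_features} features"
```

All three expressions produce the same string; the last form keeps each value next to the place where it is used.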
It's confusing that the name of the argument is `flag_column` and one of the values that it takes is also `flag_column`. This makes the docstring very confusing to write. We...
For classifiers that do not support expected probabilities, we currently rely on SKLL to raise a warning and proceed generating integer scores. The final report still says "Predictions analyzed in...
It would be useful to add some best practices for sharing reports with other people to the documentation. When to send just the HTML, when to zip up everything, when...
Currently, `rsmpredict` supports an undocumented option of specifying an output directory instead of a file if the `output_file` does not have a `.csv` or `.xlsx` extension. However, there are several inconsistencies:...
Now that we are Python 3.6+ only, it makes more sense to use the more readable `pathlib.Path` interface rather than os-level functions.
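A minimal sketch of the difference, assuming a hypothetical helper that builds the path of a summary file inside an output directory. The function names, the `output` subdirectory, and the file naming are illustrative assumptions, not rsmtool's actual layout.

```python
import os
from pathlib import Path


def summary_path_os(output_dir, experiment_id):
    """Build the path with os-level functions (hypothetical helper)."""
    return os.path.join(output_dir, "output", experiment_id + "_summary.csv")


def summary_path_pathlib(output_dir, experiment_id):
    """Build the same path with pathlib (hypothetical helper)."""
    return Path(output_dir) / "output" / f"{experiment_id}_summary.csv"


# Path objects also expose pieces that need separate os.path calls
path = summary_path_pathlib("results", "exp_01")
extension = path.suffix   # file extension, including the leading dot
stem = path.stem          # file name without the extension
```

Both helpers describe the same location; the `pathlib` version returns an object with attributes such as `suffix`, `stem`, and `parent` instead of a plain string.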
[JIRA] Perhaps we can use multiprocessing or multithreading to speed up model training, report generation, etc. This might be relevant: http://ipyparallel.readthedocs.io/en/latest/intro.html
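One possible shape for this, sketched with the standard library's `concurrent.futures` rather than ipyparallel. `build_section` is a hypothetical stand-in for an independent unit of work (for example, one report section); it is not an rsmtool function.

```python
from concurrent.futures import ThreadPoolExecutor


def build_section(name):
    """Hypothetical stand-in for one independent unit of work."""
    return f"section:{name}"


def build_all_sections(names, max_workers=4):
    """Run build_section over all names using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs
        return list(executor.map(build_section, names))
```

Threads mainly help with I/O-bound work. For CPU-bound steps such as model training, `ProcessPoolExecutor` from the same module offers the same interface with separate processes, at the cost of requiring picklable functions and arguments.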
It would be great to have the option to convert all NaN feature values to 0 (though fine to not have it be the default). We could show a warning...
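The requested behavior could look roughly like the sketch below: an opt-in flag that replaces NaN feature values with 0 and emits a warning when anything was changed. The function name, the `replace_nan` flag, and the list-of-dicts input are assumptions made to keep the example dependency-free; in a pandas-based pipeline the replacement itself would typically be `DataFrame.fillna(0)`.

```python
import math
import warnings


def replace_nan_with_zero(rows, feature_names, replace_nan=False):
    """Return copies of rows; NaN feature values become 0 only if requested."""
    if not replace_nan:
        # Default behavior: leave the values untouched
        return [dict(row) for row in rows]

    cleaned = []
    num_replaced = 0
    for row in rows:
        new_row = dict(row)
        for name in feature_names:
            value = new_row.get(name)
            if isinstance(value, float) and math.isnan(value):
                new_row[name] = 0.0
                num_replaced += 1
        cleaned.append(new_row)

    if num_replaced:
        warnings.warn(f"Replaced {num_replaced} NaN feature value(s) with 0.")
    return cleaned
```

Keeping the flag off by default matches the request that this conversion not be the default behavior.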