Draft: Integrate feature preprocessor as step in SKLL learner pipeline
The idea is that running RSMTool should also produce a model file that can be loaded and used immediately on the same kind of raw features that were used in the original experiment. This PR adds the feature preprocessor as a named step in the SKLL learner pipeline and saves that pipeline as a separate file. For example:
In [1]: import joblib
In [2]: model = joblib.load("output/ASAP2.pipeline.model")
In [3]: ! head -2 train.csv
ID,DISCOURSE,ORGANIZATION,GRAMMAR,MECHANICS,LENGTH,score,score2
RESPONSE_1,4.93806460126142,-0.0846667513334603,-0.316793975540994,4.65591397849462,279,3,3
In [4]: ! head -2 output/ASAP2_pred_train.csv
spkitemid,raw,sc1,scale,raw_trim,raw_trim_round,scale_trim,scale_trim_round
RESPONSE_1,3.467158796079344,3.0,3.487689689334681,3.467158796079344,3,3.487689689334681,3
In [5]: model.predict([{"DISCOURSE": 4.93806460126142, "ORGANIZATION": -0.0846667513334603, "GRAMMAR": -0.316793975540994, "MECHANICS": 4.65591397849462}])
Out[5]: array([3.4671588])
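To show the mechanism outside of RSMTool, here is a minimal sketch using plain scikit-learn rather than the actual SKLL/RSMTool code. The step names (`vectorizer`, `preprocessor`, `estimator`), the toy feature values, and the file name are all assumptions made for illustration. `StandardScaler` stands in for whatever preprocessing RSMTool really applies. The point is only that once preprocessing is a named pipeline step, the saved file accepts raw feature dictionaries directly.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy raw features and scores (hypothetical values, not the ASAP2 data).
train_features = [
    {"GRAMMAR": 0.1, "MECHANICS": 4.0},
    {"GRAMMAR": -0.3, "MECHANICS": 4.6},
    {"GRAMMAR": 0.5, "MECHANICS": 3.2},
    {"GRAMMAR": 0.9, "MECHANICS": 2.5},
]
train_scores = [3.0, 3.0, 4.0, 5.0]

# Preprocessing is a named step, so it is fitted, saved, and applied
# together with the estimator. Step names here are illustrative only.
pipeline = Pipeline([
    ("vectorizer", DictVectorizer(sparse=False)),
    ("preprocessor", StandardScaler()),
    ("estimator", LinearRegression()),
])
pipeline.fit(train_features, train_scores)

# Save the whole pipeline to a single file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy.pipeline.model")
    joblib.dump(pipeline, model_path)
    loaded = joblib.load(model_path)

# The loaded pipeline takes raw feature dicts; no manual scaling is needed.
raw_response = [{"GRAMMAR": -0.3, "MECHANICS": 4.6}]
original_prediction = pipeline.predict(raw_response)
loaded_prediction = loaded.predict(raw_response)
print(list(loaded.named_steps), loaded_prediction)
```

Because the scaler's fitted parameters travel inside the saved object, a caller does not need access to the training data or to a separate preprocessing script to reproduce the predictions.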
Hello @mulhod! Thanks for updating this PR.
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :tada:
Comment last updated at 2022-07-18 19:11:35 UTC
Codecov Report
Merging #569 (b173a92) into main (933d17b) will decrease coverage by 0.09%. The diff coverage is 80.00%.
:exclamation: Current head b173a92 differs from pull request most recent head 9c2b546. Consider uploading reports for the commit 9c2b546 to get more accurate results.
@@            Coverage Diff             @@
##             main     #569      +/-   ##
==========================================
- Coverage   93.14%   93.05%   -0.10%
==========================================
  Files          31       31
  Lines        4525     4552      +27
==========================================
+ Hits         4215     4236      +21
- Misses        310      316       +6
| Impacted Files | Coverage Δ | |
|---|---|---|
| rsmtool/modeler.py | 96.36% <80.00%> (-1.22%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 933d17b...9c2b546.