Draft: Integrate feature preprocessor as step in SKLL learner pipeline

Open mulhod opened this issue 3 years ago • 2 comments

The basic idea is that running RSMTool should also produce a model file that can be loaded and used immediately on the same kind of raw features that were used in the original experiment. This PR adds the feature preprocessor as a named step in the SKLL learner pipeline and also saves that pipeline separately.

In [1]: import joblib

In [2]: model = joblib.load("output/ASAP2.pipeline.model")

In [3]: ! head -2 train.csv
ID,DISCOURSE,ORGANIZATION,GRAMMAR,MECHANICS,LENGTH,score,score2
RESPONSE_1,4.93806460126142,-0.0846667513334603,-0.316793975540994,4.65591397849462,279,3,3

In [4]: ! head -2 output/ASAP2_pred_train.csv
spkitemid,raw,sc1,scale,raw_trim,raw_trim_round,scale_trim,scale_trim_round
RESPONSE_1,3.467158796079344,3.0,3.487689689334681,3.467158796079344,3,3.487689689334681,3

In [5]: model.predict([{"DISCOURSE": 4.93806460126142, "ORGANIZATION": -0.0846667513334603, "GRAMMAR": -0.316793975540994, "MECHANICS": 4.65591397849462}])
Out[5]: array([3.4671588])
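
As a rough illustration of the idea (not the actual RSMTool or SKLL code), the sketch below builds a plain scikit-learn `Pipeline` in which preprocessing is a named step, saves it with `joblib`, and reloads it to predict directly from a raw feature dict. The toy data, the step names (`vectorizer`, `preprocessor`, `estimator`), the use of `StandardScaler` as a stand-in for the feature preprocessor, and the file name `toy.pipeline.model` are all assumptions made for this example.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: raw feature dicts and human scores.
train_features = [
    {"DISCOURSE": 4.9, "GRAMMAR": -0.3},
    {"DISCOURSE": 3.1, "GRAMMAR": 0.2},
    {"DISCOURSE": 2.0, "GRAMMAR": 0.5},
    {"DISCOURSE": 4.0, "GRAMMAR": -0.1},
]
train_scores = [3.0, 2.0, 1.0, 3.0]

# Preprocessing lives inside the pipeline as a named step, so the saved
# object can consume raw features without any external preprocessing.
# StandardScaler is only a stand-in for the real feature preprocessor.
pipeline = Pipeline([
    ("vectorizer", DictVectorizer(sparse=False)),
    ("preprocessor", StandardScaler()),
    ("estimator", LinearRegression()),
])
pipeline.fit(train_features, train_scores)

# Save the whole pipeline to disk and load it back.
with tempfile.TemporaryDirectory() as tmpdir:
    model_path = os.path.join(tmpdir, "toy.pipeline.model")
    joblib.dump(pipeline, model_path)
    loaded = joblib.load(model_path)

# The reloaded pipeline predicts straight from a raw feature dict.
raw_response = [{"DISCOURSE": 4.9, "GRAMMAR": -0.3}]
original_prediction = pipeline.predict(raw_response)
loaded_prediction = loaded.predict(raw_response)
```

Because the preprocessing step is stored inside the pipeline, `loaded.named_steps["preprocessor"]` carries the statistics learned at training time, and the reloaded model should give the same prediction as the in-memory one.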

mulhod • Jul 18 '22 17:07

Hello @mulhod! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :tada:

Comment last updated at 2022-07-18 19:11:35 UTC

pep8speaks • Jul 18 '22 17:07

Codecov Report

Merging #569 (b173a92) into main (933d17b) will decrease coverage by 0.09%. The diff coverage is 80.00%.

:exclamation: Current head b173a92 differs from the pull request's most recent head 9c2b546. Consider uploading reports for commit 9c2b546 to get more accurate results.

@@            Coverage Diff             @@
##             main     #569      +/-   ##
==========================================
- Coverage   93.14%   93.05%   -0.10%     
==========================================
  Files          31       31              
  Lines        4525     4552      +27     
==========================================
+ Hits         4215     4236      +21     
- Misses        310      316       +6     

| Impacted Files | Coverage Δ |
|---|---|
| rsmtool/modeler.py | 96.36% <80.00%> (-1.22%) :arrow_down: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 933d17b...9c2b546.

codecov[bot] • Jul 18 '22 18:07