Adding active selection support using epistemic uncertainty for reward ensemble (#462)
This pull request adds the ActiveSelectionFragmenter class for active learning, with three supported uncertainty variants: logit, probability, and label. It also refactors CrossEntropyRewardLoss by introducing the PreferencePredictor class, which wraps a RewardNet to create a model that predicts the preference probability for a given fragment pair.
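To illustrate what the three uncertainty variants could mean, here is a minimal NumPy sketch of ensemble disagreement for one fragment pair. It is not the code from this PR: the function names `ensemble_disagreement` and `select_most_uncertain`, the array shapes, and the Bradley-Terry style preference model (probability as a sigmoid of the difference in returns) are all assumptions made for illustration.

```python
import numpy as np


def ensemble_disagreement(rews1, rews2, uncertainty_on="logit"):
    """Variance across ensemble members for one fragment pair (hypothetical helper).

    rews1, rews2: arrays of shape (n_members, fragment_length) holding each
    ensemble member's per-step reward predictions for the two fragments.
    """
    rews1 = np.asarray(rews1, dtype=float)
    rews2 = np.asarray(rews2, dtype=float)
    # Assumed preference model: the logit of "fragment 1 is preferred" is the
    # difference between the two fragments' summed rewards.
    logits = rews1.sum(axis=1) - rews2.sum(axis=1)
    if uncertainty_on == "logit":
        return float(np.var(logits))
    probs = 1.0 / (1.0 + np.exp(-logits))
    if uncertainty_on == "probability":
        return float(np.var(probs))
    if uncertainty_on == "label":
        # Hard vote per member; the variance of the votes is p * (1 - p).
        labels = (probs > 0.5).astype(float)
        return float(np.var(labels))
    raise ValueError(f"unknown uncertainty_on: {uncertainty_on!r}")


def select_most_uncertain(pairs, num_pairs, uncertainty_on="logit"):
    """Return indices of the num_pairs candidate pairs with highest disagreement."""
    scores = [ensemble_disagreement(r1, r2, uncertainty_on) for r1, r2 in pairs]
    order = np.argsort(scores)[::-1][:num_pairs]
    return [int(i) for i in order]


# Two candidate pairs scored by a two-member ensemble.
agree = (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 0.0], [0.0, 0.0]]))
disagree = (np.array([[1.0, 1.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [1.0, 1.0]]))
chosen = select_most_uncertain([agree, disagree], num_pairs=1)
```

In this sketch the pair on which the members disagree is the one selected for querying, which is the general idea behind choosing comparisons by epistemic uncertainty. The three variants differ only in the quantity whose variance is measured.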
Codecov Report
Merging #482 (9cd4559) into master (45232b7) will increase coverage by 0.06%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #482 +/- ##
==========================================
+ Coverage 96.88% 96.94% +0.06%
==========================================
Files 84 84
Lines 7278 7421 +143
==========================================
+ Hits 7051 7194 +143
Misses 227 227
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/imitation/scripts/common/reward.py | 98.64% <ø> (ø) | |
| src/imitation/algorithms/preference_comparisons.py | 99.17% <100.00%> (+0.19%) | :arrow_up: |
| ...ion/scripts/config/train_preference_comparisons.py | 85.33% <100.00%> (+0.61%) | :arrow_up: |
| .../imitation/scripts/train_preference_comparisons.py | 98.36% <100.00%> (+0.08%) | :arrow_up: |
| tests/algorithms/test_preference_comparisons.py | 100.00% <100.00%> (ø) | |
| tests/scripts/test_scripts.py | 100.00% <100.00%> (ø) | |
CodeCov is passing now after the merge (which I screwed up but fixed). I think CodeCov sometimes gets confused when the base isn't the most recent version of master.