Adding active selection support using epistemic uncertainty for reward ensemble (#462)

Open taufeeque9 opened this issue 3 years ago • 1 comments

This pull request adds the ActiveSelectionFragmenter class for active learning, with three supported uncertainty variants: logit, probability, and label. It also refactors CrossEntropyRewardLoss by introducing the PreferencePredictor class, which wraps a RewardNet to create a model that predicts the preference probability for a given fragment pair.
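To make the three uncertainty variants concrete, here is a minimal, library-independent sketch of ensemble-based selection. It is not the code in this PR: the function names `preference_uncertainty` and `select_most_uncertain`, the input layout (one summed reward per ensemble member per fragment), and the Bradley-Terry-style probability are all assumptions made for illustration.

```python
import math
from statistics import pvariance


def preference_uncertainty(rews1, rews2, kind="logit"):
    """Variance across ensemble members of the predicted preference for one pair.

    rews1, rews2: per-member summed rewards for fragment 1 and fragment 2
    (hypothetical layout, one float per ensemble member).
    """
    # Logit of "fragment 1 preferred" is r1 - r2; we store r2 - r1 to match
    # the probability formula below.
    logits = [r2 - r1 for r1, r2 in zip(rews1, rews2)]
    if kind == "logit":
        values = logits
    else:
        # Assumed model: P(fragment 1 preferred) = 1 / (1 + exp(r2 - r1)).
        probs = [1.0 / (1.0 + math.exp(z)) for z in logits]
        if kind == "probability":
            values = probs
        elif kind == "label":
            # Hard vote of each member: 1 if it prefers fragment 1, else 0.
            values = [1.0 if p > 0.5 else 0.0 for p in probs]
        else:
            raise ValueError(f"unknown uncertainty kind: {kind}")
    # Disagreement between members serves as a proxy for epistemic uncertainty.
    return pvariance(values)


def select_most_uncertain(pairs, k, kind="logit"):
    """Return indices of the k pairs with the highest ensemble disagreement."""
    order = sorted(
        range(len(pairs)),
        key=lambda i: preference_uncertainty(pairs[i][0], pairs[i][1], kind=kind),
        reverse=True,
    )
    return order[:k]
```

Under these assumptions, a pair on which all members agree scores zero in every variant, while the label variant only reacts when members disagree about which fragment wins.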

Jul 20 '22 23:07 taufeeque9

Codecov Report

Merging #482 (9cd4559) into master (45232b7) will increase coverage by 0.06%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #482      +/-   ##
==========================================
+ Coverage   96.88%   96.94%   +0.06%     
==========================================
  Files          84       84              
  Lines        7278     7421     +143     
==========================================
+ Hits         7051     7194     +143     
  Misses        227      227

Impacted Files	Coverage Δ
src/imitation/scripts/common/reward.py	`98.64% <ø> (ø)`
src/imitation/algorithms/preference_comparisons.py	`99.17% <100.00%> (+0.19%)`
...ion/scripts/config/train_preference_comparisons.py	`85.33% <100.00%> (+0.61%)`
.../imitation/scripts/train_preference_comparisons.py	`98.36% <100.00%> (+0.08%)`
tests/algorithms/test_preference_comparisons.py	`100.00% <100.00%> (ø)`
tests/scripts/test_scripts.py	`100.00% <100.00%> (ø)`

Jul 21 '22 00:07 codecov[bot]

CodeCov is passing now after the merge (which I screwed up but fixed). I think CodeCov sometimes gets confused when the base isn't the most recent version of master.

Aug 23 '22 04:08 AdamGleave