imitation
Implemented the ability to train rewards in preference comparison against multiple policies
Description
This pull request adds the MixtureOfTrajectoryGenerators class, which allows preference comparison to be trained on data from multiple sources. This includes training against multiple agents, or against a mixture of data from an agent and a dataset.
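To illustrate the idea, here is a minimal, hypothetical sketch of a mixture generator. The class and method names (`TrajectoryGenerator`, `sample`) mirror imitation's trajectory-generator interface, but the splitting strategy shown (an even division of the step budget) is an assumption; the real implementation in this PR may allocate steps differently.

```python
from typing import List, Sequence


class TrajectoryGenerator:
    """Minimal stand-in for imitation's trajectory generator interface."""

    def sample(self, steps: int) -> List:
        raise NotImplementedError


class MixtureOfTrajectoryGenerators(TrajectoryGenerator):
    """Sketch: draws trajectories from several underlying generators.

    NOTE: hypothetical illustration; the PR's real class may split the
    step budget differently (e.g. proportionally or stochastically).
    """

    def __init__(self, members: Sequence[TrajectoryGenerator]):
        if not members:
            raise ValueError("Need at least one trajectory generator.")
        self.members = list(members)

    def sample(self, steps: int) -> List:
        # Divide the requested step budget evenly across members and
        # concatenate whatever each member produces. Each member could be
        # an RL agent, a fixed dataset, etc.
        per_member = steps // len(self.members)
        trajectories: List = []
        for gen in self.members:
            trajectories.extend(gen.sample(per_member))
        return trajectories
```

Because the mixture itself satisfies the same interface, the rest of the preference-comparison pipeline does not need to know whether its data comes from one source or several.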
To facilitate this, I have modified the HierarchicalLogger class to allow arbitrary prefixes to be added, so that logs from different trajectory generators don't clobber each other.
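A minimal sketch of the prefixing idea follows. This is not the PR's actual HierarchicalLogger code; the class name `PrefixedLogger` and its methods are hypothetical, and they only demonstrate how a per-generator prefix keeps identically named keys (e.g. each generator's `loss`) from overwriting one another.

```python
from contextlib import contextmanager
from typing import Dict, List


class PrefixedLogger:
    """Hypothetical sketch: record keys under a stack of prefixes so
    values from different trajectory generators stay distinct."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}
        self._prefix_stack: List[str] = []

    @contextmanager
    def prefix(self, name: str):
        # Everything recorded inside this context gets `name/` prepended.
        self._prefix_stack.append(name)
        try:
            yield
        finally:
            self._prefix_stack.pop()

    def record(self, key: str, value: float) -> None:
        # "gen0/loss" and "gen1/loss" are separate entries, so two
        # generators logging "loss" no longer clobber each other.
        full_key = "/".join(self._prefix_stack + [key])
        self.data[full_key] = value
```

Each trajectory generator in the mixture would then log inside its own `prefix(...)` context.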
Testing
Changes to the HierarchicalLogger class have been unit tested, as has MixtureOfTrajectoryGenerators. I've also added new integration tests for the train_preference_comparisons.py script to cover the use case of having multiple policies.
Codecov Report
Merging #529 (408e248) into master (91c66b7) will increase coverage by 0.03%. The diff coverage is 99.29%.
:exclamation: Current head 408e248 differs from pull request most recent head 4df8551. Consider uploading reports for the commit 4df8551 to get more accurate results
@@ Coverage Diff @@
## master #529 +/- ##
==========================================
+ Coverage 97.16% 97.19% +0.03%
==========================================
Files 85 85
Lines 7646 7769 +123
==========================================
+ Hits 7429 7551 +122
- Misses 217 218 +1
Impacted Files | Coverage Δ |
---|---|
src/imitation/algorithms/preference_comparisons.py | 99.04% <97.82%> (-0.13%) :arrow_down: |
...ion/scripts/config/train_preference_comparisons.py | 85.71% <100.00%> (+0.38%) :arrow_up: |
.../imitation/scripts/train_preference_comparisons.py | 97.22% <100.00%> (+0.50%) :arrow_up: |
src/imitation/util/logger.py | 100.00% <100.00%> (ø) |
tests/algorithms/test_preference_comparisons.py | 100.00% <100.00%> (ø) |
tests/scripts/test_scripts.py | 100.00% <100.00%> (ø) |
tests/util/test_logger.py | 100.00% <100.00%> (ø) |
@levmckinney let me know if there are any particular areas you'd like my feedback on. I'll aim to do a high-level review, but may not have time to look at it in detail.
@Rocamonde can you do a style/technical review on Monday?
When reading through the logging code, I see we are maintaining a somewhat advanced and general logging infrastructure for hierarchical logging. I'm surprised there's not an already existing way to handle this without a bespoke implementation. What are other people doing? (i.e. in similar ML projects) Are our logging needs very specific and different to the way this is normally done? It might be interesting to spin off the logging module into e.g. an SB3 PR, so people don't have to reinvent the wheel every time.
What we are doing is kind of weird -- you don't normally have several RL trainers that all need to log to the same place -- but it could come up in some other applications, e.g. population-based training. You could look at how rllib handles this.
@AdamGleave Could you look at the changes to the HierarchicalLogger?
@Rocamonde I think Lev requested re-review, could you take another look soon? Thanks!
What's the status of this? I partially borrowed from this PR in other PRs that have since been merged, to avoid blockage / reinventing the wheel, so it probably requires some rewriting.
@levmckinney, what's outstanding here? I know you're busy now, but I can likely find someone else to finish the PR off.
However, IIRC the policy ensemble didn't help performance much, so perhaps it's served its purpose (of running those experiments) and we should just cherry-pick any helpful features from it then close this PR.
I think we can close this. The main thing we might want to cherry-pick out is the changes to the hierarchical logger. However, that only really matters if we are planning to move forward with this logging framework.