[RLlib] Stashed policies are being accessed excessively, defeating the purpose of a policy cache
What happened + What you expected to happen
The idea of a policy cache is to stash unused policies on disk or in the object store to alleviate memory pressure. That requires our code to only ever access policies that are currently held in the cache (i.e. in memory), and to properly restore state when a policy is un-stashed. If we blindly access already stashed policies, useful policies get stashed and then immediately un-stashed, which slows things down significantly and unnecessarily. While debugging, I noticed at least the following places where we access ALL policies, regardless of whether they are in the cache or not:
- Syncing weights from all policies to the eval workers. This will cause the local and eval workers to stash and then un-stash all policies. https://github.com/ray-project/ray/blob/50e1fda022a81e5015978cf723f7b5fd9cc06b2c/rllib/algorithms/algorithm.py#L816-L826
- RolloutWorker seems to set global vars on all policies, regardless of whether they are stashed. https://github.com/ray-project/ray/blob/50e1fda022a81e5015978cf723f7b5fd9cc06b2c/rllib/evaluation/rollout_worker.py#L1781-L1782
- The APPO target network update works on all trainable policies. This will cause excessive policy restoring on the training workers. https://github.com/ray-project/ray/blob/50e1fda022a81e5015978cf723f7b5fd9cc06b2c/rllib/algorithms/appo/appo.py#L239-L242
These are the places I have discovered so far. Things do seem a lot quieter if I comment out all of this logic. A sketch of the guarded access pattern I would expect instead is below.
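This is a minimal sketch only; `PolicyMapSketch` and `in_memory_policy_ids()` are made-up names for illustration, not real RLlib APIs. The point is simply that call sites like the ones above should iterate over the policies that are already in memory rather than over every known policy ID:

```python
from typing import Dict, Iterable


class PolicyMapSketch:
    """Toy stand-in for a policy map with a bounded in-memory cache."""

    def __init__(self, policies: Dict[str, object], capacity: int = 2):
        self._all = policies
        # Only the first `capacity` policies start out in memory.
        self._in_memory = dict(list(policies.items())[:capacity])

    def in_memory_policy_ids(self) -> Iterable[str]:
        # Policies that can be touched without triggering a restore.
        return list(self._in_memory.keys())

    def __getitem__(self, policy_id: str) -> object:
        # Touching a stashed policy would force an expensive un-stash here.
        return self._all[policy_id]


def sync_weights_guarded(policy_map: PolicyMapSketch) -> Dict[str, object]:
    """Collect weights only from policies that are already in memory."""
    return {pid: policy_map[pid] for pid in policy_map.in_memory_policy_ids()}


if __name__ == "__main__":
    pmap = PolicyMapSketch({f"policy_{i}": object() for i in range(100)})
    # Only the 2 in-memory policies are synced; the other 98 stay stashed.
    print(sorted(sync_weights_guarded(pmap)))
```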
Versions / Dependencies
Master
Reproduction script
Add logging lines where we stash and restore policies, then run:
bazel run rllib/learning_tests_multi_agent_cartpole_w_100_policies_appo
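The logging I mean is nothing more than something like the following (a minimal sketch; the function names and insertion points are hypothetical, to be dropped into whatever code paths actually stash and restore policies):

```python
import logging

logger = logging.getLogger("policy_cache_debug")


def log_stash(policy_id: str) -> None:
    # Call from the code path that stashes a policy to disk / object store.
    logger.warning("Stashing policy %s", policy_id)


def log_restore(policy_id: str) -> None:
    # Call from the code path that un-stashes (restores) a policy.
    logger.warning("Restoring policy %s", policy_id)
```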
Issue Severity
Medium: It is a significant difficulty but I can work around it.