acme
CQLLearner always executes only behavior cloning when it runs.
When I ran the CQL algorithm, I found that it only executed behavior cloning. I checked the config I used: the number of training steps is 100 and 'num_bc_iters' is set to 50.
When I dug further into the source code of CQLLearner, I found that 'counts' in the function 'step' has two keys, "steps" and "walltime".
However, the implementation of 'step' looks up the key "learner_steps".
Because "learner_steps" is not a key in 'counts', "cur_step" is always 0, so it never reaches 'num_bc_iters' and the algorithm only executes behavior cloning.
After I changed the key from "learner_steps" to "steps", the problem was solved.
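To illustrate the effect, here is a minimal standalone sketch, not the actual CQLLearner code. It assumes the step count is read from 'counts' with a lookup that falls back to 0 when the key is missing, and that behavior cloning is used while "cur_step" is below 'num_bc_iters'. The helper names `select_phase` and `run` are hypothetical.

```python
NUM_BC_ITERS = 50  # 'num_bc_iters' from the config in this report


def select_phase(counts, key):
    # Assumption: a missing key silently falls back to 0.
    cur_step = counts.get(key, 0)
    return "bc" if cur_step < NUM_BC_ITERS else "cql"


def run(key, total_steps=100):
    phases = []
    for step in range(1, total_steps + 1):
        # 'counts' only has the keys "steps" and "walltime".
        counts = {"steps": step, "walltime": 0.0}
        phases.append(select_phase(counts, key))
    return phases


buggy = run("learner_steps")  # key that is not in 'counts'
fixed = run("steps")          # key that is in 'counts'

print(buggy.count("cql"), fixed.count("cql"))  # prints "0 51"
```

With the key "learner_steps" the sketch never leaves the behavior cloning phase, while with "steps" it switches to CQL once the counter reaches 50.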