[RLlib]: SimpleQ TF2 is broken
What happened + What you expected to happen
The SimpleQ action distribution is broken under the TF2 framework: the reproduction script below misbehaves when computing action log-likelihoods, but I can't track down the bug. The same code works under TF1 and Torch.
Versions / Dependencies
ray 3.0.0dev0 (master)
Ubuntu 20.04
tensorflow 2.7.0 (PyPI)
Reproduction script
from ray.rllib.algorithms.simple_q import SimpleQConfig
config = (
    SimpleQConfig()
    .environment(env="CartPole-v0")
    .rollouts(num_rollout_workers=0)
    .framework("tf2")
    .exploration(exploration_config={"type": "SoftQ", "temperature": 1.0})
)
algo = config.build()
policy = algo.get_policy()
batch = algo.workers.local_worker().sample()
log_likelihoods = policy.compute_log_likelihoods(batch["actions"], batch["obs"])
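For context on what that last call should return: under SoftQ exploration the action distribution is a categorical softmax over the temperature-scaled Q-values, so an action's log-likelihood is its scaled Q-value minus the log-partition term. A minimal pure-Python sketch of that math (hypothetical Q-values; this is not RLlib's implementation):

```python
import math

def soft_q_log_likelihood(q_values, action, temperature=1.0):
    """Log-likelihood of `action` under softmax(Q / temperature)."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[action] - log_z

# Hypothetical Q-values for CartPole's two actions:
q = [1.2, 0.7]
ll = soft_q_log_likelihood(q, action=0)
```

Exponentiating the log-likelihoods over all actions should recover a proper probability distribution, which is a quick sanity check for whatever the TF2 path is actually returning.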
Issue Severity
Medium: It is a significant difficulty but I can work around it.
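For anyone hitting this in the meantime, a plausible workaround (given that TF1 and Torch are reported unaffected) is to run the identical config under the Torch framework. Untested sketch of the same reproduction with only the framework swapped:

```python
from ray.rllib.algorithms.simple_q import SimpleQConfig

config = (
    SimpleQConfig()
    .environment(env="CartPole-v0")
    .rollouts(num_rollout_workers=0)
    .framework("torch")  # swap "tf2" -> "torch"; the bug appears TF2-specific
    .exploration(exploration_config={"type": "SoftQ", "temperature": 1.0})
)
algo = config.build()
```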
@Rohan138 This seems related to the policy_v1 vs. policy_v2 migration we looked at the other day, so I'll leave you assigned for now. Notably, we don't seem to test the compute_log_likelihoods method at all, which is probably why this regressed silently.
Hello! This P2 issue has seen no activity in at least 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.
Please remove the pending-cleanup label if you believe this issue should remain open.
Thanks for contributing to Ray!