dreamerv3
Fully deterministic runs
Awesome repo. Quick question:
I ran the DMC WalkerWalk experiment 3 different times with the same seeds and got 3 different learning curves. How can I get reproducible experiments?
Hi, are you asking for fully deterministic runs? I haven't paid much attention to this, but I think the agent is already fully deterministic, so you'd probably just have to set the environment seed. If you use more than 1 environment instance, make sure the environments have different seeds so they produce different data.
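A minimal sketch of the per-instance seeding described above, assuming each environment accepts an integer seed. `make_env_seeds` is a hypothetical helper for illustration, not something from the repo; how the seeds are passed into the environments depends on your setup.

```python
import random


def make_env_seeds(base_seed, num_envs):
    """Derive one distinct, reproducible seed per environment instance."""
    # A local generator, so the global `random` state is left untouched.
    rng = random.Random(base_seed)
    # sample() draws without replacement, so no two environments share a seed.
    return rng.sample(range(2**31 - 1), num_envs)


# Same base seed -> same list of seeds on every run.
seeds = make_env_seeds(base_seed=0, num_envs=4)
```

Each environment instance would then be constructed with its own entry from `seeds`, so the instances produce different data while the run as a whole stays tied to a single base seed.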
Okay I will try that. Thank you for the quick response! What exactly is an "environment instance"? I couldn't find a clear definition in the paper.
Also, how many seeds were the non-Minecraft experiments run for?
+1 on the question above. Maybe it's not that apparent in the paper; could you also clarify what the confidence intervals denote in the non-Minecraft experiments (DMLab, DMC Proprio, Crafter, etc.)? Is it the standard error across multiple seeds, the standard error across a window of timesteps with a single seed, or something else?
It's mean/std across seeds, with at least 3 seeds per task, often more.
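For readers who want to reproduce that kind of plot, here is a rough sketch of aggregating learning curves across seeds. It assumes every seed was logged at the same steps, and it uses the population standard deviation; the thread does not say which variant the paper uses, so treat that choice as an assumption. `aggregate_across_seeds` is a hypothetical name.

```python
import statistics


def aggregate_across_seeds(curves):
    """Return per-step (means, stds) for a list of per-seed score lists."""
    # zip(*curves) groups the scores of all seeds at each logged step.
    per_step = list(zip(*curves))
    means = [statistics.mean(scores) for scores in per_step]
    # Population std (assumption); use statistics.stdev for the sample std.
    stds = [statistics.pstdev(scores) for scores in per_step]
    return means, stds


# Three seeds, three logged steps each (made-up numbers for illustration).
curves = [
    [0.0, 1.0, 2.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
]
means, stds = aggregate_across_seeds(curves)
```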
Update: I seeded dmc_control here and still got non-deterministic runs. Are there other non-environment sources of randomness that are not seeded?
I found some non-seeded randomness in the repo. Namely here and here. Wouldn't these affect the agent?
I don't think those two methods are ever run. Could you check, e.g. by adding asdf to the two methods to see if it errors?
Seeding this removes randomness from the first 1000 steps, but runs are non-deterministic afterwards.
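One way to pin down where two supposedly identical runs start to differ is to compare a logged scalar (for example the per-step reward or loss) step by step. This is only a sketch; `first_divergence` is a hypothetical helper, and it assumes both runs logged the same quantity at the same steps.

```python
def first_divergence(run_a, run_b, tol=0.0):
    """Return the first index where two logged sequences differ, else None."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        # tol=0.0 demands exact equality, which is the right test when
        # checking for full determinism rather than approximate agreement.
        if abs(a - b) > tol:
            return step
    return None


# Made-up logs: identical for the first two entries, different at index 2.
run_a = [0.10, 0.25, 0.40, 0.55]
run_b = [0.10, 0.25, 0.41, 0.60]
step = first_divergence(run_a, run_b)
```

Knowing the step at which the curves separate can help narrow down which component becomes active at that point.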
@jadkins99 @swannercjj I guess you should check this issue: https://github.com/google/jax/issues/13672#issuecomment-1515544978
It seems that just seeding jax.random does not make operations deterministic on GPU, because of optimisations in the graph compilation process.
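As a possible starting point, here is a sketch that requests deterministic GPU kernels from XLA through the `XLA_FLAGS` environment variable. It assumes the flag `--xla_gpu_deterministic_ops=true` is supported by your JAX/XLA version; I have not verified that it makes DreamerV3 runs reproducible, and it may slow training down. `enable_deterministic_xla` is a hypothetical helper, not part of JAX or the repo.

```python
import os

# Assumption: this XLA flag is available in the installed JAX/XLA build.
DETERMINISTIC_FLAG = "--xla_gpu_deterministic_ops=true"


def enable_deterministic_xla(env=None):
    """Append the deterministic-ops flag to XLA_FLAGS, keeping existing flags."""
    env = os.environ if env is None else env
    flags = env.get("XLA_FLAGS", "").split()
    if DETERMINISTIC_FLAG not in flags:
        flags.append(DETERMINISTIC_FLAG)
    env["XLA_FLAGS"] = " ".join(flags)
    return env["XLA_FLAGS"]


# Set the flag before JAX is imported, so it is in place when the
# GPU backend is initialised.
enable_deterministic_xla()
# import jax  # import only after the flag has been set
```

Setting the same variable in the shell before launching training should have the same effect; the helper only exists to avoid overwriting flags that are already set.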