Remove the need for a dummy environment when instantiating a MultiSyncDataCollector.
Motivation
Currently, we must create an instance of the environment we wish to solve before creating a MultiSyncDataCollector. A policy cannot be built without knowing the environment's action and observation specs, and the MultiSyncDataCollector constructor requires a policy. Creating and discarding a dummy environment is wasteful in general, and the cost becomes tangible for environments that are particularly large or slow to instantiate.
Ideally, the MultiSyncDataCollector would let us access the observation and action specs from one of its sub-processes before we provide it with a policy.
Solution
Construct the collector and query the environment specs before constructing a policy, like so:
```python
collector = MultiSyncDataCollector(...)
action_spec = collector.get_env_action_spec()
obs_spec = collector.get_env_obs_spec()
```
Then instantiate a policy and pass that to the collector:
```python
policy = MyPolicy(obs_spec, action_spec)
collector.set_policy(policy)
```
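The proposed workflow can be sketched in a single process, assuming hypothetical names throughout (`DeferredPolicyCollector`, `ToyEnv`, `Spec`, plus the proposed `get_env_action_spec`, `get_env_obs_spec` and `set_policy`). None of these are existing TorchRL APIs. The point it illustrates: the environment that answers the spec query is the same one later used for collection, so nothing is built only to be thrown away.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Spec:
    shape: tuple  # hypothetical minimal spec: only a shape


class ToyEnv:
    """Hypothetical stand-in for an expensive environment."""

    instances = 0  # counts every environment ever constructed

    def __init__(self):
        ToyEnv.instances += 1
        self.observation_spec = Spec((4,))
        self.action_spec = Spec((2,))
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0] * self.observation_spec.shape[0]

    def step(self, action):
        self._t += 1
        obs = [float(self._t)] * self.observation_spec.shape[0]
        return obs, self._t >= 3  # episode ends after 3 steps


class DeferredPolicyCollector:
    """Hypothetical collector: envs are built eagerly, the policy is set later."""

    def __init__(self, create_env_fns: List[Callable[[], ToyEnv]]):
        # A real multi-process collector would build each env in a worker
        # sub-process; they are built in-process here to keep the sketch small.
        self._envs = [fn() for fn in create_env_fns]
        self._policy: Optional[Callable] = None

    def get_env_action_spec(self) -> Spec:
        return self._envs[0].action_spec

    def get_env_obs_spec(self) -> Spec:
        return self._envs[0].observation_spec

    def set_policy(self, policy: Callable) -> None:
        self._policy = policy

    def rollout(self) -> List[Dict]:
        if self._policy is None:
            raise RuntimeError("call set_policy() before collecting data")
        frames = []
        for env in self._envs:
            obs, done = env.reset(), False
            while not done:
                action = self._policy(obs)
                obs, done = env.step(action)
                frames.append({"action": action, "done": done})
        return frames


class MyPolicy:
    """Hypothetical policy sized from the specs."""

    def __init__(self, obs_spec: Spec, action_spec: Spec):
        self.n_obs = obs_spec.shape[0]
        self.n_act = action_spec.shape[0]

    def __call__(self, obs):
        return [0.0] * self.n_act


collector = DeferredPolicyCollector([ToyEnv, ToyEnv])
policy = MyPolicy(collector.get_env_obs_spec(), collector.get_env_action_spec())
collector.set_policy(policy)
frames = collector.rollout()
```

Only the two environments that actually collect data are ever constructed. In the multi-process case, the spec query would presumably be a round trip to one worker over its existing pipe.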
Alternatives
Pass the policy constructor (or any callable that builds the policy from the specs) to the collector, and retrieve the instantiated policy later:
```python
collector = MultiSyncDataCollector(policy_callable=MyPolicy, ...)
policy = collector.get_policy()
```
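A minimal sketch of this alternative, assuming hypothetical names (`FactoryCollector`, `ToyEnv`, `Spec`, and the proposed `policy_callable`/`get_policy`) that are not existing TorchRL APIs. The collector owns the environments, so it can call the factory with real specs itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Spec:
    shape: tuple  # hypothetical minimal spec: only a shape


class ToyEnv:
    """Hypothetical stand-in for an expensive environment."""

    def __init__(self):
        self.observation_spec = Spec((4,))
        self.action_spec = Spec((2,))


class MyPolicy:
    """Hypothetical policy whose layer sizes come from the specs."""

    def __init__(self, obs_spec: Spec, action_spec: Spec):
        self.in_features = obs_spec.shape[0]
        self.out_features = action_spec.shape[0]


class FactoryCollector:
    """Hypothetical collector that instantiates the policy from env specs."""

    def __init__(self, create_env_fns: List[Callable], policy_callable: Callable):
        self._envs = [fn() for fn in create_env_fns]
        env = self._envs[0]
        # Build the policy from the specs of an env the collector will reuse.
        self._policy = policy_callable(env.observation_spec, env.action_spec)

    def get_policy(self):
        # Return the collector's own policy object, e.g. to hand to an optimizer.
        return self._policy


collector = FactoryCollector([ToyEnv, ToyEnv], policy_callable=MyPolicy)
policy = collector.get_policy()
```

The design trade-off is that the caller gives up control over when the policy is built. In a real multi-process collector, `get_policy()` would presumably return the main-process copy whose weights are synced to the workers; this sketch does not model that.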
Alternatively, a lookup table (LUT) mapping environment names to specs would avoid instantiating the environment at collection time. It would still require instantiating each environment once, when it is registered, but never again afterwards:
```python
action_spec = GymEnv.get_action_spec("CartPole-v1")
obs_spec = GymEnv.get_obs_spec("CartPole-v1")
policy = MyPolicy(obs_spec, action_spec)
collector = MultiSyncDataCollector(...)
```
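A minimal in-memory sketch of such a LUT, assuming hypothetical names (`SpecRegistry`, `CartPoleLike`, `Spec`). It is not the GymEnv API. Registration builds the environment once to read its specs, and every later lookup is served from the table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Spec:
    shape: tuple  # hypothetical minimal spec: only a shape


class SpecRegistry:
    """Hypothetical LUT: env name -> (obs_spec, action_spec)."""

    def __init__(self):
        self._specs: Dict[str, Tuple[Spec, Spec]] = {}

    def register(self, name: str, create_env_fn: Callable) -> None:
        # The only place an env is instantiated; it is discarded afterwards.
        env = create_env_fn()
        self._specs[name] = (env.observation_spec, env.action_spec)

    def get_obs_spec(self, name: str) -> Spec:
        return self._specs[name][0]

    def get_action_spec(self, name: str) -> Spec:
        return self._specs[name][1]


class CartPoleLike:
    """Hypothetical env with CartPole-sized specs (4 observations, 2 actions)."""

    instances = 0

    def __init__(self):
        CartPoleLike.instances += 1
        self.observation_spec = Spec((4,))
        self.action_spec = Spec((2,))


registry = SpecRegistry()
registry.register("CartPole-v1", CartPoleLike)
for _ in range(3):  # repeated lookups never rebuild the env
    obs_spec = registry.get_obs_spec("CartPole-v1")
    action_spec = registry.get_action_spec("CartPole-v1")
```

This table lives in memory only. For "never again afterwards" to hold across processes or runs, the specs would have to be persisted, for example serialized to disk at registration time.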
Additional context
This isn't a pressing problem, but it would be a nice improvement at some point.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)