Question about multiple FlattenExtractor components in ActorCriticPolicy
Hi SB3 team,
Thank you for your great work on this library!
I have a question regarding the ActorCriticPolicy architecture. When I print the policy, I see three separate FlattenExtractor entries: features_extractor, pi_features_extractor, and vf_features_extractor.
ActorCriticPolicy(
  (features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (pi_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (vf_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (mlp_extractor): MlpExtractor(
    (policy_net): Sequential(
      (0): Linear(in_features=4, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=64, bias=True)
      (3): Tanh()
    )
    (value_net): Sequential(
      (0): Linear(in_features=4, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=64, bias=True)
      (3): Tanh()
    )
  )
  (action_net): Linear(in_features=64, out_features=2, bias=True)
  (value_net): Linear(in_features=64, out_features=1, bias=True)
)
Could you please clarify what the purpose of each of these is? Specifically:
Why are there three flatten extractors instead of just one?
What is the difference between features_extractor, pi_features_extractor, and vf_features_extractor in this context?
My guess is that features_extractor is used for shared feature extraction, while pi_features_extractor and vf_features_extractor are used for separate, dedicated extraction paths for the policy and value networks, respectively. I also assume that these cannot be active at the same time — that is, when the shared extractor is used, the other two are inactive, and vice versa.
Is this understanding correct?
As a suggestion: it would be helpful if the documentation included a few concrete examples showing the actual output structure of the networks. That would make it easier to understand how the components interact.
Thanks again!
Hello,
> Why are there three flatten extractors instead of just one?
You should find your answer in https://github.com/DLR-RM/stable-baselines3/pull/1148 and https://github.com/DLR-RM/stable-baselines3/issues/1066
In short, sometimes you want the actor and critic to share one features extractor to save computation time and memory, but in other cases you want separate extractors so that the actor and critic gradients do not interfere with each other.
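To make the trade-off concrete, here is a toy sketch in plain Python. It is not SB3 code: `ToyExtractor`, `ToyActorCritic` and the call counter are made up for illustration, and the `share_features_extractor` flag name is borrowed from what I believe the SB3 constructor argument is called, so treat that as an assumption. The idea it illustrates is that the three attribute names can point to the same object when sharing is on, and that only the value-function side gets its own instance when sharing is off.

```python
class ToyExtractor:
    """Stand-in for a features extractor; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, obs):
        self.calls += 1
        return list(obs)  # "flatten": here just a copy of the observation


class ToyActorCritic:
    """Hypothetical policy with three extractor attributes (not SB3 code)."""

    def __init__(self, share_features_extractor=True):
        self.share_features_extractor = share_features_extractor
        self.features_extractor = ToyExtractor()
        # In this sketch the actor always reuses the main extractor.
        self.pi_features_extractor = self.features_extractor
        if share_features_extractor:
            # Shared: all three names refer to the same object.
            self.vf_features_extractor = self.features_extractor
        else:
            # Separate: the critic gets its own instance.
            self.vf_features_extractor = ToyExtractor()

    def extract_features(self, obs):
        if self.share_features_extractor:
            features = self.features_extractor(obs)  # computed once
            return features, features
        return self.pi_features_extractor(obs), self.vf_features_extractor(obs)


obs = [0.1, 0.2, 0.3, 0.4]

shared = ToyActorCritic(share_features_extractor=True)
shared.extract_features(obs)
print(shared.pi_features_extractor is shared.vf_features_extractor)  # True
print(shared.features_extractor.calls)  # 1

separate = ToyActorCritic(share_features_extractor=False)
separate.extract_features(obs)
print(separate.pi_features_extractor is separate.vf_features_extractor)  # False
print(separate.vf_features_extractor.calls)  # 1
```

With sharing on, the extractor runs once per observation; with sharing off, each branch runs its own extractor, which costs more compute but keeps the two sets of parameters independent. A printed module tree would list all three attribute names in both cases, so the printout alone does not tell you which mode is active.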
> Is this understanding correct?
Correct
> As a suggestion: it would be helpful if the documentation included a few concrete examples showing the actual output structure of the networks. That would make it easier to understand how the components interact.
Would you like to contribute this to the documentation?
Thanks for the suggestion! I’d be happy to contribute to the documentation. Could you please point me to the relevant files or guidelines so I can get started?
> Thanks for the suggestion! I’d be happy to contribute to the documentation. Could you please point me to the relevant files or guidelines so I can get started?
Guidelines (most of them don't apply to a documentation update): https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md
Files: https://github.com/DLR-RM/stable-baselines3/blob/master/docs/guide/custom_policy.rst