Support image-based observation spaces in the same way as SB3
At the moment, GAIL and BC don't interoperate well with SB3 in environments with image-based observation spaces. The main problem is the channels axis: many environments put channels last, but Torch convolutions expect channels first. SB3 gets around this with two pieces of code (see the sketch after the list below):
- First, in `BaseAlgorithm._wrap_env`, it wraps any passed vec env with an image-like observation space in a `VecTransposeImage` wrapper.
- Second, in `BasePolicy.predict()`, it transposes any image-like numpy observations that it receives so that the (putative) channels axis comes first. It does not do this transform in `_predict()` or `forward()`, which both take Torch tensors instead of numpy arrays.
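For concreteness, here is a minimal sketch of both mechanisms. The toy `ChannelsLastEnv` is hypothetical (any channels-last image env behaves the same way), and it assumes the old Gym step/reset API that SB3 1.x expects:

```python
import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv


class ChannelsLastEnv(gym.Env):
    """Toy env with a channels-last (HWC) image observation space."""

    observation_space = gym.spaces.Box(0, 255, shape=(64, 64, 3), dtype=np.uint8)
    action_space = gym.spaces.Discrete(2)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}


venv = DummyVecEnv([ChannelsLastEnv])
model = PPO("CnnPolicy", venv)            # BaseAlgorithm._wrap_env adds VecTransposeImage here
print(model.env.observation_space.shape)  # (3, 64, 64): channels first, as the policy expects

obs = venv.reset()                        # channels-last numpy batch of shape (1, 64, 64, 3)
action, _ = model.predict(obs)            # predict() transposes the numpy obs before forward()
```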
BC and GAIL both break in different ways when trying to interoperate with SB3:
- BC does not do any transposition tricks on the supplied observation space or demonstrations. As a result, it needs a channels-first environment to train; otherwise, Torch cannot apply convolutions correctly when BC (implicitly) calls `.forward()` on the policy (see the snippet after this list for the underlying shape error). At test time, the policy still needs to be evaluated in a channels-last environment, since that's what the policy's `.predict()` method expects.
- In GAIL, the reward function training code assumes channels-first (for the same reason that BC does), but the underlying algorithm only works with channels-last. This makes it impossible to use image-based observation spaces.
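To make the failure concrete, here is a minimal Torch-only illustration (not imitation's actual code) of why a channels-last batch breaks a policy's convolutions:

```python
import torch
import torch.nn as nn

# A Conv2d layer built for 3-channel images expects NCHW input.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=8, stride=4)

obs_nchw = torch.zeros(1, 3, 84, 84)   # channels first: works
conv(obs_nchw)

obs_nhwc = torch.zeros(1, 84, 84, 3)   # channels last: Conv2d sees 84 "channels"
try:
    conv(obs_nhwc)
except RuntimeError as err:
    print(err)                          # expected 3 input channels, got 84
```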
IMO the cleanest fix to this would remove the two algorithm/policy fixes from SB3. Instead, the user would be responsible for wrapping environments in a `VecTransposeImage` wrapper if necessary (sketched below). Our code should work out-of-the-box after that. However, as a stop-gap, I'm considering changing BC and GAIL to do what SB3 does (implicit transposition of observations).
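Under that proposal, user code would look roughly like this (a sketch, reusing the hypothetical `ChannelsLastEnv` from the earlier snippet):

```python
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

# The user wraps once, up front; imitation and SB3 then both see channels-first
# observations, and neither library needs to transpose anything implicitly.
venv = VecTransposeImage(DummyVecEnv([ChannelsLastEnv]))
print(venv.observation_space.shape)  # (3, 64, 64) instead of (64, 64, 3)
```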
Are they not interoperable: i.e. will SB3 fail if we transpose the images?
If we add a wrapper that transposes the images to NCHW, then SB3 will fail, because it will transpose them again to NWCH.
(SB3 does have a check that sometimes avoids applying VecTransposeImage twice, but the check fails with AdversarialTrainer for somewhat complicated reasons)
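A quick numpy sketch of the double-transposition problem (the permutation below is the batched HWC-to-CHW one that `VecTransposeImage` applies):

```python
import numpy as np

obs = np.zeros((1, 3, 64, 64))                  # already NCHW, e.g. from our own wrapper
retransposed = np.transpose(obs, (0, 3, 1, 2))  # SB3 transposes a second time
print(retransposed.shape)                       # (1, 64, 3, 64): NWCH, no longer a valid image layout
```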
Also, I realised it would be easier to fix this upstream (very few lines of code need to change), so I'll try that first.
This problem seems serious to me. How should we solve it? How can we activate VecTransposeImage?
One possible solution is the following code, where we modify the `features_extractor_class` of `ActorCriticPolicy`. However, it is not an elegant fix.
```python
import torch
from stable_baselines3.common.torch_layers import NatureCNN


class CNNExtractor(NatureCNN):
    # NatureCNN (a BaseFeaturesExtractor subclass) already defines self.cnn and
    # self.linear; we only override forward() to move a channels-last axis first.
    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        x = observations
        if x.shape[1] != 3:                # channel axis is not first, assume it is last
            x = torch.transpose(x, 1, 3)   # NHWC -> NCWH
            x = torch.transpose(x, 2, 3)   # NCWH -> NCHW
        x = self.cnn(x)
        return self.linear(x)
```
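For reference, a custom extractor like this would be plugged in through `policy_kwargs` (a generic SB3 sketch; in the GAIL setting the same `policy_kwargs` would presumably be passed to whatever SB3 generator algorithm is used):

```python
from stable_baselines3 import PPO

# Point the policy at the channel-handling extractor instead of the default NatureCNN.
policy_kwargs = dict(features_extractor_class=CNNExtractor)
model = PPO("CnnPolicy", venv, policy_kwargs=policy_kwargs)  # venv: the vec env from above
```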
Fixed in https://github.com/HumanCompatibleAI/imitation/pull/519