imitation icon indicating copy to clipboard operation
imitation copied to clipboard

Support image-based observation spaces in same way as SB3

Open qxcv opened this issue 5 years ago • 5 comments

At the moment, GAIL and BC don't interoperate well with SB3 in environments with image-based observation spaces. The main problem is the channels axis: many environments put channels last, but Torch convolutions expect channels first. SB3 gets around this with two pieces of code:

  • First, in BaseAlgorithm._wrap_env, it wraps any passed vec env with an image-like observation space through a VecImageTranspose wrapper.
  • Second, in BasePolicy.predict(), it transposes any image-like numpy observations that it receives so that the (putative) channels axis comes first. It does not do this transform in _predict() or forward(), which both take Torch tensors instead of numpy arrays.

BC and GAIL both break in different ways when trying to interoperate with SB3:

  • BC does not do any transposition tricks on the supplied observation space or demonstration. As a result, it needs a channels-first environment to train; otherwise, Torch cannot apply convolutions correctly when BC (implicitly) calls .forward() on the policy. At test time, the policy still needs to be evaluated in a channels-last environment, since that's what the policy's .predict() method expects.
  • In GAIL, the reward function training code assumes channels-first (for the same reason that BC does), but the underlying algorithm only works with channels-last. This makes it impossible to use image-based observation spaces.

IMO the cleanest fix to this would remove the two algorithm/policy fixes from SB3. Instead, the user would be responsible for wrapping environments in a VecImageTranspose wrapper if necessary. Our code should work out-of-the-box after that. However, as a stop-gap, I'm considering changing BC and GAIL to do what SB3 does (implicit transposition of observations).

qxcv avatar Aug 05 '20 23:08 qxcv

IMO the cleanest fix to this would remove the two algorithm/policy fixes from SB3. Instead, the user would be responsible for wrapping environments in a VecImageTranspose wrapper if necessary. Our code should work out-of-the-box after that. However, as a stop-gap, I'm considering changing BC and GAIL to do what SB3 does (implicit transposition of observations).

Are they not interoperable: i.e. will SB3 fail if we transpose the images?

AdamGleave avatar Aug 06 '20 02:08 AdamGleave

If we add a wrapper that transposes the images to be NCHW then SB3 will fail, because it will transpose them to NWCH.

(SB3 does have a check that sometimes avoids applying VecTransposeImage twice, but the check fails with AdversarialTrainer for somewhat complicated reasons)

qxcv avatar Aug 06 '20 02:08 qxcv

Also, I realised it would be easier to fix this upstream (very few lines of code need to change), so I'll try that first.

qxcv avatar Aug 06 '20 02:08 qxcv

I seem this problem is seriously. How should we solve this problem. How we should able to activate VecTransposeImage?

alfredplpl avatar Jun 07 '21 23:06 alfredplpl

The one of the solution is the following code. The we modify features_extractor_class of ActorCriticPolicy. However, It is not smart.

from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
class CNNExtractor(BaseFeaturesExtractor):
    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        x=observations
        if(x.shape[1]!=3):
            x=torch.transpose(x,1,3)
            x=torch.transpose(x,2,3)
        x=self.cnn(x)
        return self.linear(x)

alfredplpl avatar Jun 19 '21 05:06 alfredplpl

Fixed in https://github.com/HumanCompatibleAI/imitation/pull/519

AdamGleave avatar Oct 22 '22 02:10 AdamGleave