Paddle3D [WIP][Bug fix] use pad_shape instead of img_shape for bevformer and recover the origin code where the first frame pre_bev is None

Use pad_shape instead of img_shape for BEVFormer: img_shape holds the original image shape, while pad_shape is the image shape after padding.
Restore the original code path in which pre_bev is None for the first frame. The current code sets the first-frame pre_bev to zeros during the training phase to stay consistent with the deploy phase, but this does not seem right, for two reasons:
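
To illustrate why the two shapes are not interchangeable, here is a minimal sketch in plain Python. It assumes that padding rounds each side up to a multiple of a divisor and that projected pixel coordinates are normalized by the image size before sampling. The helper names `pad_to_multiple` and `normalize_pixel`, the divisor 32, and the 900x1600 input are illustrative assumptions, not code or values taken from this PR.

```python
import math


def pad_to_multiple(shape, divisor=32):
    """Round (height, width) up to the next multiple of `divisor` (hypothetical helper)."""
    height, width = shape
    return (
        math.ceil(height / divisor) * divisor,
        math.ceil(width / divisor) * divisor,
    )


def normalize_pixel(u, v, shape):
    """Map pixel (u, v) to [0, 1] coordinates relative to a (height, width) shape."""
    height, width = shape
    return (u / width, v / height)


# Assumed example input: an image whose height is not a multiple of 32.
img_shape = (900, 1600)
pad_shape = pad_to_multiple(img_shape, divisor=32)

# The same pixel lands at different normalized positions depending on the shape used.
by_img = normalize_pixel(800, 464, img_shape)
by_pad = normalize_pixel(800, 464, pad_shape)
print(pad_shape, by_img, by_pad)
```

Because the feature maps are computed from the padded image, normalizing by img_shape would shift the sampling locations whenever padding changes the height or width.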

1. Temporal information can no longer be switched off flexibly; switching it off requires pre_bev to be None.
2. When temporal information is used, pre_bev is the key and value of the temporal encoder layers. If pre_bev is None, the encoder layer sets pre_bev to the same value as the query, so a zero-filled pre_bev is not equivalent to the None path.
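
The difference between the two first-frame behaviors can be sketched as follows. This is a simplified stand-in using nested lists rather than tensors; `temporal_key_value` and `zeros_like` are hypothetical helpers written for this illustration and are not the Paddle3D implementation.

```python
def zeros_like(bev):
    """Return a zero-filled copy of a 2-D BEV map (hypothetical helper)."""
    return [[0.0 for _ in row] for row in bev]


def temporal_key_value(query, pre_bev=None):
    """Return the [previous, current] BEV pair that temporal attention would attend over.

    When pre_bev is None, the previous BEV falls back to the query itself,
    mirroring the fallback described in the PR text.
    """
    if pre_bev is None:
        pre_bev = query
    return [pre_bev, query]


query = [[0.2, 0.5], [0.1, 0.9]]

# First frame with pre_bev=None: both entries are the query.
kv_none = temporal_key_value(query, pre_bev=None)

# First frame with a zero-filled pre_bev: the "previous" entry carries no signal.
kv_zeros = temporal_key_value(query, pre_bev=zeros_like(query))

print(kv_none[0] == query, kv_zeros[0] == query)
```

With None, the layer attends over two copies of the current BEV query; with zeros, half of the key and value input is an all-zero map, which is a different computation.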

Jan 30 '23 07:01 FlyingQianMM

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

luoqianhui seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Jan 30 '23 07:01 CLAassistant

This PR conflicts with the latest code, please fix it

Feb 06 '24 02:02 nepeplwu

[WIP][Bug fix] use pad_shape instead of img_shape for bevformer and recover the origin code where the first frame pre_bev is None