End-to-End Planning using UniAD
Thanks for the great work!
I'm wondering what the full set of inputs needed for E2E planning is. After looking through the code, I was able to find the following information used for inference:
1. 6-view RGB image observations at 1600x900 (W, H) resolution, which are later padded to (1600, 928).
2. Historic ego position and ego yaw angle in BEV coordinates, both stored in img_metas['can_bus']. The positions are in meters, while the yaw angle is in degrees.
3. The updated (padded) image shape, which is used in BEVFormerEncoder.
4. Camera extrinsics? I think these are stored in img_metas['lidar2img'] as six 4x4 matrices. I assume each of them corresponds to one camera view, in the same order as the image observations (see the sketch after this list).
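
For reference: in the nuScenes-style pipeline that UniAD inherits from BEVFormer, img_metas['lidar2img'] is usually the camera intrinsics composed with the lidar-to-camera extrinsics, i.e. a full LiDAR-to-pixel projection matrix rather than the extrinsics alone. A minimal sketch of how such a matrix could be built (the function name and variables here are illustrative, not the repo's):

```python
import numpy as np

def build_lidar2img(lidar2cam: np.ndarray, intrinsic: np.ndarray) -> np.ndarray:
    """Compose a 4x4 LiDAR-to-pixel projection matrix.

    lidar2cam: 4x4 extrinsic matrix (LiDAR frame -> camera frame).
    intrinsic: 3x3 camera intrinsic matrix.
    """
    viewpad = np.eye(4)
    viewpad[:3, :3] = intrinsic   # embed the 3x3 intrinsics into a 4x4 matrix
    return viewpad @ lidar2cam    # intrinsics after extrinsics: LiDAR -> pixels

# One matrix per camera, in the same order as the stacked image observations:
# lidar2img = [build_lidar2img(ext, K) for ext, K in zip(extrinsics, intrinsics)]
```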
So, in order to use UniAD for my own tasks, as long as I can prepare this information, I should be good to go?
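
If it helps, here is a rough sketch of how I would assemble these inputs for one frame. The can_bus layout below follows what BEVFormer's nuScenes dataset code appears to do (translation in the first three entries, orientation quaternion next, yaw in radians at index -2 and in degrees at index -1); the keys and indices are my assumptions, so please verify them against the actual code:

```python
import numpy as np

# Hypothetical single-frame input assembly; key names follow the BEVFormer-style
# img_metas convention, but verify them against UniAD before relying on this.
ego_xyz = np.array([0.0, 0.0, 0.0])        # ego translation in meters
ego_quat = np.array([1.0, 0.0, 0.0, 0.0])  # ego orientation quaternion (w, x, y, z)
yaw_deg = 0.0                              # ego yaw angle in degrees

can_bus = np.zeros(18)                     # 18-dim CAN bus vector (assumed layout)
can_bus[:3] = ego_xyz
can_bus[3:7] = ego_quat
can_bus[-2] = yaw_deg / 180.0 * np.pi      # yaw in radians
can_bus[-1] = yaw_deg                      # yaw in degrees

lidar2img = [np.eye(4) for _ in range(6)]  # placeholders; see build_lidar2img above

img_metas = dict(
    img_shape=[(928, 1600, 3)] * 6,        # padded (H, W, C) for each of the 6 views
    lidar2img=lidar2img,
    can_bus=can_bus,
)
```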
I want to know too.
me too