Garbage BEV prediction on Carla Data distribution
Hello @aharley, @zfang399. I have been working on adapting the Simple-BEV model to the Carla dataset and truly admire the innovative contributions of your work.
While using Simple-BEV with the Carla dataset, we adjusted the coordinate system from left-handed (Carla's default) to right-handed to align with the training setup used in your paper (e.g., nuScenes). This transformation was applied consistently across the 6 cameras and the LiDAR sensor. However, during training I have encountered several challenges:

a) Training stability: training is not stable on the Carla dataset. I am using the 6-camera setup only and fine-tuning by loading your camera-only checkpoint.

b) Prediction issues: the model outputs are mostly garbage predictions, some of which I have attached for your reference.

c) Dataset distribution: I am unsure how the Carla data distribution relates to what the model expects.
Can you tell me what I am missing? I used the same training script as yours and replicated all the transforms from your dataloader. Could you please help me with this issue?
My first guess is that this is an input-level bug. Are you sure the data going into the model is in the right numerical range?
```python
_, feat_bev_e, seg_bev_e, center_bev_e, offset_bev_e = model(
    rgb_camXs=rgb_camXs,
    pix_T_cams=pix_T_cams,
    cam0_T_camXs=cam0_T_camXs,
    vox_util=vox_util,
    rad_occ_mem0=None)
```
Considering the model call above: rgb_camXs holds values in the [-0.5, 0.5] range with shape (6, 3, 448, 800), i.e. (N, C, H, W).
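This is the sanity check I ran on the input range (a sketch using numpy as a stand-in for the actual tensors; `normalize_rgb` is my own illustrative helper, not a Simple-BEV function):

```python
import numpy as np

def normalize_rgb(img_uint8):
    # Map uint8 pixels [0, 255] into the [-0.5, 0.5] range that the
    # pretrained model expects at its input.
    return img_uint8.astype(np.float32) / 255.0 - 0.5

img = np.array([[0, 128, 255]], dtype=np.uint8)
out = normalize_rgb(img)
```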
pix_T_cams has the matching batch shape, with the rescaling factor applied; I re-checked this recently and it looks correct.
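For reference, this is the kind of intrinsics rescaling I mean (a sketch; `scale_intrinsics` is my own helper, and the 1600x900 → 800x448 numbers below are just an example resolution, not necessarily yours):

```python
import numpy as np

def scale_intrinsics(pix_T_cam, sx, sy):
    # Rescale a 4x4 intrinsics matrix when the image is resized by
    # (sx, sy) = (W_new / W_orig, H_new / H_orig).
    K = pix_T_cam.copy()
    K[0, 0] *= sx  # fx
    K[0, 2] *= sx  # cx
    K[1, 1] *= sy  # fy
    K[1, 2] *= sy  # cy
    return K

K = np.eye(4)
K[0, 0], K[1, 1] = 1000.0, 1000.0  # toy focal lengths
K[0, 2], K[1, 2] = 800.0, 450.0    # toy principal point
K2 = scale_intrinsics(K, 800 / 1600, 448 / 900)
```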
cam0_T_camXs: all cameras expressed w.r.t. cam0 (front camera 0); I believe this matches your notation.
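And this is how I build cam0_T_camXs, in case my convention is wrong (a sketch; the poses below are hypothetical, not values from your repo):

```python
import numpy as np

def compose_cam0_T_camX(ego_T_cam0, ego_T_camX):
    # cam0_T_camX maps points from camera X's frame into camera 0's
    # frame: first camX -> ego, then ego -> cam0.
    return np.linalg.inv(ego_T_cam0) @ ego_T_camX

ego_T_cam0 = np.eye(4)
ego_T_cam0[:3, 3] = [1.5, 0.0, 1.6]   # hypothetical front-cam mount
ego_T_camX = np.eye(4)
ego_T_camX[:3, 3] = [1.5, 0.5, 1.6]   # hypothetical side cam
cam0_T_camX = compose_cam0_T_camX(ego_T_cam0, ego_T_camX)
```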
vox_util: kept the same as in your setup, and the LiDAR input (rad_occ_mem0) is None for this first case.
Even with everything kept the same, I am still not able to get proper results.
I have a few questions:
a) Carla uses a left-handed coordinate system for all sensors and the ego pose. Do I need to convert to a right-handed system (as in the nuScenes dataset)? Just to mention, I did this conversion and am testing things.
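For concreteness, this is the handedness conversion I apply (a sketch of a y-flip conjugation; please correct me if your convention differs):

```python
import numpy as np

# Flip the y axis: CARLA is left-handed (x forward, y right, z up);
# negating y gives a right-handed frame (x forward, y left, z up).
S = np.diag([1.0, -1.0, 1.0, 1.0])

def lh_to_rh(T_lh):
    # Conjugating by the flip converts both the rotation and the
    # translation of a 4x4 pose into the right-handed frame.
    return S @ T_lh @ S

th = 0.3
c, s = np.cos(th), np.sin(th)
Rz = np.eye(4)
Rz[:2, :2] = [[c, -s], [s, c]]  # rotation of +th about z, LH frame
Rz_rh = lh_to_rh(Rz)
```

One sanity check: a yaw of +th in the left-handed frame should read as -th in the right-handed frame.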
b) I loaded your camera-only checkpoint, trained on my custom Carla dataset for about 1500 iterations with all hyper-parameters unchanged from your training, and the predictions still do not visualize well. Do you think your hyper-parameters are suitable for a simulation-based dataset?
c) You describe velo_T_cam as the cam pose w.r.t. the velo sensor. I guess you are assuming the ego pose and the velo frame are the same here, since all sensors give us data in the ego coordinate system?
d) I have attached the training loss vs. iterations; it is definitely not stable on Carla, and sometimes the loss is negative, which I did not understand.
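On the negative loss: one thing I considered is that if the training script combines task losses with Kendall-style learned uncertainty weights, the total can legitimately dip below zero even while each raw task loss stays positive. A tiny sketch of that pattern (my own illustrative numbers, not values from the repo; I am not certain this is what your script does):

```python
import math

def weighted(task_loss, s):
    # Kendall-style uncertainty weighting: exp(-s) * L + s, where s is
    # a learned log-variance; nothing constrains the sum to stay >= 0.
    return math.exp(-s) * task_loss + s

pos = weighted(0.1, 2.0)   # large s: small weighted loss, still positive
neg = weighted(0.1, -1.0)  # a negative s can pull the total below zero
```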
Thank you. If you are open to it, I would be glad to collaborate, receive guidance, and discuss further research projects.
Also adding: my latest visualization is attached below alongside the ground truth. The training curve looks the same as above, i.e. unstable, or rather it is not really training.
@RohitPawar2406 Did you manage to solve the issues? I am planning on doing the same and want to avoid the problems you had.
@aharley Sir, I successfully completed the fine-tuning on Carla GT and addressed two key engineering aspects during the process:
a) Ground Truth Collection
Extracted semantic maps from Carla.
Main challenge: keeping only the 3D bounding boxes that are actually visible in the 6 cameras.
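In case it helps others, a sketch of the visibility filter I use (`visible_in_cam` is my own helper; it assumes 4x4 cam_T_world and pix_T_cam matrices, and tests only the box center as a cheap proxy for full box visibility):

```python
import numpy as np

def visible_in_cam(p_world, cam_T_world, pix_T_cam, W, H):
    # Keep a box if its center projects inside the image with positive
    # depth; a cheap proxy for "the box is visible in this camera".
    p = np.ones(4)
    p[:3] = p_world
    p_cam = cam_T_world @ p
    if p_cam[2] <= 0:          # behind the camera
        return False
    uvw = pix_T_cam @ p_cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    return bool((0 <= u < W) and (0 <= v < H))

K = np.eye(4)
K[0, 0], K[1, 1] = 100.0, 100.0   # toy focal lengths
K[0, 2], K[1, 2] = 400.0, 224.0   # principal point at image center
front = visible_in_cam([0.0, 0.0, 10.0], np.eye(4), K, 800, 448)
behind = visible_in_cam([0.0, 0.0, -5.0], np.eye(4), K, 800, 448)
```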
b) Code Adaptation
Converted Carla’s left-hand coordinate system to a right-hand system to align with the pretrained configuration.
Current Progress
At present, the model performs well on a single obstacle class. I'm now extending it to multi-class segmentation, including obstacles, lane markings, and roads. Additionally, I've collected an instance segmentation dataset for obstacles to enable multi-agent tracking within the environment. Dataset preparation is complete, and pipeline formulation is in progress.
Your work on Simple-BEV has been both inspiring and insightful. As a second-year MS student with a strong focus on deep learning and implementation-driven research, I would be genuinely interested in exploring potential collaboration opportunities or extensions of your work. It would be an excellent opportunity for me to contribute as an implementer while learning from your expertise.
If you’re open to it, I’d be glad to discuss further over email or a brief call to explore how I could meaningfully support or extend this line of work.
Thank you once again for your impactful contribution to the community; it's been a motivating foundation for my current research efforts.