SparseBEV

Question about more frames

Open Debrove opened this issue 1 year ago • 6 comments

Hi! Sorry to bother you again~

I tried feeding more frames (16 instead of 8) to the ImageNet-pretrained R50 model, but the results got worse (40.5 mAP and 51.6 NDS). The only change I made was setting num_frames = 16 in the config. Is something wrong with that?

Looking forward to your reply! Thank you!

Debrove avatar Oct 27 '23 02:10 Debrove

It's strange. I didn't try more than 8 frames, but it should improve the performance. I will have a try when I have free GPUs.

afterthat97 avatar Oct 27 '23 03:10 afterthat97

By the way, why don't you use the R50 pretrained on nuImg? It's more stable.

afterthat97 avatar Oct 27 '23 03:10 afterthat97

Thanks for your reply. I will have a try.

Debrove avatar Oct 27 '23 03:10 Debrove

Hi~ I ran the R50 pretrained on nuImg (configs). I got the baseline results (44.7 mAP and 55.5 NDS), but with 16 frames the results dropped to 42.6 mAP and 53.1 NDS. 🤔️

Debrove avatar Oct 28 '23 03:10 Debrove

I need to run a few experiments to find out why, thanks for your patience!

afterthat97 avatar Oct 28 '23 11:10 afterthat97

Hi @Debrove @afterthat97, I'm also working on this. In my opinion, 16 frames cover a fairly long span of around 8 seconds in the nuScenes dataset, which raises concerns about whether aligning object motion together with ego motion is still valid over that window. It's also uncertain whether the same object stays visible for the whole window. For instance, an object in the front view might remain visible for a long time, whereas that wouldn't necessarily be the case for the side views.
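To put rough numbers on the time-span concern, here is a minimal sketch. It is not SparseBEV code: the helper names are hypothetical, and it assumes frames are spaced about 0.5 s apart (the nuScenes keyframe rate of 2 Hz), that past positions are recovered with a constant-velocity model, and that the object actually accelerates at a constant 1 m/s^2.

```python
# Illustrative only: helper names and numbers are assumptions, not SparseBEV code.

FRAME_HZ = 2.0  # assumed frame spacing; nuScenes keyframes are annotated at 2 Hz


def time_span_seconds(num_frames, hz=FRAME_HZ):
    """Time between the newest and the oldest frame in the window."""
    return (num_frames - 1) / hz


def constant_velocity_error(accel, dt):
    """Position error (m) when an object with constant acceleration `accel`
    (m/s^2) is moved back `dt` seconds using only its current velocity.

    True past position:       x0 - v*dt + 0.5*accel*dt**2
    Constant-velocity guess:  x0 - v*dt
    The difference is 0.5*accel*dt**2, so it grows quadratically with dt.
    """
    return 0.5 * abs(accel) * dt ** 2


if __name__ == "__main__":
    for n in (8, 16):
        span = time_span_seconds(n)
        err = constant_velocity_error(accel=1.0, dt=span)
        print(f"{n} frames: span {span:.1f} s, drift {err:.1f} m")
```

Under these assumptions, doubling the window from 8 to 16 frames roughly doubles the time span but more than quadruples the worst-case drift at the oldest frame, which would be consistent with sampling points landing further from the object in older frames.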

[image] Example 1: As the number of frames increases, the predicted offsets for the sampling point locations seem to become increasingly inaccurate.

[image] Example 2: At a corner where one object occludes another, the occlusion seems to be ignored for the pedestrian's sampling points in the previous frame.

Gigalomanicx avatar Sep 18 '24 03:09 Gigalomanicx