SparseBEV
Question about more frames
Hi! Sorry to bother you again~
I tried feeding more frames (16 instead of 8) to the ImageNet-pretrained R50 model, but the results got worse (40.5 mAP and 51.6 NDS). I only set num_frames = 16 in the config. Is something wrong with that?
Looking forward to your reply! Thank you!
It's strange. I didn't try more than 8 frames, but it should improve the performance. I will have a try when I have free GPUs.
By the way, why don't you use the R50 pretrained on nuImg? It's more stable.
Thanks for your reply. I will have a try.
Hi~ I ran the R50 pretrained on nuImg (configs). I obtained the baseline results (44.7 mAP and 55.5 NDS) and the results with 16 frames (42.6 mAP and 53.1 NDS). 🤔️
I need to run a few experiments to find out why, thanks for your patience!
Hi @Debrove @afterthat97, I'm also working on this. In my opinion, 16 frames cover a fairly long span of around 8 seconds in the nuScenes dataset, which raises concerns about the validity of aligning object motion with ego motion. Additionally, it’s uncertain whether the same object can be consistently captured over this time window. For instance, an object in the front view might remain visible for a longer period, whereas this wouldn’t necessarily be the case for side views.
Example 1: As the number of frames increases, the predicted offsets for the sampling point locations seem to become increasingly inaccurate.
Example 2: At a corner where one object occludes another, the occlusion seems to be ignored for the pedestrian's sampling points in the previous frame.
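To put rough numbers on the time-span concern, here is a minimal Python sketch. It assumes history frames are nuScenes keyframes at 2 Hz and that objects are warped back in time with a constant-velocity model. The function names and the 1 m/s² acceleration are hypothetical values chosen only for illustration; they are not taken from the SparseBEV code.

```python
KEYFRAME_HZ = 2.0  # nuScenes keyframes are annotated at 2 Hz


def oldest_frame_age(num_frames, hz=KEYFRAME_HZ):
    """Age in seconds of the oldest frame when num_frames keyframes
    (current frame included) are fed to the model."""
    return (num_frames - 1) / hz


def constant_velocity_drift(t, accel):
    """Position error in meters after t seconds if an object really
    accelerates at accel m/s^2 but is warped as if its velocity were constant."""
    return 0.5 * accel * t * t


ASSUMED_ACCEL = 1.0  # m/s^2, hypothetical value for illustration only

for n in (8, 16):
    age = oldest_frame_age(n)
    drift = constant_velocity_drift(age, ASSUMED_ACCEL)
    print(f"{n} frames: oldest frame is {age:.1f} s old, drift ~{drift:.1f} m")
```

Under these assumptions the error grows quadratically with the age of the frame, so doubling num_frames from 8 to 16 more than quadruples the worst-case misalignment of the oldest sampling points. Real objects rarely hold one acceleration for that long, so treat the absolute numbers as illustrative only.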