VAD
Why are VADPerceptionTransformer.cams_embeds and level_embeds tensors with random values?
Hi, dear authors,
When I evaluate VAD following the official steps and run forward inference, the output of BEVFormerEncoder is sometimes a tensor containing NaN, and this happens randomly. I soon found that the output features from the backbone + FPN are added to self.cams_embeds and self.level_embeds, which are tensors with random values, some of them quite large (on the order of e+31). As a result, features with very large values are sent into BEVFormerEncoder, which outputs +/-inf, and this then produces NaN after the layer norm.
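The last step of that chain (a +/-inf entry turning into NaN after layer norm) can be reproduced in isolation. The following is a minimal plain-Python sketch, not VAD's or PyTorch's implementation: `layer_norm` and `count_nonfinite` are hypothetical helpers written for illustration, and the affine weight and bias are omitted.

```python
import math


def layer_norm(x, eps=1e-5):
    """Illustrative layer norm over a 1-D list of floats (no affine terms)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def count_nonfinite(x):
    """Count entries that are NaN or +/-inf."""
    return sum(1 for v in x if not math.isfinite(v))


# Finite input: the normalized output stays finite.
healthy = layer_norm([0.5, -1.0, 2.0, 0.0])

# One inf entry: the mean becomes inf, inf - inf is NaN,
# and the NaN spreads to the variance and to every output element.
broken = layer_norm([0.5, float("inf"), 2.0, 0.0])

print(count_nonfinite(healthy), count_nonfinite(broken))
```

A single non-finite activation is therefore enough to turn the whole normalized vector into NaN, which matches the symptom described above. In float32 the largest finite value is roughly 3.4e38, so values around e+31 can overflow after only a few multiplications.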
So I wonder: why are self.cams_embeds and self.level_embeds (nn.Parameter objects) random tensors instead of tensors with fixed, trained values?
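One way to narrow this down is to check whether the checkpoint being loaded actually contains entries for these two parameters. This is only a sketch under assumptions: `find_embed_keys` is a hypothetical helper, and the key names in `example_keys` are made up for illustration rather than taken from a real VAD checkpoint. In practice the keys would come from the loaded checkpoint's state dict.

```python
def find_embed_keys(state_dict_keys, names=("cams_embeds", "level_embeds")):
    """Map each parameter name to the checkpoint keys whose last component matches it."""
    return {
        name: [k for k in state_dict_keys if k.split(".")[-1] == name]
        for name in names
    }


# Hypothetical key list; replace with the keys of the real checkpoint's state dict.
example_keys = [
    "pts_bbox_head.transformer.cams_embeds",
    "pts_bbox_head.transformer.encoder.layers.0.norms.0.weight",
]

found = find_embed_keys(example_keys)
missing = [name for name, keys in found.items() if not keys]
print("missing from checkpoint:", missing)
```

If a name shows up as missing, the corresponding parameter would keep whatever values it had at construction time, which could explain values that differ from run to run. If both names are present, the cause is more likely elsewhere.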