BEVFormer

About reference_points and predicted_3dbbox

Open kingofstu opened this issue 1 year ago • 0 comments

Hello, and first of all thank you for such great work! I have two questions:

1. In the BEVFormer decoder:

        query_pos, query = torch.split(
            object_query_embed, self.embed_dims, dim=1)  # [900,256], [900,256]
        query_pos = query_pos.unsqueeze(0).expand(bs, -1, -1)
        query = query.unsqueeze(0).expand(bs, -1, -1)  # [B,900,256]
        reference_points = self.reference_points(query_pos)  # linear [B,900,3]
        reference_points = reference_points.sigmoid()

Can these reference_points be understood as anchor coordinates? Are they expressed in the global coordinate frame or in the BEV ego-vehicle coordinate frame?
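To make the question concrete: because of the final sigmoid, each reference point lies in [0, 1], so it only becomes a metric position once it is rescaled by some range. Below is a minimal plain-Python sketch of that rescaling. It assumes the range is a point-cloud range `[x_min, y_min, z_min, x_max, y_max, z_max]` centered on the vehicle; the `PC_RANGE` values and the `denormalize` helper are illustrative assumptions, not code from the repository.

```python
import math

# Assumed perception range [x_min, y_min, z_min, x_max, y_max, z_max] in metres.
# Illustrative values only; they are not taken from this issue.
PC_RANGE = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def denormalize(ref, pc_range=PC_RANGE):
    """Map a normalized (x, y, z) in [0, 1] to metric coordinates in pc_range.

    Hypothetical helper: each axis is scaled by its extent and shifted by its minimum.
    """
    return tuple(
        r * (pc_range[i + 3] - pc_range[i]) + pc_range[i]
        for i, r in enumerate(ref)
    )


# Raw outputs of the linear layer for one query (made-up numbers).
raw = (0.0, 2.0, -1.0)
ref = tuple(sigmoid(v) for v in raw)   # normalized reference point
xyz = denormalize(ref)                 # metric position under the assumed range
```

Under this assumption, a normalized value of 0.5 on the x axis maps to the middle of the range, which is the vehicle position if the range is symmetric around it.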

2. When reg_branches is used, what does each of the 10 predicted values represent? It looks as if, in each of the 6 layers, the predicted xy offset is added to the xy of reference_points. What does tmp[..., 4:5] plus the z of reference_points represent? Are the predicted_3dbbox coordinates in the global coordinate frame or in the BEV ego-vehicle coordinate frame?

        if reg_branches is not None:
            tmp = reg_branches[lid](output)  # 6 regression heads, each a 2-layer MLP; output [B,900,10]
            assert reference_points.shape[-1] == 3
            new_reference_points = torch.zeros_like(reference_points)
            new_reference_points[..., :2] = tmp[
                ..., :2] + inverse_sigmoid(reference_points[..., :2])
            new_reference_points[..., 2:3] = tmp[
                ..., 4:5] + inverse_sigmoid(reference_points[..., 2:3])
            new_reference_points = new_reference_points.sigmoid()
            reference_points = new_reference_points.detach()
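The refinement step above can be written out for a single query in plain Python to show which indices are used. This is only a sketch. It assumes the 10 values follow a DETR3D-style layout `(cx, cy, w, l, cz, h, sin, cos, vx, vy)`, under which index 4 would be the box-center height. The `BOX_FIELDS` names, the `refine` helper, and the `eps` value are assumptions for illustration.

```python
import math

# Assumed meaning of the 10 regression outputs (not confirmed in this issue).
BOX_FIELDS = ("cx", "cy", "w", "l", "cz", "h", "sin", "cos", "vx", "vy")


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p, eps=1e-5):
    """Logit of p, clamped away from 0 and 1 for numerical stability."""
    p = min(max(p, 0.0), 1.0)
    return math.log(max(p, eps) / max(1.0 - p, eps))


def refine(reference_point, tmp):
    """One refinement step for one query.

    reference_point: normalized (x, y, z), each in [0, 1].
    tmp: the 10 regression outputs of the current decoder layer.
    Offsets are added in logit space, then squashed back into [0, 1].
    """
    x, y, z = reference_point
    new_x = sigmoid(tmp[0] + inverse_sigmoid(x))  # tmp[..., 0:1] pairs with x
    new_y = sigmoid(tmp[1] + inverse_sigmoid(y))  # tmp[..., 1:2] pairs with y
    new_z = sigmoid(tmp[4] + inverse_sigmoid(z))  # tmp[..., 4:5] pairs with z
    return (new_x, new_y, new_z)


ref = (0.5, 0.5, 0.5)
tmp = [0.0] * len(BOX_FIELDS)
tmp[0] = 1.0    # push x forward
tmp[4] = -1.0   # push z down
new_ref = refine(ref, tmp)
```

Read this way, only indices 0, 1 and 4 of `tmp` feed back into the reference point; the remaining seven values would describe size, heading and velocity and do not move it.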

kingofstu, Jun 05 '24 07:06