About the use of the decoder in pts_bbox_head and model outputs in ViDAR training

Open • Sora-tabata opened this issue 1 year ago • 1 comment

Dear authors,

First of all, thank you for your excellent work on ViDAR.

I have been studying the configuration file for training the ViDAR model and noticed the following comment in the pts_bbox_head section:

# !!!!!!! DECODER NOT USED !!!!!!!

I have a few questions regarding the model output and training purpose based on this observation.

1. Model output: What is the primary output of the model in this training configuration? Is the model mainly performing future point cloud prediction? Are tasks such as 3D object detection or segmentation included in this training configuration?
2. Purpose of training: Is this configuration intended solely for pre-training purposes (i.e., future point cloud prediction)? Or does it also include training for downstream tasks like 3D object detection and segmentation?
3. Role of pts_bbox_head: Could you clarify how pts_bbox_head is used in this training configuration?

Additionally, any details on the specific outputs of the model would be greatly appreciated. Thank you very much for your time and assistance. I look forward to your guidance on these questions.

Best regards,

Oct 25 '24 06:10 Sora-tabata

@Sora-tabata Hi, I ran into the same question. Have you figured out why pts_bbox_head (which executes ViDARBEVFormerHead) is used in the configuration file when training the pretext task (point cloud forecasting)?

Thanks!

Mar 03 '25 14:03 pupu-chenyanyan