About data generation and inference
Hello! First of all, thank you for sharing your wonderful work.
I am quite new to learning-based methods and am currently trying to run your code in a Gazebo environment, but I am facing some difficulties.
I have a few questions:
- About the cam2base transformation: In the dataset you provided for the ZED2 camera, the cam2base matrix is given as:

  ```
  [[ 0.   0.   1.   0.17]
   [-1.   0.   0.   0.  ]
   [ 0.  -1.   0.   0.37]
   [ 0.   0.   0.   1.  ]]
  ```

  According to this matrix, the camera is facing to the right of the base frame, if I understand correctly. Based on my understanding, this seems to be more like a base2cam matrix rather than cam2base. Could you please clarify which convention is used? (My sanity check is in the first snippet after this list.)
- About the grid bound setting: I understand that the grid bounds should depend on the camera configuration. Is there a recommended way to determine appropriate grid bounds for a custom camera setup before training? (My current guess is in the second snippet below.)
- About inference inputs and outputs: When writing inference code, I assume the required inputs are camera_info, cam2base, a depth image, and a color image. If I feed these inputs to your provided training code and checkpoints, would the output be a map projected into the BEV space? If possible, could you share example inference code? (My current understanding of the BEV projection is in the third snippet below.)
- About traversing grass-like obstacles: According to your paper, the model can learn, from experience, to traverse grass or vegetation approximately the size of the robot. However, when I tried this in Gazebo, the robot avoided it instead. I suspect this is due to incorrect training on my end, but I would like to confirm whether the model can indeed learn to pass through such soft obstacles even when depth points are present on them.
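For context, here is the sanity check behind my first question. It is plain numpy, nothing from your repo; treating the camera frame as z-forward (the optical convention) is my own assumption:

```python
import numpy as np

# The extrinsic matrix shipped with the ZED2 dataset.
T = np.array([
    [ 0.,  0.,  1., 0.17],
    [-1.,  0.,  0., 0.  ],
    [ 0., -1.,  0., 0.37],
    [ 0.,  0.,  0., 1.  ],
])

# A point 1 m along the camera optical axis (z-forward).
p_cam = np.array([0., 0., 1., 1.])

# Reading T as cam2base: where does that point land in the base frame?
print("cam2base reading:", (T @ p_cam)[:3])                 # [1.17 0.   0.37]

# Reading T as base2cam: its inverse would then be the true cam2base.
print("inverse reading: ", (np.linalg.inv(T) @ p_cam)[:3])  # [ 0.   -0.63 -0.17]
```

Under the first reading the point ends up in front of and above the base origin, which would be consistent with a forward-facing camera, so I may simply be misreading the rotation; I would appreciate confirmation either way.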
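For the grid bounds question, this is how I am currently estimating them before training: back-project the image corners at the maximum usable depth into the base frame, add the camera origin, and take the bounding box. The helper name, intrinsics, and margin below are all placeholders of mine, not values from your code:

```python
import numpy as np

def suggest_grid_bounds(fx, fy, cx, cy, width, height,
                        cam2base, max_depth=8.0, margin=0.5):
    """Bounding box (in the base frame) of the camera's depth frustum,
    as a starting point for choosing grid bounds."""
    pts = [cam2base @ np.array([0., 0., 0., 1.])]  # include the camera origin
    for u, v in [(0, 0), (width, 0), (0, height), (width, height)]:
        # Back-project each image corner at the maximum usable depth.
        x = (u - cx) / fx * max_depth
        y = (v - cy) / fy * max_depth
        pts.append(cam2base @ np.array([x, y, max_depth, 1.]))
    pts = np.stack(pts)[:, :3]
    return pts.min(axis=0) - margin, pts.max(axis=0) + margin

# Example with rough HD720-ish intrinsics (placeholder values) and the
# cam2base matrix from the dataset:
T = np.array([[0., 0., 1., 0.17], [-1., 0., 0., 0.],
              [0., -1., 0., 0.37], [0., 0., 0., 1.]])
lo, hi = suggest_grid_bounds(fx=520., fy=520., cx=640., cy=360.,
                             width=1280, height=720, cam2base=T)
print(lo, hi)  # the z range would usually be clamped to heights the robot cares about
```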
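And for the inference question, this is my mental model of how the four inputs relate to something "projected in the BEV space": back-project the depth image into the base frame using camera_info and cam2base, then rasterize onto a top-down grid. It is only a numpy sketch of the geometry, not your actual pipeline, and all names are mine; I assume the network output would replace the max-height statistic at the end:

```python
import numpy as np

def depth_to_bev(depth, K, cam2base, bounds, resolution=0.1):
    """Back-project a metric depth image and rasterize the per-cell max
    height into a BEV grid. (My own sketch, not the repo's pipeline.)"""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = np.isfinite(z) & (z > 0)
    x = (u.ravel() - K[0, 2]) / K[0, 0] * z
    y = (v.ravel() - K[1, 2]) / K[1, 1] * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)])[:, valid]
    pts = (cam2base @ pts_cam)[:3].T          # N x 3 points in the base frame

    (x0, y0), (x1, y1) = bounds
    nx, ny = int((x1 - x0) / resolution), int((y1 - y0) / resolution)
    bev = np.full((nx, ny), -np.inf)          # empty cells stay at -inf
    ix = ((pts[:, 0] - x0) / resolution).astype(int)
    iy = ((pts[:, 1] - y0) / resolution).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    np.maximum.at(bev, (ix[keep], iy[keep]), pts[keep, 2])
    return bev

# Usage (K is the 3x3 intrinsics from camera_info, depth in meters):
# bev = depth_to_bev(depth, K, T, bounds=((-0.5, -5.0), (8.5, 5.0)))
```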
I am sorry that my questions are quite basic. I found your work very inspiring, and it would be a great help if you could provide some guidance.
Best regards,
> About the cam2base transformation: [...] this seems to be more like a base2cam matrix rather than cam2base. Could you please clarify this?
Hi @hyonhojoh, same question here! Did you figure this out yet? Thanks in advance for any reply.