Discrete-Continuous-VLN Question about the imitation learning strategy in the paper

Open · YESAndy opened this issue 11 months ago · 1 comment

Hi Yicong,

I noticed that the imitation learning loss in the code base is essentially a cross-entropy loss between the predicted action and the oracle action, where the oracle action is obtained by selecting the waypoint closest to the goal. However, this oracle action might not be optimal, because sometimes the closest waypoint is not on the ground-truth path (the reference path in the dataset), as in the following picture.

This is likely to make the agent loop around that area.
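A minimal sketch of the loss described above, assuming each decision step produces one logit per candidate waypoint and the oracle action is given as an index into those candidates. The helper names `cross_entropy` and `imitation_loss` are hypothetical and not taken from the code base; the sketch uses only the standard library so it runs on its own.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a one-hot oracle index."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def imitation_loss(batch_logits, oracle_indices):
    """Mean cross-entropy over a batch of decision steps."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, oracle_indices)]
    return sum(losses) / len(losses)


# Hypothetical step: logits for three candidate waypoints, oracle picks index 0.
print(imitation_loss([[2.0, 0.5, -1.0]], [0]))
```

In a PyTorch training loop, the same quantity is what `torch.nn.functional.cross_entropy(logits, targets)` computes with its default mean reduction, given a `(batch, num_candidates)` logits tensor and a tensor of target indices.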

Since the waypoint predictor performs very well, could you comment on how it manages to avoid this issue?

Many thanks! Andy

Mar 08 '24 01:03 YESAndy

Here is a quick answer about the example shown. We select the GT action by the shortest geodesic distance (the length of the shortest path from the waypoint to the goal). Although the top waypoint has the shortest geometric (straight-line) distance to the goal, its geodesic distance is still larger than that of the right waypoint, because from the top waypoint the agent has to go back to S and then follow the right path. So the right waypoint is still the GT.
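To illustrate the difference, here is a minimal sketch on a hypothetical toy map loosely modeled on the figure: S is the start, T the top waypoint, R the right waypoint, G the goal, and C an extra corner node. All coordinates and connections are made up. Geodesic distance is computed with Dijkstra's algorithm over a hand-written navigation graph, which is a stand-in assumption; the actual code base presumably obtains geodesic distances from the simulator instead.

```python
import heapq
import math

# Hypothetical toy map: node -> (x, y) position.
POS = {"S": (0.0, 0.0), "T": (0.0, 2.0), "R": (2.0, 0.0),
       "C": (2.0, 3.0), "G": (1.0, 3.0)}
# Navigable connections; there is deliberately no T-G edge (a wall).
EDGES = [("S", "T"), ("S", "R"), ("R", "C"), ("C", "G")]


def euclidean(a, b):
    (ax, ay), (bx, by) = POS[a], POS[b]
    return math.hypot(ax - bx, ay - by)


def build_graph(edges):
    """Undirected adjacency list weighted by straight-line edge length."""
    graph = {n: [] for n in POS}
    for a, b in edges:
        w = euclidean(a, b)
        graph[a].append((b, w))
        graph[b].append((a, w))
    return graph


def geodesic_distances(graph, goal):
    """Dijkstra from the goal: shortest navigable path length to every node."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in graph[node]:
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def oracle_waypoint(candidates, goal, graph):
    """Pick the candidate with the shortest geodesic distance to the goal."""
    dist = geodesic_distances(graph, goal)
    return min(candidates, key=lambda c: dist.get(c, math.inf))


graph = build_graph(EDGES)
print(min(["T", "R"], key=lambda c: euclidean(c, "G")))  # prints T (straight-line choice)
print(oracle_waypoint(["T", "R"], "G", graph))           # prints R (geodesic choice)
```

Running Dijkstra once from the goal yields the geodesic distance of every node at the same time, which is convenient when several candidate waypoints must be ranked at each step.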

One very rare case is when the shortest path to the goal comes from the left, so the top waypoint has a lower geodesic distance than the right one. In that case it does have the problem you mentioned: the waypoint with the shortest geodesic distance is not optimal. However, because our predictor is robust, it usually outputs a waypoint on the left in this situation, and that waypoint becomes a good GT waypoint.

In our paper we achieve 97% SR with ground-truth actions on R2R-CE, which shows that it is hard to get stuck in a loop. I have also visualized the failure cases. Apart from some cases caused by bad simulation in Habitat (e.g. a place that looks open but is not navigable), only about two cases are caused by such a loop: the predictor fails to predict a waypoint in the optimal direction (not even one in the optimal 180-degree half), so the agent has to go back and starts a loop (like between the top waypoint and S in your figure). But as I mentioned, this is a very rare case.
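The rare failure case described above can be illustrated with a minimal sketch, assuming a hypothetical toy map in which the shortest route to the goal G goes through a left waypoint L, and the predictor standing at the top waypoint T fails to propose L. The geodesic distances are hand-written for this toy map, the candidate lists stand in for the waypoint predictor's output, and the `rollout` helper is an illustration of greedy oracle-following, not the repository's code.

```python
# Hypothetical geodesic distances (shortest navigable path length) to the goal G
# on a toy map where the shortest route goes S -> T -> L -> G.
GEODESIC = {"G": 0.0, "L": 2.0, "T": 4.0, "S": 6.0, "R": 6.0}


def rollout(start, goal, candidates, dist, max_steps=20):
    """Follow the oracle: at each node, move to the predicted candidate
    waypoint with the smallest geodesic distance to the goal."""
    path = [start]
    visited = {start}
    node = start
    for _ in range(max_steps):
        if node == goal:
            return path, "reached goal"
        nxt = min(candidates[node], key=lambda c: dist[c])
        path.append(nxt)
        if nxt in visited:
            # The choice depends only on the current node, so a revisit
            # means the agent will cycle forever.
            return path, "loop"
        visited.add(nxt)
        node = nxt
    return path, "step limit"


# Predictor misses the left waypoint L when standing at T.
bad_predictor = {"S": ["T", "R"], "T": ["S"]}
# Predictor proposes L at T, so the oracle can follow the shortest route.
good_predictor = {"S": ["T", "R"], "T": ["S", "L"], "L": ["T", "G"]}

print(rollout("S", "G", bad_predictor, GEODESIC))
print(rollout("S", "G", good_predictor, GEODESIC))
```

With the bad predictor the agent bounces between S and T, matching the loop between the top waypoint and S; once the predictor proposes any waypoint in the optimal direction, the same oracle rule reaches the goal.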

Mar 08 '24 06:03 wz0919
