
Simulation Drone Constantly Crashing into Obstacles and Ground

Open racheraven opened this issue 2 years ago • 10 comments

@antonilo @kelia @shubham-shahh Thanks for your project! It is a great project, and the idea behind it is very innovative. I appreciate it a lot!

I have successfully run the entire project, and it is amazing! I had never flown a drone at 7 m/s or more in a forest before.

However, I have found that the simulated drone's performance is sometimes not as good as described in the paper when I use the default settings in test_settings.yaml. The drone constantly crashes into trees when I fly it at 7 m/s: in 10 rollouts, it crashed 4 times.

If I reduce the drone's speed to 3.0 by changing maneuver_velocity and test_time_velocity in default.yaml to 3.0, and increase the flight distance (i.e. length_straight in agile_autonomy.cpp) to 100, the drone crashes into the ground where the terrain rises (see the visualization below). There are clearly viable paths if the drone would just fly a bit higher.
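For reference, these are roughly the values I changed (a sketch only; the key names are the ones mentioned above, but the exact layout of default.yaml may differ):

```yaml
# Sketch of the changed parameters -- key names as mentioned above; the exact
# nesting inside default.yaml may differ in your checkout.
maneuver_velocity: 3.0    # reduced from the value used for the 7 m/s runs
test_time_velocity: 3.0   # keep consistent with maneuver_velocity
# length_straight is hard-coded in agile_autonomy.cpp and was set to 100 there.
```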

[Image: overview3] The black lines are the planned trajectories. I limited pc_cutoff_z to 4.5. You can see that the trajectory goes into the ground, and the drone crashed there.

[Image: frame_right_00000307] This is what the scene looks like where the drone crashed.

I am deeply confused by this phenomenon. I have checked the depth image input and the network output frequency; both are about 15 Hz, as suggested in the README. I wonder whether this is caused by imperfect model parameters in models/ckpt-50, or by the limited generalization ability of the network itself, or whether I have made some mistake in the project settings so that it is not working properly.

Thanks for your reply!

racheraven avatar Dec 07 '21 03:12 racheraven

I have run into the same situation as you. What do you think about this? @racheraven

ZhangHaley avatar Dec 13 '21 14:12 ZhangHaley

Hi!

Thanks for your interest in our project! I am glad that you find it interesting and innovative.

The main reason behind the behaviour you observe is that the checkpoint was not trained to avoid the ground: it basically never saw an instance where the reference points into the ground and the task is to avoid it. That, however, looks like something very easy to add, if it is of interest: just collect a dataset where the task is avoiding the ground, fine-tune the checkpoint on it, and you would be ready to go.

About the success rate observed in the forest: I have sometimes observed similar behaviour on machines with less computational power (CPU + GPU). Sometimes the drone is not actually crashing, but due to delays or artifacts in the pointcloud, a crash is detected by the pipeline. An easy way to rule this out is to look through the rollouts and check whether the detected crashes are actual crashes (a rough sketch of such a check is below).

A second reason is that the checkpoint was not tuned for performance in the forest (and as such, I assume it would not reach the 90% success rate we report in the paper). Here too, the best policy is to fine-tune it; even with a small number of rollouts you will see a difference. However, you will have to trade off performance against generalization.

I hope this helps and that our code can support your future projects!
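As a rough sketch of how one could do this offline, the snippet below flags a rollout as a real crash only if the flown positions actually come within a small radius of the point cloud. The file names, column names, and radius are placeholders (this is not code from the repository), so adapt them to whatever your rollout folders contain:

```python
# Rough sketch: call a rollout a "real" crash only if the flown positions get
# closer to the point cloud than a small collision radius.
# File names, column names, and the radius below are placeholders.
import numpy as np
import pandas as pd
import open3d as o3d
from scipy.spatial import cKDTree

COLLISION_RADIUS = 0.1  # [m] roughly the quadrotor body radius (placeholder value)

def is_real_crash(rollout_dir: str) -> bool:
    cloud = o3d.io.read_point_cloud(f"{rollout_dir}/pointcloud.ply")   # placeholder file name
    tree = cKDTree(np.asarray(cloud.points))
    odom = pd.read_csv(f"{rollout_dir}/odometry.csv")                  # placeholder file name
    positions = odom[["pos_x", "pos_y", "pos_z"]].to_numpy()           # placeholder columns
    dists, _ = tree.query(positions)                                   # distance to nearest obstacle point
    return bool(np.any(dists < COLLISION_RADIUS))
```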

antonilo avatar Dec 14 '21 02:12 antonilo

Thanks for your reply.

In my case, I believe it is a fine-tuning problem, because my computer has an NVIDIA 3090 GPU and a 10900K CPU, so the computational power should be strong enough. Also, I observed the drone passing through trees and branches during the test, so those must be real crashes.

I will try fine-tuning the network in the forest scene and will report back if I make any progress.

racheraven avatar Dec 14 '21 05:12 racheraven

I have also successfully fired up the simulation environment. However, running python test_trajectories.py --settings_file=config/test_settings.yaml doesn't seem to have much effect. What should I expect?

Best

psun-autel avatar Apr 11 '22 18:04 psun-autel

@racheraven did you observe any improvements after fine-tuning? I am seeing similar success rates (~60%).

random-user-in-space avatar Apr 21 '23 10:04 random-user-in-space

@antonilo What do you mean by

just collect a dataset where the task is avoiding the ground, fine-tune the checkpoint on it, and you would be ready to go

Specifically, how do we set the task to be avoiding the ground? Shouldn't the model learn to avoid the hills automatically, since the trajectories that get too close to the ground will have a high cost?

The expert should avoid the hill, because it knows that the cost of the trajectories that get too close to the ground will be very large. Therefore, it should choose the safer trajectories that go over the hill instead. Wouldn't the model learn this behavior through the training phase and hence learn how to construct good trajectories to go over hills automatically, assuming hills were in our training data?
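Just to make my reasoning concrete, here is a toy sketch of the kind of collision cost I have in mind (an illustration only, not the repository's actual cost function):

```python
# Toy illustration of the argument, not the cost used in agile_autonomy:
# trajectory points that come closer than safe_dist to any obstacle point
# (including ground points, if they are in the cloud) accumulate a penalty.
import numpy as np
from scipy.spatial import cKDTree

def collision_cost(traj_positions, obstacle_points, safe_dist=1.0):
    tree = cKDTree(obstacle_points)
    clearance, _ = tree.query(traj_positions)  # distance to nearest obstacle point
    penalty = np.where(clearance < safe_dist, (safe_dist - clearance) ** 2, 0.0)
    return float(penalty.sum())
```

Of course, this would only penalize the hill if the ground points are actually in the point cloud the expert samples against, and if such situations appear in the training data.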

By "collect a dataset where the task is avoiding the ground", do you mean just fine-tuning the model for the forest hill with dagger training?

I tried running DAgger training for over 150 rollouts in the forest, with the quadcopter aimed at the hill each time. Unfortunately, it still hits the hill every time. I've kept perform_global_planning disabled, as the README says to do for fine-tuning. Should I enable it to help train the model to go over the hill?

wagler avatar Mar 16 '24 09:03 wagler

Hi @wagler,

I'm happy you are continuing to build on this repository after so long!

Were you able to check whether the expert-generated trajectories avoid the hill? I never tested this in detail, so I can't say for sure. If the expert trajectories are not good, the network trajectories won't be good either. In addition, since you're training the network for a different task (avoiding a hill), I would recommend turning the perform_global_planning flag back on.

As a quick piece of advice, the network trains faster if you remove the conditioning on the reference (putting this to always false and this to always zero).

I would be very happy if you could share your findings about this issue!

antonilo avatar Mar 18 '24 04:03 antonilo

@antonilo The expert-generated trajectories also hit the hill. I verified this by disabling the network and setting fallback_radius_expert to 0. I also tried with global planning: I let that run for a while, but on each iteration the global planning fails to find a path, so the experiment gets scrapped.

and this to always zero

That line (177) is goal_dir = self.adapt_reference_frame(features_odom_v[k][3:12], reference_direction[k]). What should I set to 0 here?

wagler avatar Mar 18 '24 19:03 wagler

@antonilo I'm also noticing that Flightmare takes a very long time, about 110-120 ms, to return images after we call flightmareBridge_ptr_->getRender(send_id). As a result, I'm getting about 8 Hz updates on the sgm_depth topic. Even if I reduce the camera size in flightmare.yaml from 640x480 down to 224x224, I only get about 12 Hz updates on the sgm_depth topic.

This getRender function is a major bottleneck. We're supposed to run the saveLoop at 15 Hz, but can't, because getRender isn't able to go above 12 Hz.

Because of the slow updates to the sgm_depth topic, the quadcopter can't react in time to trees when flying above 5 m/s (sometimes it even fails at 5 m/s) with a tree spacing of 5.

I'm running this all on a machine with an RTX 6000 Ada GPU, which has 48 GB VRAM, and an Intel Xeon Silver 4110 CPU @ 2.10GHz with 32GB DRAM. So, I doubt this is a hardware limitation. I also checked in htop and nvtop to confirm that my hardware isn't overloaded.

Is there anything that can be done about the slow getRender function?

wagler avatar Mar 21 '24 04:03 wagler

Hi @wagler,

Regarding "What should I set to 0 here?": set goal_dir to 0.
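A minimal sketch of what that could look like around the line you quoted (just a quick experiment, not an official switch in the code):

```python
# Around the quoted line 177 -- assumes numpy is already imported as np there.
# Original:
# goal_dir = self.adapt_reference_frame(features_odom_v[k][3:12], reference_direction[k])

# Quick experiment: always feed a zero goal direction so the network no longer
# conditions on the reference. zeros_like keeps the same shape and dtype; use
# tf.zeros_like instead if reference_direction[k] is a tensor at this point.
goal_dir = np.zeros_like(reference_direction[k])
```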

Regarding "I verified this by disabling the network and setting the fallback_radius_expert to 0": if you do this, you are running the blind baseline, which is not a good way to check. The best approach would be to get the global planning working and then use the sampler to get the trajectories out. Maybe another global planner would work better; I have never tried it myself, but it looks feasible.

Regarding the slow getRender function: unfortunately, I don't know. Flightmare is open source, so you could look into it. I remember getting 15 Hz frames when I trained, though with much less compute. Maybe reduce the resolution even further, to 64x64? That should still do pretty well in the forest environment, where obstacles are large.

For more up-to-date code, I recommend checking out this repo, which we released two years later: https://github.com/uzh-rpg/agile_flight.

antonilo avatar Mar 21 '24 05:03 antonilo