General Questions
Hi! I hope you're doing well. I had a couple of questions I was hoping you could answer.
- Is the Dogfight scene the same scene you used to train your agents? I was wondering why you don't have borders; what happens if the ships fly out of the asteroid field?
- Why doesn't the episode reset if an agent collides with something?
- How long did you run the training?
- In your YouTube video you had teams but don't seem to have them in the demo, why?
- Why is the step amount 8000? That seems like a lot for training.
- Do the agents spawn in random positions and orientations on reset?
Apologies if you've already answered these somewhere. I'm curious as to how you accomplished this :)
Best, James
Hi James,
Is the Dogfight scene the same scene you used to train your agents? I was wondering why you don't have borders; what happens if the ships fly out of the asteroid field?
Yes, there's a "brake zone": if a ship flies outwards beyond a given radius, the code applies a counter velocity to stop it.
https://github.com/mbaske/grid-sensor/blob/master/Assets/Examples/Dogfight/Scripts/Spaceship.cs#L111
When that happens, agents collect fewer rewards because they're slower, so they should learn to avoid that region. The NormPosition field indicates a ship's distance from the center and is observed by its controlling agent.
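For illustration, here's a minimal sketch of that idea, not the actual Spaceship.cs code; the field radius, brake strength, and class name are assumptions:

```csharp
using UnityEngine;

public class BrakeZoneSketch : MonoBehaviour
{
    [SerializeField] private float fieldRadius = 100f; // hypothetical playable-area radius
    [SerializeField] private float brakeStrength = 2f; // hypothetical braking factor
    private Rigidbody rb;

    // Normalized distance from the field center (assumed at the origin): 0 = center, 1 = radius.
    public float NormPosition => transform.position.magnitude / fieldRadius;

    private void Awake() => rb = GetComponent<Rigidbody>();

    private void FixedUpdate()
    {
        // Outside the radius: counter the outward velocity component so the ship slows down.
        if (NormPosition > 1f)
        {
            Vector3 outward = transform.position.normalized;
            float outwardSpeed = Vector3.Dot(rb.velocity, outward);
            if (outwardSpeed > 0f)
            {
                rb.AddForce(-outward * outwardSpeed * brakeStrength, ForceMode.Acceleration);
            }
        }
    }
}
```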
Why doesn't the episode reset if an agent collides with something?
I think it works either way. One thing to consider, though: if episodes end on collision and the collision penalty is too low, an agent might exploit that by crashing into asteroids in order to evade enemies.
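A hypothetical sketch of collision handling in an ML-Agents agent; the penalty value and class name are made up, and EndEpisode is shown only as the alternative discussed above:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class ShipAgentSketch : Agent
{
    private void OnCollisionEnter(Collision collision)
    {
        // A sufficiently large penalty keeps crashing from being a cheap escape route.
        AddReward(-1f);
        // Alternatively, reset on collision; both approaches can work if the penalty is tuned.
        // EndEpisode();
    }
}
```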
How long did you run the training?
I haven't kept the original logs or checkpoints, but I usually try to set max_steps in the config YAML file to a value that roughly matches what I did. In this case, training should run for about 5 million steps.
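For reference, a trainer config along these lines would cap training at 5 million steps; the behavior name is a placeholder, not the actual project setting:

```yaml
behaviors:
  Dogfight:            # placeholder behavior name
    trainer_type: ppo
    max_steps: 5.0e6   # stop training after ~5 million steps
```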
In your YouTube video you had teams but don't seem to have them in the demo, why?
That was an older version of the project. When I updated it to use grid sensors, I wanted to simplify the code a bit, so I removed the teams.
Why is the step amount 8000? That seems like a lot for training.
The idea was to give the agents some time to figure out flight paths, but a lower value like 5000 should work as well.
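For context, in ML-Agents the episode length is the agent's MaxStep field, usually set in the Inspector; a hypothetical sketch with an assumed class name:

```csharp
using Unity.MLAgents;

public class ShipAgentStepSketch : Agent
{
    public override void Initialize()
    {
        // Episode length in agent steps; 0 would mean no limit.
        // 8000 leaves time to figure out flight paths; ~5000 should also work.
        MaxStep = 8000;
    }
}
```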
Do the agents spawn in random positions and orientations on reset?
The initial asteroid and agent spawn positions sit on a 3D grid. For the agents, the specific grid position is randomized per episode. See https://github.com/mbaske/grid-sensor/blob/master/Assets/Examples/Dogfight/Scripts/AsteroidField.cs
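A minimal sketch of that approach, not the actual AsteroidField.cs code; the grid size, spacing, and method name are assumptions, and it presumes far fewer spawns than grid cells:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class GridSpawnSketch : MonoBehaviour
{
    [SerializeField] private int gridSize = 8;       // hypothetical cells per axis
    [SerializeField] private float cellSpacing = 10f;

    // Pick a random free grid cell and a random orientation for an agent on reset.
    public Pose RandomSpawnPose(HashSet<Vector3Int> occupied)
    {
        Vector3Int cell;
        do
        {
            cell = new Vector3Int(
                Random.Range(0, gridSize),
                Random.Range(0, gridSize),
                Random.Range(0, gridSize));
        }
        while (!occupied.Add(cell)); // retry until an unoccupied cell is found

        // Center the grid on the origin and convert the cell index to world space.
        Vector3 position = (Vector3)(cell - Vector3Int.one * (gridSize / 2)) * cellSpacing;
        return new Pose(position, Random.rotation);
    }
}
```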
Thank you so much! I was able to update my model to reflect your explanation. I'll post back here to show you the results!