ml-agents Calling EndEpisode multiple times in a step generates multiple calls to OnEpisodeBegin an CollectObservations is called twice

Describe the bug It is common to have multiple reasons for ending an episode which can cause multiple calls to "OnEpisodeBegin" for a single episode.

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        ...
        if (CheckReasonOne())
        {
            EndEpisode();
        }

        if (CheckReasonTwo())
        {
            EndEpisode();
        }

        if (CheckReasonThree())
        {
            EndEpisode();
        }
    }

EndEpisode() can be called multiple times in a single Academy step. If EndEpisode is called multiple times then OnEpisodeBegin is called multiple times. There should be only one call to OnEpisodeBegin per episode.

Note that CollectObservations is also called one extra time on the step that EndEpisode() is called. This second call typically happens after the agent has received its actions for this step and used those actions to modify the environment. This second set of observations likely will not match the actions for this step.

To Reproduce

With the 3DBall demo scene modfy the Ball3DAgent.cs script:

In OnEpisodeBegin add a line that prints out that it was called and the step number:

Debug.Log(Academy.Instance.StepCount + " OnEpisodeBegin called");

In CollectObservations add a line that prints out that it was called and the step number;

Debug.Log(Academy.Instance.StepCount + " CollectObservations called");

Duplicate the line EndEpisode(); several times

           EndEpisode();
            EndEpisode();
            EndEpisode();
            EndEpisode();

Run the scene in training mode with a single agent active in the scene
observe that OnEpisodeBegin is called multiple times at the beginning of a single episode and that CollectObservations gets called twice on the step that EndEpisode is called.

Expected behaviour:

OnEpisodeBegin should only be called once at the beginning of an episode
It should be safe to call EndEpisode() multiple times in a single step without generating multiple resets.
CollectObservations should not be called a second time if it was already called this academy step.

Environment (please complete the following information):

2019.4
Windows 10
Version 7
Tensorflow 2.3.1
3DBall (with modifications described above)

Oct 10 '20 17:10 Phong13

Hi @Phong13

Is there a good reason to not use an if/else block to handle multiple conditions for ending an episode? Would this resolve your issue (at least functionally, I agree we can make the API safer)? Thank you for pointing this out.

Oct 12 '20 14:10 andrewcoh

Thanks for looking at this. The code in the bug is a silly unrealistic example that shows the bug in its simplest form.

Thanks for the "if else" suggestion. In my case the situation is not so simple. I am using multiple system that check different scenarios for the end of the episode. Some of the checks are fairly complex. The systems shouldn't need to be aware of each other and check with each other if someone else has called EndEpisode. I can deal with it by having a single flag variable on the agent that is set if the episode has ended and check that.... but this is silly. Isn't there already an isDone variable? Yes... except it isn't exposed in the API.

Oct 12 '20 15:10 Phong13

Some of the checks are fairly complex. The systems shouldn't need to be aware of each other and check with each other if someone else has called EndEpisode.

This is a very fair point. It's not hard to imagine how this logic could (1) be complex and (2) live in different parts of a large codebase. I agree, the systems shouldn't need to be aware of each other. I'll pass this on to the team to find a good solution.

Oct 12 '20 15:10 andrewcoh

Hi Andrew,

IMO the dangerous part of this bug is that a single episode has multiple beginnings (OnEpisodeBegin). There should be a one-to-one relationship between an episode and OnEpisodeBegin. It is dangerous that this can get fired multiple times for a single episode. I can think of all sorts of things that can go wrong because of this.

Oct 13 '20 01:10 Phong13

Hi Andrew,

I think the task you created "Clean up EndEpisode demo code #4563" misses the point.

IMO the demos are using EndEpisode correctly. There is nothing wrong with how the demos use the code. The problem is with the API itself.

Look at the code in Agent.cs. EndEpisode immediately fires "OnEpisodeBegin" without doing any checks about status of the agent (is the Agent already done). There is a mismatch between the number of times "OnEpisodeBegin" gets called and the true number of episodes. Many of these "OnEpisodeBegin" calls share an episode. If a developer is doing any bookkeeping inside "OnEpisdoeBegin" or keep statistics on training progress then they could get very bad data.

Oct 13 '20 02:10 Phong13

Hi Andrew, IMO the problem here is that there is no integrity checking to enforce the structure of episodes and episode steps.

For PPO and SAC to work correctly, an episode should consist of a an ordered collection of steps. Each step should have:

Observations Actions Reward

The Academy should have some integrity checking to ensure that EndEpisode is being called in a valid context:

The agent must be inside an episode
There must be at least one step
The last step must be complete (has Observations, Actions, Reward)

If EndEpisode is called and this is not the case then errors should be generated. This bug is failing to enforce the first integrity condition. The Academy is executing "EndEpisode" code when there is no episode.

Oct 13 '20 02:10 Phong13

Hi Andrew,

I think the task you created "Clean up EndEpisode demo code #4563" misses the point.

IMO the demos are using EndEpisode correctly. There is nothing wrong with how the demos use the code. The problem is with the API itself.

Look at the code in Agent.cs. EndEpisode immediately fires "OnEpisodeBegin" without doing any checks about status of the agent (is the Agent already done). There is a mismatch between the number of times "OnEpisodeBegin" gets called and the true number of episodes. Many of these "OnEpisodeBegin" calls share an episode. If a developer is doing any bookkeeping inside "OnEpisdoeBegin" or keep statistics on training progress then they could get very bad data.

This wasn't intended to be the fix and I fully understand the issue you're raising. I was just going back through our EndEpisode documentation to see if it could be improved and found these small issues.

Oct 13 '20 13:10 andrewcoh

Ahh, my apologies, thanks for clarifying.

BTW I love how responsive you guys are and the transparency of this project. Feels like it is easy to communicate with you guys.

Oct 13 '20 16:10 Phong13

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

Apr 18 '22 16:04 stale[bot]

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

Aug 13 '22 18:08 stale[bot]

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Sep 12 '22 20:09 github-actions[bot]

ml-agents ml-agents copied to clipboard

Calling EndEpisode multiple times in a step generates multiple calls to OnEpisodeBegin an CollectObservations is called twice

ml-agents
ml-agents copied to clipboard