Enable tagging of rewards for better debugging/control from the trainer
Is your feature request related to a problem? Please describe.
There is currently no flexible way to manage the contributions of different aspects of the reward function. Some situations may involve providing the agent with multiple kinds of rewards, e.g. for achieving a goal, avoiding collisions, various penalties, etc. It would be useful to see the contribution of each of these to the overall agent reward.
Describe the solution you'd like
When AddReward or SetReward is called, there should be a way to specify an optional additional 'tag', e.g. AddReward(1.0f, "reachedTarget"). Each unique tag would represent a different reward type. (If no tag is supplied, the reward should be assumed to belong to a general, default reward type.)
A tagged reward should be added to the usual reward, but should also be logged separately so the contributions of the different reward types can be monitored individually in TensorBoard. The overall extrinsic reward would simply be the sum of the individual reward types (apart from any additional weightings; see below), so this should be transparent from a training point of view.
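For concreteness, a minimal sketch of how the proposed call sites might look inside an agent's reward logic. The string-tag overload is part of this proposal, not the current ML-Agents API, and the tag names are made up:

```csharp
// Inside an Agent subclass, e.g. in its per-step reward logic.
// The tagged overload below is hypothetical: it is what this issue
// proposes, not a method that exists in ML-Agents today.
AddReward(1.0f, "reachedTarget");   // goal reward
AddReward(-0.25f, "collision");     // collision penalty
AddReward(-0.001f);                 // untagged: counted under the default reward type
// Overall extrinsic reward for this step: 1.0 - 0.25 - 0.001,
// i.e. simply the sum over all reward types.
```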
Additionally, it would be nice if the weightings of the individual reward types could be managed via the trainer config (in the same way intrinsic reward signals currently are). This would enable things like automatically annealing shaping rewards. If curriculum advancement could be defined based on the values of individual reward groups, that would also be very useful.
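As an illustration only, a hypothetical trainer-config extension might mirror the existing reward_signals section. The extrinsic strength/gamma fields are real config options; the per-tag weight block is invented for this sketch:

```yaml
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
  # Hypothetical (not an existing option): per-tag weights that the
  # trainer could schedule or anneal over the course of training.
  extrinsic_tag_weights:
    reachedTarget: 1.0
    collision: 0.5
```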
Describe alternatives you've considered
Environment parameters could be used to a similar effect, but only in a limited capacity.
I will probably attempt to implement this myself, and submit a PR with any changes.
This is actually something of interest to us - and the weighting as well. One way you could do it with the current APIs is to have the weights as Environment Parameters, and use custom stats to send the rewards to TensorBoard. You might even be able to build this feature entirely on top of the custom stats feature, so it won't have to touch the Python code at all. We don't have an ETA for something like this, but hope you are able to get it working!
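A rough sketch of that workaround using only APIs that already exist (Academy.Instance.EnvironmentParameters and Academy.Instance.StatsRecorder); the class name, parameter name, stat name, and helper method are made up for illustration:

```csharp
using Unity.MLAgents;

public class WeightedRewardAgent : Agent
{
    // Call this from the agent's reward logic. "shapingReward" is the raw
    // value of one reward component, e.g. a distance-based shaping term.
    public void AddShapingReward(float shapingReward)
    {
        // Read the weight from Environment Parameters so it can be set
        // (or annealed) from the trainer/Python side. "shaping_weight"
        // is a made-up parameter name.
        float weight = Academy.Instance.EnvironmentParameters
            .GetWithDefault("shaping_weight", 1.0f);
        AddReward(weight * shapingReward);

        // Report the raw (unweighted) component as a custom TensorBoard stat.
        Academy.Instance.StatsRecorder.Add("Reward/Shaping", shapingReward);
    }
}
```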
I don't mind messing about with the Python code. The main thing I'm worried about is causing potential issues with mismatches between the Unity code and the Python mlagents package.
I wasn't aware of custom TensorBoard stats; maybe I can get it all working entirely on the Unity side, in which case the Python package is less of a worry. Still, it would be nice to get a proper integration working.
I have made an implementation of the logging part, splitting rewards into categories and charting them in TensorBoard. Basically, I modded Agent to keep rewards separately in a dictionary; it's not super complicated, and nothing changes in Python.
If there is any interest in a pull request or sharing my code, I can clean it up and share it later today or over the weekend. Your idea about curriculum steps sounds great too.
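For readers following along, a rough sketch of what such a dictionary-based split might look like. This is a guess at the approach described above, not the actual code; the class name and stat prefix are made up, while Agent, AddReward, OnEpisodeBegin, and StatsRecorder are real ML-Agents APIs:

```csharp
using System.Collections.Generic;
using Unity.MLAgents;

public class CategorizedRewardAgent : Agent
{
    // Per-category reward totals for the current episode.
    readonly Dictionary<string, float> m_RewardsByCategory =
        new Dictionary<string, float>();

    // Tagged variant of AddReward: contributes to the normal reward,
    // but also tracks the running total per category.
    public void AddReward(float increment, string category)
    {
        AddReward(increment);
        m_RewardsByCategory.TryGetValue(category, out var total);
        m_RewardsByCategory[category] = total + increment;
    }

    public override void OnEpisodeBegin()
    {
        // Flush last episode's totals to TensorBoard as custom stats,
        // then reset for the new episode.
        var recorder = Academy.Instance.StatsRecorder;
        foreach (var pair in m_RewardsByCategory)
        {
            recorder.Add("Reward/" + pair.Key, pair.Value);
        }
        m_RewardsByCategory.Clear();
    }
}
```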
Yes, I'd be very interested! Don't worry about making your code look nice; a gist or something, just so we can see your implementation, would be fine.
I have actually already implemented this myself, but it isn't working perfectly, so it would be good to see your solution.
I also have some (crude) Python code implementing this for curriculum, if that's of interest to you.
@robinerd Definitely would be interested in taking a look at your implementation. I think it's worth a PR!