Switch to normal distribution for infinite action spaces
Previously, sampling failed if the action space was infinite. While an infinite action space might be bad practice, in some cases it is not easy to get rid of it.
This commit checks whether any limit of the action space is infinite. If so, it uses a normal distribution for sampling. Currently, the mean/mu/loc is set to all zeros and the scale to all ones (with a value of one, standard deviation and variance coincide). In the future, a more flexible definition of these two parameters might be useful.
If all limits are finite, the lows and highs of the action space are used to create a uniform distribution with the correct limits per dimension/action.
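
For illustration, the selection logic described above can be sketched roughly as follows. This is a minimal NumPy sketch and not the code from this commit: the helper name `sample_action` and its parameters are hypothetical, and the actual implementation may use a different library for its distributions.

```python
import numpy as np


def sample_action(low, high, rng=None):
    """Sample one action from per-dimension bounds (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    low = np.atleast_1d(np.asarray(low, dtype=np.float64))
    high = np.atleast_1d(np.asarray(high, dtype=np.float64))

    if np.isinf(low).any() or np.isinf(high).any():
        # At least one limit is infinite: use a normal distribution with
        # mean 0 and scale 1 for every dimension.
        return rng.normal(loc=0.0, scale=1.0, size=low.shape)

    # All limits are finite: uniform distribution with the correct
    # limits per dimension.
    return rng.uniform(low=low, high=high)
```

Note that in this sketch a single infinite limit switches all dimensions to the normal distribution, which matches the all-or-nothing check described in this commit.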
This commit has been manually tested with Pendulum-v1 and LunarLanderContinuous-v2 using the Pendulum (DDPG) example.
I just realized it would be better to check for every dimension whether it needs uniform sampling or a normal distribution.
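
A per-dimension variant could look roughly like the sketch below. It is an assumption about how such a check might be written, not part of this commit; the name `sample_action_per_dim` is hypothetical. Dimensions with two finite limits are sampled uniformly, and all others get a standard normal sample.

```python
import numpy as np


def sample_action_per_dim(low, high, rng=None):
    """Choose uniform or normal sampling per dimension (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    low = np.atleast_1d(np.asarray(low, dtype=np.float64))
    high = np.atleast_1d(np.asarray(high, dtype=np.float64))

    # A dimension counts as bounded only if both of its limits are finite.
    bounded = np.isfinite(low) & np.isfinite(high)

    # Start with standard-normal samples for every dimension ...
    action = rng.normal(loc=0.0, scale=1.0, size=low.shape)
    # ... then overwrite the bounded dimensions with uniform samples.
    action[bounded] = rng.uniform(low[bounded], high[bounded])
    return action
```

In this sketch a dimension with only one finite limit is treated as unbounded, so its normal sample can fall outside that limit; handling such half-bounded dimensions would need an extra step such as clipping.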
Better yet would be to make the sampling user definable.