
Normalised Advantage Function (NAF) agent outputs NaN actions

Open BigRiceBall-zz opened this issue 5 years ago • 0 comments

Hi,

I trained a NAF agent on the HalfCheetah environment with the default settings. After 148 episodes, the agent appears to output a NaN action. The error message is shown below:

ValueError: The given action does not match the action space definition. Action = [nan nan nan nan nan nan], action space definition = BoxActionSpace: shape = [6], low = [-1. -1. -1. -1. -1. -1.], high = [1. 1. 1. 1. 1. 1.]

My guess is that the loss and gradients become infinite, so the weights become NaN and the output action becomes NaN as well.
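The guessed failure chain can be reproduced in isolation. The following is a minimal NumPy sketch, not Coach's NAF code: the one-layer "policy", the `sgd_step` helper, and all values are hypothetical and only illustrate how a single infinite gradient can turn into a NaN action.

```python
import numpy as np

def sgd_step(weights, grad, lr=1e-3):
    """Plain SGD update (hypothetical helper, not part of Coach)."""
    return weights - lr * grad

# Hypothetical weights of a one-layer policy with a tanh output.
w = np.array([0.5, -0.2, 0.1])

# Assume the backward pass overflowed and produced infinite gradients.
g = np.array([np.inf, -np.inf, 1.0])

# Step 1: the update pushes the affected weights to -inf / +inf.
w = sgd_step(w, g)

# Step 2: the next forward pass adds -inf and +inf, which is NaN.
obs = np.array([1.0, 1.0, 1.0])
with np.errstate(invalid="ignore"):  # silence the expected warning
    pre_activation = np.sum(w * obs)
    action = np.tanh(pre_activation)

print("weights:", w)
print("action:", action)
```

Once any weight is non-finite, every later action that depends on it is NaN as well, which would be consistent with all six action components being NaN in the error above.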

The results CSV does show that the loss is very large (e.g. 8.24687E+12) and that the unclipped gradient is inf.
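An unclipped gradient of inf matters because clipping by global norm cannot repair it: the scale factor becomes 0, and inf * 0 is NaN. The sketch below shows this in plain NumPy and adds a guard that skips the update when the norm is not finite. It is a generic illustration under stated assumptions, not Coach's implementation; `global_norm`, `clip_by_global_norm`, `safe_update`, and the thresholds are hypothetical names and values.

```python
import numpy as np

def global_norm(grads):
    """L2 norm over all gradient arrays taken together."""
    return np.sqrt(sum(np.sum(np.square(g)) for g in grads))

def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global norm is at most max_norm."""
    norm = global_norm(grads)
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads], norm

def safe_update(weights, grads, lr=1e-3, max_norm=10.0):
    """Apply a clipped SGD step, or skip it if the gradients are not finite."""
    if not np.isfinite(global_norm(grads)):
        return weights, False  # keep the old weights untouched
    clipped, _ = clip_by_global_norm(grads, max_norm)
    return [w - lr * g for w, g in zip(weights, clipped)], True

weights = [np.ones(3)]
bad_grads = [np.array([np.inf, 1.0, 2.0])]

# Clipping alone: norm is inf, scale is 0.0, and inf * 0.0 gives NaN.
with np.errstate(invalid="ignore"):
    clipped, norm = clip_by_global_norm(bad_grads, max_norm=10.0)
print("clipped:", clipped[0], "norm:", norm)

# With the guard, the step is skipped and the weights stay finite.
new_weights, applied = safe_update(weights, bad_grads)
print("applied:", applied, "weights:", new_weights[0])
```

Skipping the step only hides the symptom. A loss on the order of 1e12 suggests the divergence starts earlier, so a lower learning rate or rescaled rewards may also be worth trying; those are general suggestions, not settings confirmed for this preset.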

Is there an appropriate way to fix this?

BigRiceBall-zz, Jun 20 '19 14:06