Distributional Reinforcement Learning with Quantile Regression
Hi, what does the "u" mean in the following code snippet? It seems that "u" is not defined in the code. Thanks!
```python
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
```
I think it should probably be something like:

```python
u = dist - expected_quant
```
After adding `u = dist - expected_quant`, I get:
```
TypeError                                 Traceback (most recent call last)
/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
    310
    311     def __sub__(self, other):
--> 312         return self.sub(other)
    313
    314     def __rsub__(self, other):

TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:
 - (float value)
   didn't match because some of the arguments have invalid types: (Variable)
 - (torch.FloatTensor other)
   didn't match because some of the arguments have invalid types: (Variable)
 - (float value, torch.FloatTensor other)
```
Should be something like:

```python
u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()
```
When I last looked at this, it ran after converting `tau` to a `Variable`:

```python
u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - (u < 0).float()).abs() * huber_loss
loss = quantile_loss.sum() / num_quant
```
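To sanity-check the arithmetic of the elementwise version above independently of PyTorch versions, here is a NumPy sketch of the same computation (the function name and signature are mine):

```python
import numpy as np

def quantile_huber_loss(dist, expected_quant, tau, k=1.0):
    """Elementwise quantile Huber loss (QR-DQN style).

    dist:           (N,) predicted quantile values
    expected_quant: (N,) target values (treated as detached TD targets)
    tau:            (N,) quantile fractions
    k:              Huber threshold kappa
    """
    # u is the TD error: target minus prediction
    u = expected_quant - dist
    abs_u = np.abs(u)
    clipped = np.clip(abs_u, 0.0, k)
    # Huber loss: quadratic for |u| <= k, linear beyond
    huber = 0.5 * clipped ** 2 + k * (abs_u - clipped)
    # Asymmetric quantile weight |tau - 1{u < 0}|
    weight = np.abs(tau - (u < 0.0).astype(np.float64))
    return (weight * huber).sum() / len(tau)
```

Note the sign of `u` matters: with `u = expected_quant - dist`, an underestimate (`u > 0`) is weighted by `tau` and an overestimate by `1 - tau`, which is what makes the regression target the `tau`-quantile rather than the mean.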
Friend, this is a question.
It confused me.