RL-Adventure
Distributional Reinforcement Learning with Quantile Regression
Hi, what does the "u" mean in the following code snippet? It seems that "u" is not defined in the code. Thanks!
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
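For reference, the two clamp-based lines above can be sketched elementwise in plain Python (not the repo's PyTorch code). Here u is assumed to be the pairwise TD error between a target and a predicted quantile, k the Huber threshold, and tau the quantile fraction:

```python
def huber(u, k=1.0):
    # Quadratic 0.5*u^2 inside |u| <= k, linear k*(|u| - 0.5*k) outside;
    # this is exactly what the clamp-based expression above computes.
    au = abs(u)
    return 0.5 * u * u if au <= k else k * (au - 0.5 * k)

def quantile_huber(u, tau, k=1.0):
    # The weight |tau - 1{u < 0}| makes the penalty asymmetric, so over-
    # and under-estimates are punished according to the quantile tau.
    return abs(tau - (1.0 if u < 0 else 0.0)) * huber(u, k)
```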
I think probably it should be something like:
u = dist - expected_quant
After adding u = dist - expected_quant, I get the following error:
TypeError                                 Traceback (most recent call last)
/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
    310
    311     def __sub__(self, other):
--> 312         return self.sub(other)
    313
    314     def __rsub__(self, other):

TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:
 * (float value)
      didn't match because some of the arguments have invalid types: (Variable)
 * (torch.FloatTensor other)
      didn't match because some of the arguments have invalid types: (Variable)
 * (float value, torch.FloatTensor other)
Should be something like:
u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()
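A plain-Python sketch of the pairwise step in the fix above: u is built by subtracting every predicted quantile from every target quantile, so the (u < 0) indicator and tau then broadcast over an N x N error matrix rather than a flat vector:

```python
def pairwise_u(expected_dist, dist):
    # One row per target quantile, one column per predicted quantile.
    return [[t - q for q in dist] for t in expected_dist]

u = pairwise_u([1.0, 2.0], [0.5, 1.5])
# u == [[0.5, -0.5], [1.5, 0.5]]
```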
When I last looked at this, it ran after converting tau to a Variable:

u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - (u < 0).float()).abs() * huber_loss
loss = quantile_loss.sum() / num_quant
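Putting the pieces together, here is a hedged plain-Python sketch of the full per-sample loss in the snippet above (dist, target, tau, and k named as in the thread). Note that on an N x N error matrix, loss.mean(1).sum() and quantile_loss.sum() / num_quant compute the same number: both divide the total by N.

```python
def qr_loss(dist, target, tau, k=1.0):
    # dist: predicted quantiles, target: target quantiles,
    # tau: quantile midpoints, k: Huber threshold.
    num_quant = len(tau)
    total = 0.0
    for t in target:                  # loop over target quantiles
        for j, q in enumerate(dist):  # loop over predicted quantiles
            u = t - q                 # pairwise TD error
            au = abs(u)
            hub = 0.5 * u * u if au <= k else k * (au - 0.5 * k)
            total += abs(tau[j] - (1.0 if u < 0 else 0.0)) * hub
    return total / num_quant
```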
Friend, this is a question.
It confused me.