RL-Adventure
                        Distributional Reinforcement Learning with Quantile Regression
Hi, what does the "u" mean in the following code snippet? It seems that "u" is not defined in the code. Thanks!
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
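For reference, this looks to me like the quantile Huber loss from the QR-DQN paper (that's my reading, so take the exact normalization with a grain of salt). In that case u is the pairwise TD error between a Bellman-target quantile and a predicted quantile, and it has to be computed before these lines:

u = T theta_j - theta_i                    (target quantile minus predicted quantile)
L_k(u) = 0.5 * u^2                         if |u| <= k
L_k(u) = k * (|u| - 0.5 * k)               otherwise
quantile_loss = |tau - 1{u < 0}| * L_k(u)

The two huber_loss lines in the snippet are exactly L_k(u), split into the clamped quadratic part and the linear tail.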
I think it should probably be something like:
u = dist - expected_quant
After adding u = dist - expected_quant, I get:
TypeError                                 Traceback (most recent call last)
/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
    310
    311     def __sub__(self, other):
--> 312         return self.sub(other)
    313
    314     def __rsub__(self, other):
TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:
- (float value) didn't match because some of the arguments have invalid types: (Variable)
- (torch.FloatTensor other) didn't match because some of the arguments have invalid types: (Variable)
- (float value, torch.FloatTensor other)
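If I am reading the traceback right, the error is not in the loss itself: in the pre-0.4 PyTorch this notebook was written for, a plain Tensor and an autograd Variable cannot be mixed in arithmetic, so if one of dist / expected_quant is a Tensor and the other a Variable, the subtraction fails with exactly this TypeError. A minimal sketch of the failure mode (the names a and b are just for illustration):

import torch
from torch.autograd import Variable

a = torch.zeros(3)             # plain FloatTensor
b = Variable(torch.ones(3))    # Variable wrapping a FloatTensor
# a - b  -> TypeError: sub received an invalid combination of arguments - got (Variable)
c = Variable(a) - b            # fine once both operands are Variables

Wrapping the Tensor side in a Variable (or taking .data from the Variable side) makes the types match.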
Should be something like:
u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()
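To make that concrete, here is a self-contained sketch of the whole pairwise loss for current PyTorch (no Variables). The names and shapes are my assumptions, not necessarily the notebook's exact code: dist holds the predicted quantiles theta_i with shape (batch, num_quant), expected_dist holds the Bellman-target quantiles T theta_j with the same shape, and tau holds the quantile midpoints with shape (num_quant,).

import torch

def quantile_huber_loss(dist, expected_dist, tau, k=1.0):
    # pairwise TD errors u[b, i, j] = T theta_j - theta_i, shape (batch, num_quant, num_quant)
    u = expected_dist.unsqueeze(1) - dist.unsqueeze(2)
    # element-wise Huber loss L_k(u)
    huber = torch.where(u.abs() <= k, 0.5 * u.pow(2), k * (u.abs() - 0.5 * k))
    # asymmetric quantile weight |tau_i - 1{u < 0}|, broadcast over the predicted-quantile dim
    weight = (tau.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    # average over target quantiles j, sum over predicted quantiles i, mean over the batch
    return (weight * huber).mean(dim=2).sum(dim=1).mean()

The u.detach() in the indicator mirrors the line above; the comparison is non-differentiable anyway, so the detach just makes that explicit. It would be called as, e.g., quantile_huber_loss(current_dist, target_dist.detach(), tau), with whatever the notebook's actual tensors are named.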
When I last looked at this, it ran after converting to a Variable:

u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - (u < 0).float()).abs() * huber_loss
loss = quantile_loss.sum() / num_quant
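One more note for anyone running this on a newer PyTorch: since 0.4, Variable has been merged into Tensor, so the autograd.Variable(tau.cuda()) wrapper is no longer needed; tau.cuda() (or tau.to(device)) can be used in the loss directly.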
Friend, this is a question.
It confused me.