
Distributional Reinforcement Learning with Quantile Regression

Open yydxlv opened this issue 6 years ago • 6 comments

Hi, what does the "u" mean in the following code snippet? It seems that "u" is not defined anywhere in the code. Thanks!

huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss

yydxlv avatar Mar 31 '18 04:03 yydxlv

I think it should probably be something like:

u = dist - expected_quant

hohoCode avatar Apr 10 '18 03:04 hohoCode

After adding u = dist - expected_quant, I get this error:

TypeError                                 Traceback (most recent call last)
in ()
     15
     16 if len(replay_buffer) > batch_size:
---> 17     loss = compute_td_loss(batch_size)
     18     losses.append(loss.data[0])
     19

in compute_td_loss(batch_size)
     17 huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
     18 huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
---> 19 quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
     20 loss = quantile_loss.sum() / num_quant
     21

/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
    310
    311     def __sub__(self, other):
--> 312         return self.sub(other)
    313
    314     def __rsub__(self, other):

TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:

  • (float value) didn't match because some of the arguments have invalid types: (Variable)
  • (torch.FloatTensor other) didn't match because some of the arguments have invalid types: (Variable)
  • (float value, torch.FloatTensor other)

angmc avatar Apr 12 '18 21:04 angmc

Should be something like:

u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()

qfettes avatar Jun 03 '18 16:06 qfettes
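To see why the broadcasting in the snippet above works, here is a small shape sketch. The names (dist, expected_dist, tau) follow the thread; the batch size and quantile count are made up for illustration.

```python
import torch

batch, num_quant = 4, 5
dist = torch.randn(batch, num_quant)           # predicted quantiles
expected_dist = torch.randn(batch, num_quant)  # Bellman-target quantiles
tau = (torch.arange(num_quant, dtype=torch.float32) + 0.5) / num_quant

# Pairwise TD errors: (num_quant, batch, 1) minus (batch, num_quant)
# broadcasts to (num_quant, batch, num_quant), i.e. every target
# quantile is compared against every predicted quantile.
u = expected_dist.t().unsqueeze(-1) - dist
print(u.shape)  # torch.Size([5, 4, 5])

# tau.view(1, -1) broadcasts over the last (predicted-quantile) axis,
# so each predicted quantile gets its own asymmetric weight.
weight = (tau.view(1, -1) - (u.detach() < 0).float()).abs()
print(weight.shape)  # torch.Size([5, 4, 5])
```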

When I last looked at this it ran after wrapping tau in a Variable:

u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - (u < 0).float()).abs() * huber_loss
loss = quantile_loss.sum() / num_quant

angmc avatar Jun 05 '18 17:06 angmc
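Pulling the thread's fixes together, here is a minimal self-contained sketch of the quantile Huber loss on the modern torch.Tensor API (PyTorch >= 0.4, so no Variable wrapper or .cuda() is needed). The names (dist, expected_quant, tau, num_quant, k) follow the thread; the random tensors are stand-ins for the network's predicted quantiles and the Bellman-target quantiles, not the repo's actual code.

```python
import torch

def quantile_huber_loss(dist, expected_quant, tau, k=1.0):
    # u is the TD error between target and predicted quantiles.
    u = expected_quant - dist
    # Huber loss: quadratic for |u| <= k, linear beyond k.
    clamped = u.abs().clamp(min=0.0, max=k)
    huber = 0.5 * clamped.pow(2) + k * (u.abs() - clamped)
    # Asymmetric quantile weighting; detach u so the indicator
    # (u < 0) does not receive gradients.
    quantile_loss = (tau - (u.detach() < 0).float()).abs() * huber
    return quantile_loss.sum() / dist.shape[-1]

num_quant = 5
tau = (torch.arange(num_quant, dtype=torch.float32) + 0.5) / num_quant
dist = torch.randn(num_quant, requires_grad=True)   # predicted quantiles
expected_quant = torch.randn(num_quant)             # target quantiles
loss = quantile_huber_loss(dist, expected_quant, tau)
loss.backward()  # gradients flow through dist only
```

Both terms in the loss are non-negative, so the result is always >= 0; detaching the indicator matches the loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs() line earlier in the thread.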

Friend, this is a question.

LRiver-wut avatar Apr 22 '23 11:04 LRiver-wut

It confused me.

LRiver-wut avatar Apr 22 '23 11:04 LRiver-wut