PyTorchCML
TwoStageSampler is giving Simplex() error
The following error occurs regardless of whether I use positive or negative weights:
epoch1 avg_loss:1.206: 0%| | 1/256 [00:00<00:57, 4.46it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-da8e4fbdcbb5> in <module>
16 #sampler = samplers.BaseSampler(train_set = cml_train_set, neg_weight = neg_weight, n_user = n_user, n_item = n_item, device=device, strict_negative=True)
17 trainer = trainers.BaseTrainer(cml_model_all_in, optimizer, criterion, sampler)
---> 18 trainer.fit(n_batch=256, n_epoch=10)
~/.local/lib/python3.7/site-packages/PyTorchCML/trainers/BaseTrainer.py in fit(self, n_batch, n_epoch, valid_evaluator, valid_per_epoch)
75 self.sampler.set_candidates_weight(dist, self.model.n_dim)
76
---> 77 neg_items = self.sampler.get_neg_batch(users.reshape(-1))
78
79 # initialize gradient
~/.local/lib/python3.7/site-packages/PyTorchCML/samplers/TwoStageSampler.py in get_neg_batch(self, users)
121 weight = self.candidates_weight
122
--> 123 neg_sampler = Categorical(probs=weight)
124 neg_indices = neg_sampler.sample([self.n_neg_samples]).T
125 neg_items = self.candidates[neg_indices]
~/site-packages/torch/distributions/categorical.py in __init__(self, probs, logits, validate_args)
62 self._num_events = self._param.size()[-1]
63 batch_shape = self._param.size()[:-1] if self._param.ndimension() > 1 else torch.Size()
---> 64 super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
65
66 def expand(self, batch_shape, _instance=None):
~/site-packages/torch/distributions/distribution.py in __init__(self, batch_shape, event_shape, validate_args)
54 if not valid.all():
55 raise ValueError(
---> 56 f"Expected parameter {param} "
57 f"({type(value).__name__} of shape {tuple(value.shape)}) "
58 f"of distribution {repr(self)} "
ValueError: Expected parameter probs (Tensor of shape (256, 200)) of distribution Categorical(probs: torch.Size([256, 200])) to satisfy the constraint Simplex(), but found invalid values:
tensor([[7.2129e-03, 7.2939e-03, 0.0000e+00, ..., 6.3333e-03, 0.0000e+00,
0.0000e+00],
[9.9370e-03, 0.0000e+00, 1.6256e-02, ..., 5.3079e-03, 9.3441e-03,
0.0000e+00],
[9.9370e-03, 0.0000e+00, 1.6256e-02, ..., 5.3079e-03, 9.3441e-03,
0.0000e+00],
...,
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 4.5067e-03, 4.2130e-03,
1.3499e-02],
[1.3212e-28, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 1.4386e-28,
1.3498e-28],
[0.0000e+00, 6.2382e-03, 7.7719e-03, ..., 0.0000e+00, 0.0000e+00,
1.5519e-02]], grad_fn=<DivBackward0>)
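For context, the Simplex() failure can be reproduced outside PyTorchCML with a toy tensor (the values below are hypothetical, not taken from my dataset): Categorical validates that every row of probs is non-negative and sums to 1, so a row whose mass has underflowed toward zero fails that check.

import torch
from torch.distributions import Categorical

# Hypothetical probs: the second row has underflowed, so it sums to ~3e-28
# instead of 1 and violates the Simplex() constraint that Categorical
# validates by default in recent PyTorch versions.
probs = torch.tensor([
    [0.2, 0.3, 0.5],       # valid simplex row
    [1e-28, 0.0, 2e-28],   # invalid row: non-negative but does not sum to 1
])
Categorical(probs=probs)   # raises ValueError: ... constraint Simplex() ...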
Thanks for the report. Does the error also occur when you don't set the weights? Could you tell me how to reproduce the error on my end, if you know how?
I think this is an ongoing issue with the current implementation of PyTorch. There should be a better error or warning to help understand this. I am not sure how to reproduce it on a public dataset; I tried, but I couldn't.
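One possible mitigation, assuming the cause is rows of candidate weights whose sum underflows or drifts away from 1, is to clamp and renormalize the weights before building the distribution. The helper below is only a sketch of that idea against the plain torch.distributions.Categorical API, not a patch to TwoStageSampler:

import torch
from torch.distributions import Categorical

def safe_categorical(weight: torch.Tensor, eps: float = 1e-10) -> Categorical:
    """Hypothetical helper: force every row of `weight` onto the simplex
    before constructing Categorical, so validation cannot fail on
    underflowed or all-zero rows."""
    weight = weight.clamp_min(0.0) + eps                 # no negatives, no all-zero rows
    probs = weight / weight.sum(dim=-1, keepdim=True)    # each row now sums to 1
    return Categorical(probs=probs)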
I see. Let's keep the issue open until we learn more.