LibRecommender
IndexError: index 3 is out of bounds for axis 0 with size 3
Kindly look into this error.
File "data_info.py", line 390, in add_oov item_sparse_oov = self.sparse_oov[self.item_sparse_col.index]
IndexError: index 3 is out of bounds for axis 0 with size 3
I have changed the length to None, but had no luck.
Could you describe your data and paste your code here?
I'm getting the same error. It happens when I use multi_sparse_cols. Works fine without.
    388 if (self.item_sparse_unique is not None and
    389         len(self.item_sparse_unique) == self.n_items):
--> 390     item_sparse_oov = self.sparse_oov[self.item_sparse_col.index]
    391     self.item_sparse_unique = np.vstack(
    392         [self.item_sparse_unique, item_sparse_oov]
IndexError: index 55 is out of bounds for axis 0 with size 55
I have 15 user_cols, 63 item_cols, 50 sparse_cols, 0 dense_cols and 5 multi_sparse_cols.
After a quick scan of the code, I think there are some conflicts between the sparse_oov and multi_sparse_cols implementations. This issue is a bit complicated, and I need some time to fix it.
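Roughly, the mismatch looks like this (a simplified NumPy sketch with made-up numbers, not the actual data_info.py code): the item sparse column indices are built over plain sparse plus multi_sparse fields, while sparse_oov only has one slot per plain sparse column, so the lookup runs past the end.

import numpy as np

# hypothetical numbers: 3 plain sparse columns get OOV slots...
sparse_oov = np.array([7, 12, 25])          # shape (3,)

# ...but the item sparse column indices were built over sparse + multi_sparse
# columns, so the multi_sparse field ends up with index 3
item_sparse_col_index = [1, 2, 3]

try:
    item_sparse_oov = sparse_oov[item_sparse_col_index]
except IndexError as e:
    print(e)    # index 3 is out of bounds for axis 0 with size 3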
Fixed in version 0.6.6. Also, some of the usage of multi_sparse features has been changed; see the updated User Guide.
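For reference, a rough sketch of how multi_sparse columns are passed to DatasetFeat after the change. The column names and toy data are made up, and the multi_sparse_col / pad_val parameter names are my recollection of the updated User Guide, so please double-check them against the docs:

import pandas as pd
from libreco.data import DatasetFeat

# toy DataFrame; real data needs "user", "item", "label" plus the feature columns
train = pd.DataFrame({
    "user": [1, 2], "item": [10, 11], "label": [1, 0],
    "sex": ["F", "M"], "age": [23, 31], "occupation": ["artist", "doctor"],
    "genre1": ["crime", "drama"], "genre2": ["thriller", "missing"], "genre3": ["missing", "missing"],
})

sparse_col = ["sex", "occupation"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
# the three genre columns form one multi_sparse field sharing a single embedding space
multi_sparse_col = [["genre1", "genre2", "genre3"]]

train_data, data_info = DatasetFeat.build_trainset(
    train, user_col, item_col, sparse_col, dense_col,
    multi_sparse_col=multi_sparse_col, pad_val="missing",
)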
The same error is raised when using the Caser model, even without multi_sparse features. As far as I can see, user_vector and item_vector don't have OOV inside.
Do you have any suggestions?
Actually the original code has a dedicated function to assign OOV values, assign_oov_vector, called at line 281 of caser.py. But I can see that you've changed a lot of the source code, so maybe you accidentally removed this line?
The call to the assign_oov_vector method exists at line 281, after the train loop; there is no missing line. The error is raised before assign_oov_vector is called, at line 324 of caser.py.
Sorry, my mistake, I didn't realize that. So the problem is OOV values during evaluation. I think you can add assign_oov_vector before the evaluation at line 275, i.e. change:
# for evaluation
self._set_latent_factors()
self.print_metrics(eval_data=eval_data, metrics=metrics,
                   **kwargs)
to:
# for evaluation
self._set_latent_factors()
assign_oov_vector(self)
self.print_metrics(eval_data=eval_data, metrics=metrics,
                   **kwargs)
However, you may encounter many warnings about unknown interactions.
Thanks, it is OK now. As for the warnings, I have discarded them.
But I did not understand why OOV is added during the evaluation of val_data, whereas for models like YouTubeMatch that is not the case.
I will notify you about the performance comparison of the Wide&Deep, DNNYoutube and Caser models. In the Caser paper, the authors claimed that it outperforms GRU4Rec; that is why I don't use RNN4Rec for now.
At line 379 of YouTubeMatch, the code adds OOV values of all zeros, so no exception occurs during evaluation. But now I think it's a bug, because at line 309 I also assigned OOV values, which makes it redundant. Since Caser and YouTubeMatch were not implemented at the same time, I might have overlooked these details.
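What that line essentially does is the following (a simplified NumPy sketch with hypothetical sizes, not the exact source):

import numpy as np

n_items, embed_size = 5, 4
item_vector = np.random.rand(n_items, embed_size)   # trained item vectors

# append an all-zero row as the OOV slot, so an unknown item maps to index
# n_items during evaluation and gets a zero vector instead of raising IndexError
item_vector = np.vstack([item_vector, np.zeros((1, embed_size))])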
On the other hand, I think OOV values should be excluded during evaluation, because they are not trained. So this leads to the question of how you split your data. If you use the functions in data/split.py, by default they will remove all OOV users and items from eval_data, so I suppose you split the data in your own way?
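To be concrete, by the functions in data/split.py I mean something like this (a rough sketch with toy data; I'm quoting the parameters from memory, so check the source for the exact signatures):

import pandas as pd
from libreco.data import random_split

# `data` is the full interaction DataFrame before any splitting
data = pd.DataFrame({"user": [1, 1, 2, 2, 3], "item": [10, 11, 10, 12, 11], "label": [1, 0, 1, 1, 0]})

# one possible split; by default the eval and test parts drop users and items
# that never appear in the train part, so no OOV rows reach evaluation
train, eval_data, test = random_split(data, multi_ratios=[0.8, 0.1, 0.1])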
Yes, YouTubeMatch is a retrieval algorithm. But in the libreco module, it is actually "treated as" a ranking algorithm. In essence, retrieval and ranking algorithms can be used interchangeably. The reason some of them are called retrieval algorithms is their high inference speed, whereas ranking algorithms are favored for their accuracy. So the difference mainly comes from an engineering point of view.
For rating and ranking, I think their difference mainly comes from a task point of view, since the former deals with explicit data and the latter deals with implicit data.
Yes, I am splitting on my own. As for OOV, it should just be added at evaluation; it is not required during training. The embedding vector size should be enough for the existing sparse features in the training loop, for both the train and val sets, as sketched below. Thank you.
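Schematic sketch of what I mean (plain NumPy with hypothetical names, not the library code):

import numpy as np

n_items, embed_size = 1000, 16

# one extra row is reserved for OOV; training only ever updates rows 0..n_items-1
item_embed = np.random.rand(n_items + 1, embed_size)
OOV_IDX = n_items

def item_lookup(item_id, known_ids):
    # only at evaluation time do unseen items fall back to the reserved OOV row
    idx = item_id if item_id in known_ids else OOV_IDX
    return item_embed[idx]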