
IndexError: index 3 is out of bounds for axis 0 with size 3

Open FasihaIkram opened this issue 3 years ago • 12 comments

Kindly look into this error:

    File "data_info.py", line 390, in add_oov
      item_sparse_oov = self.sparse_oov[self.item_sparse_col.index]

IndexError: index 3 is out of bounds for axis 0 with size 3

I have changed the length to None, but had no luck.
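For reference, the error itself is plain NumPy fancy indexing going past the end of an array; a minimal reproduction independent of LibRecommender (the array contents here are made up for illustration):

```python
import numpy as np

# sparse_oov holds one OOV value per sparse feature; with size 3,
# valid indices are 0..2, so index 3 is out of bounds.
sparse_oov = np.array([100, 200, 300])

try:
    # mimics self.sparse_oov[self.item_sparse_col.index]
    sparse_oov[np.array([0, 1, 3])]
except IndexError as err:
    print(err)  # index 3 is out of bounds for axis 0 with size 3
```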

FasihaIkram avatar Mar 30 '21 06:03 FasihaIkram

Could you describe your data and paste your code here?

massquantity avatar Mar 31 '21 07:03 massquantity

I'm getting the same error. It happens when I use multi_sparse_cols. Works fine without.

    388         if (self.item_sparse_unique is not None and
    389                 len(self.item_sparse_unique) == self.n_items):
--> 390             item_sparse_oov = self.sparse_oov[self.item_sparse_col.index]
    391             self.item_sparse_unique = np.vstack(
    392                 [self.item_sparse_unique, item_sparse_oov]

IndexError: index 55 is out of bounds for axis 0 with size 55

I have 15 user_cols, 63 item_cols, 50 sparse_cols, 0 dense_cols and 5 multi_sparse_cols.

Braffolk avatar May 14 '21 08:05 Braffolk

After a quick scan of the code, I think there are some conflicts between the sparse_oov and multi_sparse_cols implementations. This issue is a bit complicated, and I need some time to fix it.

massquantity avatar May 14 '21 15:05 massquantity

Fixed in version 0.6.6. Also, some of the usage of multi_sparse features has changed; see the updated User Guide.

massquantity avatar May 24 '21 03:05 massquantity

The same error is raised when using the Caser model, even without multi_sparse features. As far as I can see, user_vector and item_vector don't contain OOV entries.

Do you have any suggestions?


apdullahyayik avatar Aug 09 '21 09:08 apdullahyayik

Actually, the original code has a dedicated function to assign OOV values: assign_oov_vector, called at line 281 of caser.py. But I can see that you've changed a lot of the source code, so maybe you accidentally removed this line?

massquantity avatar Aug 11 '21 11:08 massquantity

The call to the assign_oov_vector method exists at line 281, after the training loop; there is no missing line. The error is raised before assign_oov_vector is called, at line 324 of caser.py.


apdullahyayik avatar Aug 11 '21 18:08 apdullahyayik

Sorry, my mistake; I didn't realize that. So the problem is OOV values during evaluation. I think you can add assign_oov_vector before the evaluation at line 275, changing:

    # for evaluation
    self._set_latent_factors()
    self.print_metrics(eval_data=eval_data, metrics=metrics,
                       **kwargs)

to:

    # for evaluation
    self._set_latent_factors()
    assign_oov_vector(self)
    self.print_metrics(eval_data=eval_data, metrics=metrics,
                       **kwargs)

However, you may encounter many warnings about unknown interactions.

massquantity avatar Aug 12 '21 00:08 massquantity

Thanks, it is OK now. As for the warnings, I have discarded them.

But I did not understand why OOV is added during evaluation on val_data, whereas for models like YouTubeMatch that is not the case.

I will notify you about a performance comparison of the Wide&Deep, DNNYoutube, and Caser models. In the Caser paper, the authors claimed that it outperforms GRU4Rec, which is why I don't use RNN4Rec for now.

apdullahyayik avatar Aug 12 '21 06:08 apdullahyayik

At line 379 of YouTubeMatch, the code adds OOV values of all zeros, so no exception occurs during evaluation. But now I think it's a bug, because at line 309 I also assigned OOV values, which makes it redundant. Since Caser and YouTubeMatch were not implemented at the same time, I might have overlooked these details.
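As an illustration (a sketch of the idea, not the actual library code), appending an all-zero OOV row to an embedding matrix would look like:

```python
import numpy as np

def append_zero_oov(vectors):
    """Sketch: append one all-zero row so that the OOV id
    (equal to the old number of rows) maps to a valid embedding."""
    oov_row = np.zeros((1, vectors.shape[1]), dtype=vectors.dtype)
    return np.vstack([vectors, oov_row])

user_vector = np.random.rand(5, 8)
padded = append_zero_oov(user_vector)
print(padded.shape)     # (6, 8)
print(padded[5].sum())  # 0.0 -> the OOV row is all zeros
```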

On the other hand, I think OOV values should be excluded during evaluation because they are not trained. This leads to the question of how you split your data. If you use the functions in data/split.py, by default they will remove all OOV users and items from eval_data, so I suppose you split the data in your own way?
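The underlying idea of that default behavior can be sketched as follows (a hedged illustration with pandas, not the actual data/split.py code): keep only eval rows whose user and item ids appeared in the training split.

```python
import pandas as pd

def drop_oov(train, eval_data):
    """Keep only eval rows whose user and item both appear in train."""
    known_users = set(train["user"])
    known_items = set(train["item"])
    mask = eval_data["user"].isin(known_users) & eval_data["item"].isin(known_items)
    return eval_data[mask]

train = pd.DataFrame({"user": [1, 1, 2], "item": [10, 11, 10]})
eval_data = pd.DataFrame({"user": [1, 2, 3], "item": [10, 12, 10]})
filtered = drop_oov(train, eval_data)
print(len(filtered))  # 1 -> user 3 and item 12 are OOV, only (1, 10) remains
```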

massquantity avatar Aug 13 '21 15:08 massquantity

Yes, YouTubeMatch is a retrieval algorithm, but in the libreco module it is actually "treated as" a ranking algorithm. In essence, retrieval and ranking algorithms can be used interchangeably. Some are called retrieval algorithms for their high inference speed, whereas ranking algorithms are chosen for their accuracy, so the difference mainly comes from an engineering point of view.

As for rating and ranking, I think their difference mainly comes from a task point of view, since the former deals with explicit data and the latter with implicit data.
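A tiny sketch of that distinction (the threshold of 4 is an assumption for the example, not from this thread): a rating task predicts the explicit score itself, while a ranking task typically binarizes it into implicit feedback.

```python
import numpy as np

# Explicit ratings on a 1-5 scale (rating task predicts these directly).
ratings = np.array([1, 3, 4, 5, 2])

# Binarize into implicit labels for a ranking task: 1 = positive interaction.
implicit_labels = (ratings >= 4).astype(int)
print(implicit_labels)  # [0 0 1 1 0]
```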

massquantity avatar Aug 13 '21 16:08 massquantity

Yes, I am splitting on my own. As for OOV, it should just be added at evaluation; it is not required during training, and the embedding vector size should be enough for the existing sparse features in the training loop, for both the train and val sets. Thank you.

apdullahyayik avatar Aug 15 '21 13:08 apdullahyayik