
Explicit zeros get ignored when calculating the overlap matrix.

Open ksiar137 opened this issue 11 months ago • 0 comments

In the following line, an overlap matrix is created by converting the coo matrix to boolean, then to integer. https://github.com/practical-recommender-systems/moviegeek/blob/d02d797f38abdee95eed2918debb1de3bdf35ed1/builder/item_similarity_calculator.py#L52

However, this also converts ratings that were normalized to exactly zero into False, so those ratings are ignored in the overlap count even though the user did rate the item. My proposed solution: create a matrix with a one for every stored value of the coo matrix:

Example:

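    from scipy.sparse import coo_matrix

    # `coo` is the normalized ratings matrix built earlier in the calculator
    print("Coo matrix:\n", coo)
    print("coo as bool:\n", coo.astype(bool).astype(int))

    # One entry per stored value, including explicit zeros
    ones_data = [1] * len(coo.data)
    ones_matrix = coo_matrix((ones_data, (coo.row, coo.col)), shape=coo.shape)
    print("ones matrix:\n", ones_matrix)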

Output:

    Coo matrix:
      (0, 0)    -0.6666666666666667
      (1, 0)    0.33333333333333326
      (2, 0)    0.33333333333333326
      (1, 1)    0.5
      (2, 1)    0.0
      (3, 1)    -0.5
      (1, 2)    0.0
      (2, 2)    0.5
      (3, 2)    -0.5
    coo as bool:
      (0, 0)    1
      (1, 0)    1
      (1, 1)    1
      (1, 2)    0
      (2, 0)    1
      (2, 1)    0
      (2, 2)    1
      (3, 1)    1
      (3, 2)    1
    ones matrix:
      (0, 0)    1
      (1, 0)    1
      (2, 0)    1
      (1, 1)    1
      (2, 1)    1
      (3, 1)    1
      (1, 2)    1
      (2, 2)    1
      (3, 2)    1
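To show the effect on the overlap counts themselves, here is a minimal sketch using the same values. It assumes that rows are items, that columns are users, and that the overlap is the indicator matrix multiplied by its own transpose. The helper names `overlap_from_bool` and `overlap_from_ones` are hypothetical and do not come from the repository.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Same normalized ratings as in the example above
# (assumed layout: rows = items, columns = users).
rows = [0, 1, 2, 1, 2, 3, 1, 2, 3]
cols = [0, 0, 0, 1, 1, 1, 2, 2, 2]
data = [-2 / 3, 1 / 3, 1 / 3, 0.5, 0.0, -0.5, 0.0, 0.5, -0.5]
coo = coo_matrix((data, (rows, cols)), shape=(4, 3))


def overlap_from_bool(m):
    """Overlap via bool conversion: stored zeros become 0 and drop out."""
    indicator = m.astype(bool).astype(int)
    return (indicator @ indicator.T).toarray()


def overlap_from_ones(m):
    """Overlap via a ones matrix: every stored entry counts as a rating."""
    ones = coo_matrix(
        (np.ones(len(m.data), dtype=int), (m.row, m.col)), shape=m.shape
    )
    return (ones @ ones.T).toarray()


overlap_bool = overlap_from_bool(coo)
overlap_ones = overlap_from_ones(coo)

print("overlap (bool):\n", overlap_bool)
print("overlap (ones):\n", overlap_ones)
```

In this example, items 1 and 2 were both rated by all three users, so the ones-based count for that pair is 3, while the bool-based count is only 1 because two of the shared ratings were normalized to zero.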

ksiar137 · Mar 10 '24 18:03