moviegeek
Explicit zeros get ignored when calculating the overlap matrix.
In the following line, an overlap matrix is created by converting the coo matrix to boolean, then to integer. https://github.com/practical-recommender-systems/moviegeek/blob/d02d797f38abdee95eed2918debb1de3bdf35ed1/builder/item_similarity_calculator.py#L52
However, this turns ratings that were normalized to exactly zero into False, so they are not counted in the overlap even though the user did rate the item. My proposed solution is to create a matrix with a one for every stored entry of the coo matrix, regardless of its value:
Example:
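```python
from scipy.sparse import coo_matrix

# `coo` is the normalized rating matrix built in item_similarity_calculator.py
print("Coo matrix:\n", coo)
# Current approach: stored 0.0 values become False, i.e. 0
print("coo as bool:\n", coo.astype(bool).astype(int))
# Proposed approach: a 1 for every stored entry, including explicit zeros
ones_data = [1] * len(coo.data)
ones_matrix = coo_matrix((ones_data, (coo.row, coo.col)), shape=coo.shape)
print("ones matrix:\n", ones_matrix)
```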
Output:
```
Coo matrix:
  (0, 0)    -0.6666666666666667
  (1, 0)    0.33333333333333326
  (2, 0)    0.33333333333333326
  (1, 1)    0.5
  (2, 1)    0.0
  (3, 1)    -0.5
  (1, 2)    0.0
  (2, 2)    0.5
  (3, 2)    -0.5
coo as bool:
  (0, 0)    1
  (1, 0)    1
  (1, 1)    1
  (1, 2)    0
  (2, 0)    1
  (2, 1)    0
  (2, 2)    1
  (3, 1)    1
  (3, 2)    1
ones matrix:
  (0, 0)    1
  (1, 0)    1
  (2, 0)    1
  (1, 1)    1
  (2, 1)    1
  (3, 1)    1
  (1, 2)    1
  (2, 2)    1
  (3, 2)    1
```
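To see how this changes the counts, here is a minimal sketch that rebuilds the 4x3 matrix from the output above and computes the overlap both ways. It assumes the overlap matrix is the product of the binarized matrix with its own transpose (the exact expression at the linked line may differ), and `overlap_from_bool` / `overlap_from_ones` are hypothetical helper names used only for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Entries copied from the "Coo matrix" output; (2, 1) and (1, 2) are explicit zeros.
rows = np.array([0, 1, 2, 1, 2, 3, 1, 2, 3])
cols = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
vals = np.array([-2 / 3, 1 / 3, 1 / 3, 0.5, 0.0, -0.5, 0.0, 0.5, -0.5])
coo = coo_matrix((vals, (rows, cols)), shape=(4, 3))


def overlap_from_bool(m):
    # Current approach (assumed): explicit zeros become False -> 0, so they add nothing.
    b = m.astype(bool).astype(int)
    return (b @ b.T).toarray()


def overlap_from_ones(m):
    # Proposed approach: every stored entry counts as 1, even when its value is 0.0.
    ones = coo_matrix((np.ones_like(m.data, dtype=int), (m.row, m.col)), shape=m.shape)
    return (ones @ ones.T).toarray()


print(overlap_from_bool(coo))
print(overlap_from_ones(coo))
```

Rows 1 and 2 share three rated columns, but the boolean version reports an overlap of 1 for that pair because the two ratings normalized to zero are dropped; the ones matrix reports 3.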