
Error `AttributeError: 'implicit.evaluation._memoryviewslice' object has no attribute 'dtype'` when calling `mean_average_precision_at_k` function

Open MRossa157 opened this issue 1 year ago • 6 comments

Hello! I encountered an issue when using the mean_average_precision_at_k function from the implicit library.

Problem Description:

When calling the mean_average_precision_at_k function, the following error occurs:

AttributeError: 'implicit.evaluation._memoryviewslice' object has no attribute 'dtype'

Context:

  • Operating System: Windows 10
  • Python: 3.10.7
  • implicit library version: 0.7.2
  • Installed dependencies:
    [tool.poetry.dependencies]
    python = "^3.10.7"
    pandas = "^2.2.2"
    implicit = "^0.7.2"
    

Steps to Reproduce:

  1. Installed the implicit library version 0.7.2.
  2. Called the mean_average_precision_at_k function with the following parameters:
    metric_map = mean_average_precision_at_k(
        model,
        csr_train,
        csr_test,
        K=6,
        show_progress=True,
    )
    
  3. Encountered the above-mentioned error.

Expected Behavior:

The function should return the MAP@K metric value without errors.
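
For reference, a successful call should just return a single float:

metric_map = mean_average_precision_at_k(model, csr_train, csr_test, K=6)
print(metric_map)  # MAP@6, a float between 0 and 1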

Additional Information:

  • Tried reinstalling the library and its dependencies, but the error persists.
  • The code includes the following imports:
    import numpy as np
    import pandas as pd
    from implicit.cpu.als import AlternatingLeastSquares as ALScpu
    from implicit.evaluation import mean_average_precision_at_k
    from scipy.sparse import coo_matrix
    

I would appreciate any assistance in resolving this issue.

MRossa157 avatar Jan 07 '25 12:01 MRossa157

Hi @MRossa157! I have trained a model with the Python implicit package and faced the same problem.

A minimal example to reproduce the error:

import os
import random
import pandas as pd
from scipy.sparse import csr_matrix
from implicit.evaluation import train_test_split, ndcg_at_k, mean_average_precision_at_k
from implicit.gpu.als import AlternatingLeastSquares

os.environ['OPENBLAS_NUM_THREADS']="1"
os.environ['CUDA_VISIBLE_DEVICES']="0"

# init random data
n_actions = 100000
max_uid = 100000
max_action_id = 10000

df = pd.DataFrame(data={
    "user_id" : [random.randint(1, max_uid) for i in range(0, n_actions)],
    "action" : [random.randint(1, max_action_id) for i in range(0, n_actions)],
    "impression" : [1 for i in range(0, n_actions)]
})

# convert to sparse format
user_rows = df.user_id.tolist()
query_cols = df.action.tolist()
qvecs = csr_matrix((df.impression, (user_rows, query_cols)))

# train test split and model training
train_user_items, test_user_items = train_test_split(qvecs, train_percentage=0.9, random_state=19)

model = AlternatingLeastSquares(factors=130, regularization=0.05, alpha=1.0, calculate_training_loss=True)
model.fit(train_user_items)

# calculate ndcg
ndcg = ndcg_at_k(model, train_user_items, test_user_items, K=14, show_progress=True, num_threads=1)
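
# the MAP@K call from the original report is expected to fail with the same AttributeError
map_at_k = mean_average_precision_at_k(model, train_user_items, test_user_items, K=6, show_progress=True, num_threads=1)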

Package versions: implicit 0.7.2 (built from source), Python 3.11.2, CUDA 12.3

OS: Debian GNU/Linux 12

fkurushin avatar Jan 10 '25 10:01 fkurushin

Updating to scipy 1.14.1 should resolve the issue.
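
If you are not sure which scipy version your environment actually resolved to, a quick check (a minimal sketch) is:

import implicit
import scipy

print("implicit:", implicit.__version__)
print("scipy:", scipy.__version__)  # 1.14.1 is the version suggested here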

sorlandet avatar Jan 20 '25 08:01 sorlandet

Updating to scipy 1.14.1 should resolve the issue.

It does not with the wheels, at least on my side. Did you compile from scratch?

gdragotto avatar Feb 03 '25 20:02 gdragotto

Update: Python workaround to perform the evaluation "manually"

import numpy as np
import tqdm


def ranking_metrics_at_k(model, train_user_items, test_user_items, K=10, show_progress=True):
    """
    Calculates ranking metrics (Precision@K, MAP@K, NDCG@K, AUC) for a trained model.

    Parameters:
        model : Trained ALS model (or other Implicit model).
        train_user_items : csr_matrix
            User-item interaction matrix used for training.
        test_user_items : csr_matrix
            User-item interaction matrix for evaluation.
        K : int
            Number of items to evaluate.
        show_progress : bool
            Show a progress bar during evaluation.

    Returns:
        dict : Dictionary with precision, MAP, NDCG, and AUC scores.
    """

    # Ensure matrices are in CSR format
    train_user_items = train_user_items.tocsr()
    test_user_items = test_user_items.tocsr()

    num_users, num_items = test_user_items.shape
    relevant = 0
    total_precision_div = 0
    total_map = 0
    total_ndcg = 0
    total_auc = 0
    total_users = 0

    # Compute cumulative gain for NDCG normalization
    cg = 1.0 / np.log2(np.arange(2, K + 2))  # Discount factor
    cg_sum = np.cumsum(cg)  # Ideal DCG normalization

    # Get users with at least one item in the test set
    users_with_test_data = np.where(np.diff(test_user_items.indptr) > 0)[0]

    # Progress bar
    progress = tqdm.tqdm(total=len(users_with_test_data), disable=not show_progress)

    batch_size = 1000
    start_idx = 0

    while start_idx < len(users_with_test_data):
        batch_users = users_with_test_data[start_idx:start_idx + batch_size]
        recommended_items, _ = model.recommend(batch_users, train_user_items[batch_users], N=K)
        start_idx += batch_size

        for user_idx, user_id in enumerate(batch_users):
            test_items = set(test_user_items.indices[test_user_items.indptr[user_id]:test_user_items.indptr[user_id + 1]])
            
            if not test_items:
                continue  # Skip users without test data

            num_relevant = len(test_items)
            total_precision_div += min(K, num_relevant)

            ap = 0
            hit_count = 0
            auc = 0
            idcg = cg_sum[min(K, num_relevant) - 1]  # Ideal Discounted Cumulative Gain (IDCG)
            num_negative = num_items - num_relevant

            for rank, item in enumerate(recommended_items[user_idx]):
                if item in test_items:
                    relevant += 1
                    hit_count += 1
                    ap += hit_count / (rank + 1)
                    total_ndcg += cg[rank] / idcg
                else:
                    auc += hit_count  # Accumulate hits for AUC calculation

            auc += ((hit_count + num_relevant) / 2.0) * (num_negative - (K - hit_count))
            total_map += ap / min(K, num_relevant)
            total_auc += auc / (num_relevant * num_negative)
            total_users += 1
        
        progress.update(len(batch_users))

    progress.close()

    # Compute final metrics
    precision = relevant / total_precision_div if total_precision_div > 0 else 0
    mean_ap = total_map / total_users if total_users > 0 else 0
    mean_ndcg = total_ndcg / total_users if total_users > 0 else 0
    mean_auc = total_auc / total_users if total_users > 0 else 0

    return {
        "precision": precision,
        "map": mean_ap,
        "ndcg": mean_ndcg,
        "auc": mean_auc
    }
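
Usage with the model and matrices from the reproduction earlier in this thread would look like, for example:

metrics = ranking_metrics_at_k(model, train_user_items, test_user_items, K=6, show_progress=True)
print(metrics)  # {'precision': ..., 'map': ..., 'ndcg': ..., 'auc': ...}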

gdragotto avatar Feb 03 '25 20:02 gdragotto

Installing the whole 'implicit' package from conda-forge instead of pip solved the issue for me. I wasn't able to run any of the evaluation functions without getting this same error; now I can run all of them.

Just use "conda install -c conda-forge implicit"

InputMismatchError avatar Jun 24 '25 10:06 InputMismatchError

Updating to scipy 1.14.1 should resolve the issue.

I installed implicit with uv add implicit; by default it installed implicit 0.7.2 and a newer scipy, 1.15.2. Then I downgraded scipy to 1.14.1 with uv add 'scipy==1.14.1' and it worked.

sillykelvin avatar Oct 30 '25 13:10 sillykelvin