Cross validation

Open topspinj opened this issue 6 years ago • 27 comments

I would like to tune hyper-parameters with implicit's AlternatingLeastSquares. Ideally, I would use cross-validation but it seems like there is no simple way to "fit" on training data and "predict" on test data.

Any thoughts on how to handle this? Thanks!

topspinj avatar May 23 '18 18:05 topspinj

I'm adding some basic support for cross-validation (targeting metrics like p@k, map@k and ndcg@k to start off with). It's a work in progress right now, but the initial changes are here: https://github.com/benfred/implicit/compare/eval

With this branch you can do something like

from implicit.evaluation import precision_at_k, train_test_split
from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens

movies, ratings = get_movielens("1m")
train, test = train_test_split(ratings)

model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
model.fit(train)

p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)

You can check out the eval branch if you want to try this out today (the example above should work with this branch). I'm hoping to have this finished up later this week.
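For readers unfamiliar with the metric, here is a rough plain-Python sketch of what precision at K measures. It is an illustration only, not the library's implementation: the function name `precision_at_k_sketch`, the dict-based inputs and the per-user averaging are all assumptions made for this example.

```python
# Plain-Python illustration of precision@k; NOT the implicit library's code.

def precision_at_k_sketch(recommended, held_out, k):
    """Mean fraction of each user's top-k recommendations found in the held-out set.

    recommended: dict user -> ranked list of item ids (best first)
    held_out:    dict user -> set of item ids hidden from training
    Users without held-out items are skipped (assumption for this sketch).
    """
    per_user = []
    for user, relevant in held_out.items():
        if not relevant:
            continue  # nothing to score for this user
        top_k = recommended.get(user, [])[:k]
        hits = sum(1 for item in top_k if item in relevant)
        per_user.append(hits / k)
    return sum(per_user) / len(per_user) if per_user else 0.0


# Hypothetical toy data: three users, one of them with no held-out items.
recommended = {"u1": [10, 11, 12, 13], "u2": [20, 21, 22, 23], "u3": [30, 31]}
held_out = {"u1": {11, 13, 99}, "u2": {50}, "u3": set()}

score = precision_at_k_sketch(recommended, held_out, k=2)
print(score)
```

The library may normalise differently (for example by the number of held-out items when a user has fewer than K), so treat the number printed here as a conceptual reference rather than something to compare against its output.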

benfred avatar May 23 '18 18:05 benfred

👋 where can I find the eval branch to test out the evaluation module?

EDIT: ok great I can see it here in this commit https://github.com/benfred/implicit/commit/861713e6cb4e65d7485abfab5e843d4872bf4bd1

qjflores avatar Jul 24 '18 16:07 qjflores

@qjflores It's in master/pypi now - I'm just leaving this open because I need to add some documentation on these functions

benfred avatar Jul 25 '18 14:07 benfred

Hey @benfred, thank you for creating such a great tool! Maybe you could help me understand why the code above behaves like this: [screenshot, 2018-09-24 22:33:12]. The 100% bar is the model training progress, but the P@K evaluation bar stops at 49%.

AFimin avatar Sep 24 '18 19:09 AFimin

@AFimin that doesn't look right - there should only be one progress bar for the fit and one for the evaluation given the code above.

When running the code snippet above it should look something like:

In [1]: from implicit.evaluation import precision_at_k, train_test_split
   ...: from implicit.als import AlternatingLeastSquares
   ...: from implicit.datasets.movielens import get_movielens
   ...: 
   ...: movies, ratings = get_movielens("1m")
   ...: train, test = train_test_split(ratings)
   ...: 
   ...: model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
   ...: model.fit(train)
   ...: 
   ...: p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
   ...: 

100%|█████████████████████████████████████████████████████████████████████████| 15.0/15 [00:02<00:00,  5.43it/s]
100%|████████████████████████████████████████████████████████████████████▉| 6036/6041 [00:01<00:00, 3208.17it/s]

Where the first progress bar is model fitting and the second is the evaluation.

I'm not sure why you are seeing repeated progress bars there - are you training/evaluating in a loop or using the 'fit_callback' functionality?

You can turn off these progress bars by passing in 'show_progress=False' to model.fit or precision_at_k if that helps.

For the evaluation progress bar not getting to 100% - I'm guessing it's because of these lines: https://github.com/benfred/implicit/blob/393de3f4e4a6b73eb051ed236a94272cabdfe548/implicit/evaluation.pyx#L97-L99

We're skipping evaluation of a user if that user doesn't have any items in the test set, but right now the progress bar isn't incremented in that case. This could cause the progress bar to stop short of 100% during evaluation. For your dataset, do only 49% of users have items in the test set?
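The effect described above can be reproduced with a small stand-alone sketch. This is not the code from evaluation.pyx; the function `count_ticks`, its `count_skipped` flag and the toy data are hypothetical and only mimic the control flow.

```python
def count_ticks(test_items_by_user, count_skipped):
    """Return how many progress ticks are recorded while looping over users."""
    ticks = 0
    for user, items in test_items_by_user.items():
        if not items:
            # No held-out items: nothing to score for this user.
            if count_skipped:
                ticks += 1  # still advance the bar before skipping
            continue
        # ... ranking and scoring for this user would happen here ...
        ticks += 1
    return ticks


# Hypothetical data: half of the users have no items in the test set.
test_items_by_user = {0: [5], 1: [], 2: [7, 8], 3: []}
total = len(test_items_by_user)

before = count_ticks(test_items_by_user, count_skipped=False)
after = count_ticks(test_items_by_user, count_skipped=True)
print(f"skipped users not counted: {before}/{total}")
print(f"skipped users counted:     {after}/{total}")
```

When the skipped users are not counted, the bar ends at the fraction of users that actually have test items, which matches the 49% question above.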

benfred avatar Sep 24 '18 20:09 benfred

@benfred Thanks for such a quick response, and sorry for the confusion. Let me clarify.

  1. The repeated progress bars are expected: I'm running some CV for hyperparameter tuning.
  2. And yes, my question was about the eval bar not hitting 100%, which made me think I was doing something wrong. Anyway, you've confirmed my assumption, thank you again!
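A loop of the kind mentioned in point 1 might look roughly like the sketch below. It only shows the search scaffolding: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for "fit a model with these parameters on the training split and return a metric such as p@k on the test split".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination; return (best_params, best_score, results)."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


def toy_score(params):
    # Hypothetical stand-in for: fit on train, evaluate on test, return the metric.
    return -abs(params["factors"] - 64) - 10 * abs(params["regularization"] - 0.1)


param_grid = {"factors": [32, 64, 128], "regularization": [0.01, 0.1, 1.0]}
best_params, best_score, results = grid_search(param_grid, toy_score)
print(best_params, len(results))
```

Each call to the scoring function would show its own fit and evaluation progress bars, which is why they repeat; passing show_progress=False, as suggested earlier in the thread, silences them.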

AFimin avatar Sep 24 '18 20:09 AFimin

Awesome! glad it's not something really weird anyways =).

I've put in a fix here https://github.com/benfred/implicit/commit/363c9875c13146c9e50c07d8452a4bba55751aad . I think this means that the progress bars will hit 100% during xval even if there are users missing items in the test set.

benfred avatar Sep 24 '18 20:09 benfred

@benfred What a great library!

I am using the precision_at_k to tune parameters of the model, including two parameters for my custom importance function, the regularization parameter, number of factors and number of iterations. I can see that while training is on the GPU, evaluation is on the CPU. Evaluation is about a factor of 5 slower, so most of the time is spent in evaluation. Is there any way to move the evaluation onto the GPU as well?

civilinformer avatar Oct 11 '18 15:10 civilinformer

@Acey25 same issue here.
I'm following this https://gist.github.com/jbochi/2e8ddcc5939e70e5368326aa034a144e and it doesn't work

ifokeev avatar May 10 '19 20:05 ifokeev

Does this work in python 3.7?

rituk avatar May 28 '19 21:05 rituk

@rituk I'm not sure, I haven't tried.

qjflores avatar May 28 '19 21:05 qjflores

I think this is the main bottleneck of ALS and other matrix-factorization-based algorithms. Computing map@k or other ranking metrics for the test set is very slow. On my dataset, training takes ~20s but evaluation on 1% of this data takes 10 minutes. That is too slow for parameter tuning.

I don't know if it's possible but having the metrics available with GPU computation would be awesome.

I found a way to leverage the GPU using the cupy library (cupy.argpartition), which allows faster computation of recommendations on the test set (and thus faster map@k).
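The code behind this was not shared in the thread, so the following is only a CPU-side sketch of the underlying idea: a ranking metric needs just the K best-scoring items per user, so a partial selection is enough and a full sort of all item scores can be avoided. It uses the standard-library heapq.nlargest as a stand-in for an argpartition-style selection; `top_k_items` and the toy vectors are hypothetical.

```python
import heapq


def top_k_items(user_vector, item_vectors, k):
    """Return the k (item_index, score) pairs with the highest dot-product score."""
    scores = (
        (sum(u * v for u, v in zip(user_vector, item)), idx)
        for idx, item in enumerate(item_vectors)
    )
    # Partial selection: keeps only the k best candidates instead of sorting all items.
    best = heapq.nlargest(k, scores)
    return [(idx, score) for score, idx in best]


# Hypothetical 2-factor embeddings for one user and five items.
user_vector = [1.0, 0.5]
item_vectors = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.9], [0.7, 0.0], [0.4, 0.4]]

top = top_k_items(user_vector, item_vectors, k=2)
print(top)
```

On a GPU the same selection would be applied to a whole batch of user rows at once, which is presumably where the speed-up comes from.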

@Phildumoux Are you willing to share your findings? I am running into a similar bottleneck. @benfred Any recommendations here for speeding up the metrics on the GPU?

SheldonGrant avatar Aug 28 '20 14:08 SheldonGrant

Please do share details.

rituk avatar Aug 28 '20 14:08 rituk

Hi @benfred, how do I get the recommendations for a user if the model was trained on a user_item_rating matrix?

I understand that the recommendations in the introductory guide come from a model trained on the item_user_data matrix, while the ranking_metrics_at_k function takes train_user_items as input and reads the number of users and items from it in this line: cdef int users = test_user_items.shape[0], items = test_user_items.shape[1]

Thank you!

jselma avatar Dec 05 '20 16:12 jselma

It does work with 3.7, I have implemented it. 

rituk avatar Dec 05 '20 16:12 rituk

Hi @rituk, Could you give an example? Thank you!.

jselma avatar Dec 05 '20 18:12 jselma

Are you looking for gpu version? I followed this example. https://github.com/benfred/implicit/blob/master/examples/lastfm.py

rituk avatar Dec 05 '20 18:12 rituk

I am using it with CUDA; could you please help? This is my code (I highlighted in bold where the problem should be corrected):

sparse_item_user = sparse.csr_matrix((dataImplicit['ratingFiltrado'].astype(float), (dataImplicit['id_item'], dataImplicit['user'])))
sparse_user_item = sparse.csr_matrix((dataImplicit['ratingFiltrado'].astype(float), (dataImplicit['user'], dataImplicit['id_item'])))

np.random.seed(1234)
user_item_train, user_item_test = train_test_split(sparse_user_item, train_percentage=0.75)

# Building the model
modelALS = implicit.als.AlternatingLeastSquares(factors=50, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS2 = implicit.als.AlternatingLeastSquares(factors=100, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS3 = implicit.als.AlternatingLeastSquares(factors=150, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS4 = implicit.als.AlternatingLeastSquares(factors=200, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)  # This is the best one
modelALS5 = implicit.als.AlternatingLeastSquares(factors=250, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS6 = implicit.als.AlternatingLeastSquares(factors=300, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)

n = 5  # Number of top-N recommendations

alpha_val = 40
#alpha_val = 1
data_conf = (user_item_train * alpha_val).astype('double')

# Test the different algorithms
benchmark = []

# Iterate over all algorithms
for algorithm in [modelALS, modelALS2, modelALS3, modelALS4, modelALS5, modelALS6]:
    algorithm.fit(data_conf)

    algorithm.user_factors
    algorithm.item_factors

    resultadosTotales = ranking_metrics_at_k(algorithm, user_item_train.T.tocsr(), user_item_test.T.tocsr(), K=n, num_threads=4, show_progress=True)

    benchmark.append(resultadosTotales)

pd.DataFrame(benchmark)

modeloSeleccionado = modelALS3

# Get recommendations
user_items_ok = user_item_train.T.tocsr()
recs = modeloSeleccionado.recommend(userid=100452, user_items=user_items_ok, recalculate_user=True)

jselma avatar Dec 05 '20 19:12 jselma

I'm not sure what the error is, but you can pass the transposed matrix to the **model.recommend** call.
Example:

for item, score in model.recommend(user_id_dict[username], df_weighted_T,
                                   filter_items=reference_articles, filter_already_liked_items=1, N=10):

where df_weighted_T is the user-item matrix.

rituk avatar Dec 05 '20 19:12 rituk

What is the order of the columns of the df_weighted_T dataframe and what is the format of the training data (user-items or items-user)?

What is reference_articles? Thanks

jselma avatar Dec 05 '20 19:12 jselma

df_weighted_T is user-item. The model is trained with item-user, and the matrix is transposed to user-item for the model.recommend call. I'll have to look it up, I'm not near the code. See if this code helps:

https://github.com/benfred/implicit/blob/master/tests/recommender_base_test.py

rituk avatar Dec 05 '20 19:12 rituk

OK, but the problem is that the metric functions take a user_item matrix as input. How can I tune the parameters if the model is trained on item_user? Remember that the ranking_metrics_at_k function has this line: cdef int users = test_user_items.shape[0], items = test_user_items.shape[1], so the metrics come out reversed.

jselma avatar Dec 05 '20 19:12 jselma

Any parameter you want to adjust in the **model.recommend** call is basically a filter applied to the matrix. I think it should be doable.

rituk avatar Dec 05 '20 20:12 rituk

@benfred the above code is throwing this error now: IndexError: index 3953 is out of bounds for axis 0 with size 3953

essefi-ahlem avatar Mar 29 '23 10:03 essefi-ahlem

@essefi-ahlem if you replace the last line as follows, it will work:

p = precision_at_k(model, train, test, K=10, num_threads=4)
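One plausible reading of that IndexError, offered here as an assumption rather than a confirmed diagnosis, is an orientation mismatch: the model learns one factor vector per row of the matrix passed to fit, so evaluating on the transposed matrix asks for more rows than the model has. The toy sketch below reproduces that pattern with plain lists; `transpose`, `evaluate_rows` and the data are hypothetical and do not use the library.

```python
def transpose(rows):
    """Transpose a dense list-of-lists matrix."""
    return [list(col) for col in zip(*rows)]


def evaluate_rows(matrix, factors):
    """Look up one factor vector per matrix row, as an evaluation loop would."""
    return [factors[row_id] for row_id in range(len(matrix))]


# 3 rows x 5 columns; pretend fit() learned one factor vector per row.
train = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
row_factors = [[0.1 * i, 0.2 * i] for i in range(len(train))]

# Same orientation as the fit: every row has a factor vector.
same_orientation = evaluate_rows(train, row_factors)

# Transposed: 5 rows but only 3 factor vectors, so the lookup overruns.
try:
    evaluate_rows(transpose(train), row_factors)
    error = None
except IndexError as exc:
    error = exc

print(len(same_orientation), type(error).__name__)
```

Under this reading, the replacement line works because train and test are passed in the same orientation that model.fit received.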

tonyjward avatar Jun 02 '23 11:06 tonyjward