implicit
Cross validation
I would like to tune hyper-parameters with implicit's AlternatingLeastSquares. Ideally, I would use cross-validation, but it seems like there is no simple way to "fit" on training data and "predict" on test data.
Any thoughts on how to handle this? Thanks!
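For readers who want full k-fold cross-validation rather than a single split, the fold bookkeeping can be sketched independently of the library. The block below is a minimal sketch on plain (user, item, value) triples; the function name `k_fold_interactions` and the toy data are hypothetical, and building sparse matrices from each fold is left out.

```python
import random


def k_fold_interactions(interactions, n_folds=5, seed=0):
    """Yield (train, test) lists; every interaction is held out exactly once."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled interactions into n_folds roughly equal folds.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        yield train, test


# Toy (user, item, confidence) triples, purely illustrative.
interactions = [(u, i, 1.0) for u in range(4) for i in range(5)]
splits = list(k_fold_interactions(interactions, n_folds=5))
for train, test in splits:
    print(len(train), len(test))
```

Each fold would then be turned into a train matrix and a test matrix of the same shape before fitting and scoring.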
I'm adding some basic support for cross-validation (targeting metrics like p@k, map@k and ndcg@k to start off with). It's a work in progress right now, but the initial changes are here: https://github.com/benfred/implicit/compare/eval
With this branch you can do something like:
from implicit.evaluation import precision_at_k, train_test_split
from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens
movies, ratings = get_movielens("1m")
train, test = train_test_split(ratings)
model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
model.fit(train)
p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
You can check out the eval branch if you want to try this out today (the example above should work with this branch). I'm hoping to have this finished up later this week.
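To make it concrete what a p@k number means, here is a minimal pure-Python sketch of precision at k over held-out items. It is not the library's implementation: the name `precision_at_k_sketch`, the dict-based inputs and the choice to cap the denominator at the number of held-out items are all assumptions made for illustration.

```python
def precision_at_k_sketch(recommended, held_out, k=10):
    """recommended: user -> ranked item list; held_out: user -> set of test items."""
    hits = 0
    total = 0
    for user, test_items in held_out.items():
        if not test_items:
            continue  # users with nothing held out cannot be scored
        top_k = recommended.get(user, [])[:k]
        hits += sum(1 for item in top_k if item in test_items)
        # Assumption: a user with fewer than k test items can score at most that many hits.
        total += min(k, len(test_items))
    return hits / total if total else 0.0


recommended = {"u1": [1, 2, 3], "u2": [4, 5, 6]}
held_out = {"u1": {2, 9}, "u2": {7}, "u3": set()}
score = precision_at_k_sketch(recommended, held_out, k=3)
print(score)  # 1 hit out of a possible 3
```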
👋 Where can I find the eval branch to test out the evaluation module?
EDIT: ok great I can see it here in this commit https://github.com/benfred/implicit/commit/861713e6cb4e65d7485abfab5e843d4872bf4bd1
@qjflores It's in master/pypi now - I'm just leaving this open because I need to add some documentation on these functions
Hey @benfred, thank you for such a great tool you've created!
Maybe you could help me find out why the code above behaves like this:
100% is the model training progress,
but 49% is the P@K evaluation.
@AFimin that doesn't look right - there should only be one progress bar for the fit and one for the evaluation given the code above.
When running the code snippet above it should look something like:
In [1]: from implicit.evaluation import precision_at_k, train_test_split
...: from implicit.als import AlternatingLeastSquares
...: from implicit.datasets.movielens import get_movielens
...:
...: movies, ratings = get_movielens("1m")
...: train, test = train_test_split(ratings)
...:
...: model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
...: model.fit(train)
...:
...: p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
...:
100%|█████████████████████████████████████████████████████████████████████████| 15.0/15 [00:02<00:00, 5.43it/s]
100%|████████████████████████████████████████████████████████████████████▉| 6036/6041 [00:01<00:00, 3208.17it/s]
Where the first progress bar is model fitting and the second is the evaluation.
I'm not sure why you are seeing repeated progress bars there - are you training/evaluating in a loop or using the 'fit_callback' functionality?
You can turn off these progress bars by passing in 'show_progress=False' to model.fit or precision_at_k if that helps.
For the evaluation progress bar not getting to 100% - I'm guessing it's because of these lines: https://github.com/benfred/implicit/blob/393de3f4e4a6b73eb051ed236a94272cabdfe548/implicit/evaluation.pyx#L97-L99
We're skipping evaluating the user if the user doesn't have any items in the test set - but right now the progress bar isn't getting incremented there. This could cause the progress bar to not hit 100% when doing evaluation. For your dataset, do only 49% of users have items in the test set?
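The effect described above can be reproduced with a toy loop. This is only a sketch of the counting logic, not the code in evaluation.pyx; `evaluate_with_progress` and its `count_skipped` flag are hypothetical names.

```python
def evaluate_with_progress(test_items_by_user, count_skipped):
    """Return (progress ticks, total users) for a loop that skips users without test items."""
    total_users = len(test_items_by_user)
    ticks = 0
    for user, items in test_items_by_user.items():
        if not items:
            if count_skipped:
                ticks += 1  # still advance the bar for skipped users
            continue
        # ... metric computation for this user would go here ...
        ticks += 1
    return ticks, total_users


# 100 users, only 49 of which have anything in the test set.
users = {u: ({1} if u < 49 else set()) for u in range(100)}
print(evaluate_with_progress(users, count_skipped=False))  # bar stalls at 49 of 100
print(evaluate_with_progress(users, count_skipped=True))   # bar reaches 100 of 100
```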
@benfred Thanks for such a quick response, and excuse me for confusing things a bit. Let me clarify.
- The repeated progress bars are expected: I'm trying to do some CV for hyper-parameter fitting.
- And yes, the question was about the eval bar not hitting 100%, which made me think I was doing something wrong. Anyway, you've confirmed my assumptions, thank you again!
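A hyper-parameter loop like the one mentioned here usually amounts to a small grid search. The sketch below is library-agnostic: `grid_search` and `toy_score` are hypothetical, and `toy_score` is a stand-in for a function that would fit a model on the training split and return a metric on the test split.

```python
from itertools import product


def grid_search(param_grid, fit_and_score):
    """Try every parameter combination and keep the one with the highest score."""
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(factors, regularization):
    # Stand-in for "fit on train, evaluate on test"; peaks at factors=64, regularization=0.1.
    return -abs(factors - 64) / 64 - abs(regularization - 0.1)


grid = {"factors": [32, 64, 128], "regularization": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

In a real run each call fits and evaluates once, so one pair of progress bars per combination is what you would expect to see unless they are turned off.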
Awesome! glad it's not something really weird anyways =).
I've put in a fix here https://github.com/benfred/implicit/commit/363c9875c13146c9e50c07d8452a4bba55751aad . I think this means that the progress bars will hit 100% during xval even if there are users missing items in the test set.
@benfred What a great library!
I am using the precision_at_k to tune parameters of the model, including two parameters for my custom importance function, the regularization parameter, number of factors and number of iterations. I can see that while training is on the GPU, evaluation is on the CPU. Evaluation is about a factor of 5 slower, so most of the time is spent in evaluation. Is there any way to move the evaluation onto the GPU as well?
@Acey25 same issue here. I'm following this gist https://gist.github.com/jbochi/2e8ddcc5939e70e5368326aa034a144e and it doesn't work.
Does this work in python 3.7?
@rituk I'm not sure, I haven't tried.
I think this is the main bottleneck of ALS and other matrix-factorization-based algorithms. Computing map@k or other ranking metrics on the test set is very slow. On my dataset, training takes ~20s but evaluation on 1% of the data takes 10 minutes, which is too slow for parameter tuning.
I don't know if it's possible but having the metrics available with GPU computation would be awesome.
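For reference, the metric arithmetic itself is cheap; the cost typically comes from scoring and ranking every item for every test user. Below is a minimal pure-Python sketch of map@k under one common convention (normalising by the smaller of k and the number of relevant items). The function names and inputs are hypothetical, and the library's exact convention may differ.

```python
def average_precision_at_k(ranked_items, relevant, k=10):
    """Average of precision values taken at each rank where a relevant item appears."""
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(k, len(relevant))
    return score / denom if denom else 0.0


def mean_average_precision_at_k(recommended, held_out, k=10):
    """Mean of the per-user average precision over users that have held-out items."""
    scores = [
        average_precision_at_k(recommended.get(user, []), items, k)
        for user, items in held_out.items()
        if items
    ]
    return sum(scores) / len(scores) if scores else 0.0


recommended = {"u1": [1, 2, 3, 4], "u2": [5, 6]}
held_out = {"u1": {1, 3}, "u2": {7}}
ap_u1 = average_precision_at_k(recommended["u1"], held_out["u1"], k=4)
map_score = mean_average_precision_at_k(recommended, held_out, k=4)
print(ap_u1, map_score)
```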
I found a way to leverage the GPU using the cupy library (cupy.argpartition), allowing faster computation of recommendations on the test set (and thus faster map@k).
@Phildumoux Are you willing to share your find? I am running into a similar bottleneck. @benfred Any recommendations here? Speeding up metrics on the GPU?
Please do share details.
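The details were not posted in this thread, so the following is not that implementation. It is a minimal sketch of the general idea of batched top-k selection with argpartition, written with NumPy on the CPU; CuPy provides an `argpartition` modelled on NumPy's, but swapping the import and getting a speed-up is an assumption that has not been verified here. The function name `top_k_items` and the random factors are hypothetical, and filtering of already-liked items is omitted.

```python
import numpy as np


def top_k_items(user_factors, item_factors, k):
    """Return the indices of the k highest-scoring items per user, best first."""
    scores = user_factors @ item_factors.T  # shape (n_users, n_items)
    # argpartition puts the k largest scores in the last k columns, in no particular order.
    candidates = np.argpartition(scores, -k, axis=1)[:, -k:]
    candidate_scores = np.take_along_axis(scores, candidates, axis=1)
    # Only the k candidates are sorted, instead of the whole row.
    order = np.argsort(-candidate_scores, axis=1)
    return np.take_along_axis(candidates, order, axis=1)


rng = np.random.default_rng(0)
user_factors = rng.normal(size=(5, 8))
item_factors = rng.normal(size=(50, 8))
top = top_k_items(user_factors, item_factors, k=10)
print(top.shape)
```

The saving comes from sorting k candidates per user rather than every item, which matters when the item catalogue is large.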
Hi @benfred, how do I get the recommendations for a user if the model was trained on a user_item_rating matrix?
As I understand it, the recommendations in the initial guide come from a model trained on the item_user_data matrix, while the ranking_metrics_at_k function takes train_user_items as input, and on the next line it reads the number of users and items from it: cdef int users = test_user_items.shape[0], items = test_user_items.shape[1]
Thank you!
It does work with 3.7, I have implemented it.
Hi @rituk, could you give an example? Thank you!
Are you looking for the GPU version? I followed this example: https://github.com/benfred/implicit/blob/master/examples/lastfm.py
I am using it with CUDA; please help. Here is my code (I highlighted in bold where the problem should be corrected):
import numpy as np
import pandas as pd
import implicit
from scipy import sparse
from implicit.evaluation import ranking_metrics_at_k, train_test_split

# dataImplicit is assumed to be a pandas DataFrame defined earlier
sparse_item_user = sparse.csr_matrix((dataImplicit['ratingFiltrado'].astype(float), (dataImplicit['id_item'], dataImplicit['user'])))
sparse_user_item = sparse.csr_matrix((dataImplicit['ratingFiltrado'].astype(float), (dataImplicit['user'], dataImplicit['id_item'])))
np.random.seed(1234)
user_item_train, user_item_test = train_test_split(sparse_user_item, train_percentage=0.75)
# Building the models
modelALS = implicit.als.AlternatingLeastSquares(factors=50, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS2 = implicit.als.AlternatingLeastSquares(factors=100, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS3 = implicit.als.AlternatingLeastSquares(factors=150, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS4 = implicit.als.AlternatingLeastSquares(factors=200, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)  # This is the best one
modelALS5 = implicit.als.AlternatingLeastSquares(factors=250, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS6 = implicit.als.AlternatingLeastSquares(factors=300, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
n = 5  # Number of top-N recommendations
alpha_val = 40
# alpha_val = 1
data_conf = (user_item_train * alpha_val).astype('double')
# Test the different algorithms
benchmark = []
# Iterate over all algorithms (the original list repeated modelALS5 and skipped modelALS6)
for algorithm in [modelALS, modelALS2, modelALS3, modelALS4, modelALS5, modelALS6]:
    algorithm.fit(data_conf)
    algorithm.user_factors
    algorithm.item_factors
    resultadosTotales = ranking_metrics_at_k(algorithm, user_item_train.T.tocsr(), user_item_test.T.tocsr(), K=n, num_threads=4, show_progress=True)
    benchmark.append(resultadosTotales)
pd.DataFrame(benchmark)
modeloSeleccionado = modelALS3
# Get recommendations
user_items_ok = user_item_train.T.tocsr()
recs = modeloSeleccionado.recommend(userid=100452, user_items=user_items_ok, recalculate_user=True)
I'm not sure what the error is, but you can pass the transposed matrix to the **model.recommend** call.
Example:
**for item, score in model.recommend(user_id_dict[username], df_weighted_T, filter_items=reference_articles, filter_already_liked_items=1, N=10):**
where df_weighted_T is the user-item matrix.
What is the order of the columns of the df_weighted_T dataframe and what is the format of the training data (user-items or items-user)?
What is reference_articles? Thanks
df_weighted_T is user-item. The model is trained with item-user, and the matrix is transposed to user-item for the model.recommend call. I'll have to look it up, I'm not near the code. See if this code helps:
https://github.com/benfred/implicit/blob/master/tests/recommender_base_test.py
OK, but the problem is that the metric functions take the matrix in user_item format as input. How can I adjust the parameters if the model was trained on item_user? Remember that the ranking_metrics_at_k function has this line: cdef int users = test_user_items.shape[0], items = test_user_items.shape[1], so the metrics are reversed.
Any parameter you want to adjust in the **model.recommend** call is basically a filter being applied to the matrix. I think it should be doable.
@benfred the above code is throwing this error now: IndexError: index 3953 is out of bounds for axis 0 with size 3953
@essefi-ahlem if you replace the last line as follows it will work
p = precision_at_k(model, train, test, K=10, num_threads=4)
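An IndexError of this kind is consistent with the model being fitted on one orientation of the matrix and evaluated on the other. A cheap guard is to compare the matrix shape against the number of user and item factor rows before calling the metric. This is a minimal sketch: `describe_orientation` is a hypothetical helper, the counts are taken from the progress bar and the error message earlier in the thread, and reading them from `model.user_factors.shape[0]` and `model.item_factors.shape[0]` is an assumption about the fitted model.

```python
def describe_orientation(matrix_shape, n_users, n_items):
    """Classify a matrix shape relative to the (users, items) layout the model learned."""
    if matrix_shape == (n_users, n_items):
        return "user-item"
    if matrix_shape == (n_items, n_users):
        return "item-user (transposed)"
    return "mismatch"


# Which count is users and which is items is an assumption made for illustration.
n_users, n_items = 6041, 3953
print(describe_orientation((6041, 3953), n_users, n_items))
print(describe_orientation((3953, 6041), n_users, n_items))
```

If the check reports a transposed matrix, the train and test matrices passed to the metric need to be flipped to match what the model was fitted on.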