implicit
Recommendations for improving mean average precision? Anybody have advice?
My data consists of users, shops, and median purchase amounts. The median purchase amount is the "rating". Sample:

```
   user_id       shop  Median Purchase
0        0         Ea             6.00
1        0    Netflix            10.99
2        1      Three           239.99
3        2  MCDonalds             2.00
4        2       Cafe            15.95
```
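The sample shows shop names while the matrix code below indexes by `shop_id`; for reference, here is a minimal sketch of deriving integer codes with pandas (the actual encoding step isn't shown above, so treat this as an assumption):

```python
import pandas as pd

# Hypothetical encoding step: turn shop names (and user ids) into
# contiguous integer codes usable as sparse-matrix row/column indices.
# user_id in the sample is already an integer; re-encoding just
# guarantees the codes are contiguous starting at 0.
data['shop_id'] = data['shop'].astype('category').cat.codes
data['user_id'] = data['user_id'].astype('category').cat.codes
```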
I have created a sparse matrix:
```python
from scipy import sparse

sparse_item_user = sparse.csr_matrix(
    (data['Median Purchase'].astype(float), (data['shop_id'], data['user_id']))
)
```

```
<4016x35616 sparse matrix of type '<class 'numpy.float64'>'
    with 672643 stored elements in Compressed Sparse Row format>
```
I did a train/test split:

```python
import implicit
import implicit.evaluation

# train_test_split lives in implicit.evaluation and returns (train, test)
train, test = implicit.evaluation.train_test_split(sparse_item_user, train_percentage=0.7)
```
And began a grid search:

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({
    'factors': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 150, 180, 200, 250, 300],
    'regularization': [0.001, 0.01, 0.1, 1, 10, 20, 30, 40],
    'iterations': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

for params in grid:
    model = implicit.als.AlternatingLeastSquares(**params)
    model.fit(train, show_progress=False)
    map_ = implicit.evaluation.mean_average_precision_at_k(
        model, train_user_items=train, test_user_items=test, K=5, num_threads=0
    )
    print(map_)
```
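Not part of the snippet above, but here is a minimal sketch of tracking which combination scores best, reusing the same loop and variables:

```python
# Hypothetical bookkeeping around the same grid-search loop.
best_score, best_params = -1.0, None
for params in grid:
    model = implicit.als.AlternatingLeastSquares(**params)
    model.fit(train, show_progress=False)
    score = implicit.evaluation.mean_average_precision_at_k(
        model, train_user_items=train, test_user_items=test, K=5, num_threads=0
    )
    if score > best_score:
        best_score, best_params = score, params

print(best_score, best_params)
```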
However, the MAP is extremely low for basically all of them. A small sample of the values:

```
0.0006565014824227022
0.0004853875476493011
0.0004701397712833545
0.0005438373570520967
0.0004574332909783991
```

Does anybody have a recommended checklist of what I can do to increase the MAP?
Could it be "noise" in my data? If so, where should I look?
Should I remove extremely popular shops from my data, where basically 80% of the customers have purchased something?
Should I remove shops with only 1 customer, or customers with only 1 purchase?
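If it comes to filtering, this is roughly what I mean (a sketch only; column names from the sample above, and the thresholds are arbitrary):

```python
# Hypothetical pre-filtering before building the sparse matrix.
n_users = data['user_id'].nunique()

shops_per_user = data.groupby('user_id')['shop_id'].nunique()
users_per_shop = data.groupby('shop_id')['user_id'].nunique()

too_popular_shops = users_per_shop[users_per_shop > 0.8 * n_users].index
single_customer_shops = users_per_shop[users_per_shop <= 1].index
single_purchase_users = shops_per_user[shops_per_user <= 1].index

filtered = data[
    ~data['shop_id'].isin(too_popular_shops)
    & ~data['shop_id'].isin(single_customer_shops)
    & ~data['user_id'].isin(single_purchase_users)
]
```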
Since parameter tuning does not seem to be helping, I suspect the problem is with the data.
Any advice is appreciated.
Hi @soliverc, sorry, I cannot give you any advice because I am facing the same problem. Have you tried the TFIDFRecommender? How does it compare to ALS? In my case TFIDF is way better than ALS, and I am unsure what to do about it.
I didn't know about TFIDFRecommender. It's not in the docs. Is there a user guide somewhere? I'd like to try it.
If the TFIDF is better than ALS, why do you want to use ALS?
You can find the TFIDFRecommender here. This algorithm does not do matrix factorization, so it does not give me user/item factors, which is something I want. ALS is a matrix factorization and does provide these factors, which is why I am using it.
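For anyone comparing the two, a minimal sketch of both models side by side (parameter values are illustrative; depending on your implicit version, `fit()` may expect an item-user or a user-item matrix, so check the docs for your version):

```python
from implicit.als import AlternatingLeastSquares
from implicit.nearest_neighbours import TFIDFRecommender

# Item-item neighbourhood model: no latent factors are produced.
tfidf = TFIDFRecommender(K=20)
tfidf.fit(train)

# Matrix factorization: latent factors are available after fitting.
als = AlternatingLeastSquares(factors=64, regularization=0.01, iterations=30)
als.fit(train)

user_vecs = als.user_factors   # one dense vector per user
item_vecs = als.item_factors   # one dense vector per item
```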
@soliverc - did you weight your purchases matrix? You can boost the signal of the purchases, which may make the model perform better.
See the lastfm() example here (line 72): https://github.com/benfred/implicit/blob/master/examples/lastfm.py
```python
# if we're training an ALS based model, weight input for last.fm
# by bm25
if model_name.endswith("als"):
    # lets weight these models by bm25weight.
    logging.debug("weighting matrix by bm25_weight")
    plays = bm25_weight(plays, K1=100, B=0.8)
```
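Applied to the matrix in this thread, that would look roughly like the following sketch (variable names reused from above; the K1 and B values are just the example's, not tuned for purchase data):

```python
from implicit.nearest_neighbours import bm25_weight

# Hypothetical: BM25-weight the purchases matrix before splitting and fitting.
weighted = bm25_weight(sparse_item_user, K1=100, B=0.8).tocsr()

train, test = implicit.evaluation.train_test_split(weighted, train_percentage=0.7)
model = implicit.als.AlternatingLeastSquares(factors=64, regularization=0.01, iterations=30)
model.fit(train, show_progress=False)
```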