
Recommendations for improving mean average precision? Anybody have advice?

Open soliverc opened this issue 5 years ago • 4 comments

My data is users, shops, and median purchase amounts, where the median purchase amount acts as the "rating". Sample:

   user_id       shop  Median Purchase
0        0         Ea             6.00
1        0    Netflix           10.99
2        1      Three          239.99
3        2  McDonalds            2.00
4        2       Cafe           15.95

I have created a sparse matrix:

from scipy import sparse

sparse_item_user = sparse.csr_matrix(
    (data['Median Purchase'].astype(float), (data['shop_id'], data['user_id'])))

<4016x35616 sparse matrix of type '<class 'numpy.float64'>'
	with 672643 stored elements in Compressed Sparse Row format>
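
For completeness: shop_id (not shown in the sample above, which only has the raw shop names) is an integer code per shop. A minimal sketch of deriving it, assuming pandas categorical codes:

# Assumption: shop_id is an integer code derived from the shop names;
# the sample above only shows the raw 'shop' column.
data['shop_id'] = data['shop'].astype('category').cat.codes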

Then I did a train/test split:

import implicit.evaluation

train, test = implicit.evaluation.train_test_split(sparse_item_user, train_percentage=0.7)

And began a grid search:

import implicit.als
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({
    'factors': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 150, 180, 200, 250, 300],
    'regularization': [0.001, 0.01, 0.1, 1, 10, 20, 30, 40],
    'iterations': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

for params in grid:
    model = implicit.als.AlternatingLeastSquares(**params)
    model.fit(train, show_progress=False)
    map_ = implicit.evaluation.mean_average_precision_at_k(
        model, train_user_items=train, test_user_items=test, K=5, num_threads=0)
    print(params, map_)
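
One thing I am not sure about is matrix orientation: as far as I understand implicit 0.4.x, model.fit() expects an item×user matrix while the evaluation helpers expect user×item matrices, so passing the same split matrix to both may be wrong. If so, something like this sketch might be needed (assuming train/test are item×user, as built above):

# Sketch, assuming implicit 0.4.x conventions: fit() takes item_users,
# while the evaluation helpers take user_items (the transpose).
model = implicit.als.AlternatingLeastSquares(factors=50)
model.fit(train, show_progress=False)  # train is items x users

map_ = implicit.evaluation.mean_average_precision_at_k(
    model,
    train_user_items=train.T.tocsr(),  # transpose to users x items
    test_user_items=test.T.tocsr(),
    K=5, num_threads=0)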

However, the MAP is extremely low for basically all of them. A small sample of the MAP values:

0.0006565014824227022
0.0004853875476493011
0.0004701397712833545
0.0005438373570520967
0.0004574332909783991

Does anybody have a recommended checklist of what I can do to increase the MAP?

Could it be "noise" in my data? If so, where should I look?

Should I remove extremely popular shops from my data, i.e. ones where basically 80% of the customers have purchased something?

Should I remove shops with only 1 customer, or customers with only 1 purchase?
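
For concreteness, the kind of filtering I have in mind is something like this (the thresholds are placeholders that would need tuning):

# Sketch with placeholder thresholds.
n_users = data['user_id'].nunique()
shop_counts = data.groupby('shop')['user_id'].nunique()   # customers per shop
user_counts = data.groupby('user_id')['shop'].nunique()   # shops per customer

keep_shops = shop_counts[(shop_counts > 1) & (shop_counts < 0.8 * n_users)].index
keep_users = user_counts[user_counts > 1].index

data = data[data['shop'].isin(keep_shops) & data['user_id'].isin(keep_users)]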

Since parameter tuning does not seem to be helping, I suspect the problem is with the data.

Any advice is appreciated.

soliverc avatar Feb 04 '20 11:02 soliverc

Hi @soliverc, sorry, I cannot give you any advice because I am facing the same problem. Have you tried the TFIDFRecommender? How does it compare to ALS? In my case TF-IDF performs way better than ALS, and I am unsure what to do about it.

thisisjl avatar Feb 04 '20 14:02 thisisjl

I didn't know about TFIDFRecommender. It's not in the docs. Is there a user guide somewhere? I'd like to try it.

If TF-IDF is better than ALS, why do you want to use ALS?

soliverc avatar Feb 04 '20 14:02 soliverc

You can find the TFIDFRecommender here. This algorithm does not do matrix factorization, so it does not give me user/item factors, which is something I want. ALS is a matrix factorization method and does provide these factors, which is why I am using it.
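
A minimal usage sketch, assuming the same item×user training matrix as in the ALS example above:

from implicit.nearest_neighbours import TFIDFRecommender

# Sketch: an item-item neighbourhood model; K is the number of nearest
# neighbours kept per item. No latent factors are learned.
model = TFIDFRecommender(K=20)
model.fit(train)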

thisisjl avatar Feb 04 '20 14:02 thisisjl

@soliverc - did you weight your purchases matrix? You can boost the signal of the purchases, which may make the model perform better.

See the lastfm() example here (line 72): https://github.com/benfred/implicit/blob/master/examples/lastfm.py

# if we're training an ALS based model, weight input for last.fm
# by bm25
if model_name.endswith("als"):
    # lets weight these models by bm25weight.
    logging.debug("weighting matrix by bm25_weight")
    plays = bm25_weight(plays, K1=100, B=0.8)
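
Applied to the purchases matrix from this thread, that could look something like this (K1/B are the last.fm values and would likely need re-tuning for purchase amounts):

from implicit.nearest_neighbours import bm25_weight

# Sketch: BM25-weight the item x user purchases matrix before fitting ALS.
# K1=100, B=0.8 are copied from the last.fm example, not tuned values.
train = bm25_weight(train, K1=100, B=0.8).tocsr()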

kylemcmearty avatar Oct 13 '20 20:10 kylemcmearty