Crazy slow on inference for 6M users and 12M products
Hi Maciej!
Thank you for a great tool and a concise implementation of the WARP loss; this thing really helps to beat plain MF. My problem so far is the prediction time, which is just unreasonably long: using 10 machines on Google Cloud, it takes around 22h to predict the top 1k products out of 12M products for 6.7M customers with 128-dimensional vectors.
Training, by contrast, is blazingly fast: an hour or two, nothing comparable to the prediction nightmare.
I'm not using any side information, only users and items, MF style, with the WARP loss applied during training. Please tell me what further details I can provide to help resolve this problem. What could be so slow at inference time?
Sincerely, Alexey
I can offer two suggestions:

- Use the `get_user_representations`/`get_item_representations` methods and do prediction via matrix multiplies rather than the built-in prediction methods. This should be much more efficient.
- Build and use an approximate nearest neighbours index.
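A minimal sketch of the matrix-multiply approach. The random biases and embeddings below are stand-ins for the model's learned factors (in LightFM, `get_user_representations()` and `get_item_representations()` each return a `(biases, embeddings)` tuple); `top_k_items` and the batch size are illustrative names, not library API. The score per user/item pair is the user bias plus the item bias plus the dot product of the embeddings:

```python
import numpy as np

# Stand-ins for the model's learned factors; in practice these would come
# from model.get_user_representations() / model.get_item_representations().
rng = np.random.default_rng(0)
n_users, n_items, dim = 1000, 5000, 128
user_biases = rng.normal(size=n_users)
user_emb = rng.normal(size=(n_users, dim))
item_biases = rng.normal(size=n_items)
item_emb = rng.normal(size=(n_items, dim))

def top_k_items(user_ids, k=100, batch_size=256):
    """Score all items for batches of users with one matrix multiply each,
    then keep the k highest-scoring item ids per user."""
    results = []
    for start in range(0, len(user_ids), batch_size):
        batch = user_ids[start:start + batch_size]
        # score = user_bias + item_bias + <user_emb, item_emb>
        scores = (user_emb[batch] @ item_emb.T
                  + user_biases[batch][:, None]
                  + item_biases[None, :])
        # argpartition finds the top k in O(n_items) per user,
        # avoiding a full O(n_items log n_items) sort.
        top = np.argpartition(-scores, k, axis=1)[:, :k]
        # Order the k survivors by their actual scores.
        order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
        results.append(np.take_along_axis(top, order, axis=1))
    return np.vstack(results)

top = top_k_items(np.arange(32), k=10)
print(top.shape)  # (32, 10)
```

Batching keeps the score matrix at `batch_size x n_items` rather than materialising all users at once; with 12M items you would also shard the item matrix across machines, and the per-batch multiply is embarrassingly parallel.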
Do you have an example of doing predictions via matrix multiplication, without using the predict() method? In my case the model has user_features.