binge
binge copied to clipboard
Recommendation models that use binary rather than floating point operations at prediction time.
Binary Representations in Recommendations
This repository contains an implementation of recommendation models that use binary rather than floating point operations at prediction time. This makes them much faster (and less memory intensive), but also less accurate.
The details are in the paper and in the slides.
Implementation
The model is implemented in PyTorch for fitting, and C for prediction (using AVX2 SIMD operations).
Results
Dimension MRR Binary MRR MRR ratio PPMS Binary PPMS PPMS ratio Memory use ratio
32 0.077 0.059 0.768 87,758 264,146 3.010 0.091
64 0.075 0.066 0.878 45,992 220,833 4.802 0.062
128 0.076 0.071 0.935 23,870 166,252 6.965 0.047
256 0.078 0.071 0.908 12,284 190,131 15.477 0.039
512 0.080 0.073 0.912 6074 122,637 20.190 0.035
1024 0.080 0.075 0.936 3056 61,301 20.056 0.033
The table above summarises the results of comparing floating point and binary models on the Movielens 1M dataset. MRR (mean reciprocal rank) ratio denotes the fraction of the real-valued MRR the binary model achieves for that dimensionality; PPMS (predictions per millisecond) ratio denotes how many more predictions the binary model can compute per millisecond.
It appears that continous models retain good accuracy as latent dimensionality decreases, binary models' representational power sharply deteriorates. Moving from the 1024 to 32 dimensions in the continuous model implies a 29 times increase in prediction speed at the expense of a modest 4% decrease in accuracy. This compares favourably to switching to binary representations: moving to a 1024-binary dimensional representation implies a sharper accuracy drop at 6% in exchange for a smaller 20 times increase in prediction speed. Moving to 32 dimensions yields a further speed gains at 86 times, but at the cost of a considerable loss of accuracy at 26%.
The results suggest that latent recommender models have good representational power even at low latent dimensionalities. This advantage is lost when using binary embeddings, which need to be high-dimensional to achieve comparable accuracy. At the same time, binary representations' speed advantage is only evident at high dimensions. The attractiveness of binary representations is tightly coupled with high-dimensional models where a single prediction requires many floating point operations, such as convolutional neural networks. Relatively compact latent factor models do not fall into this category.