tffm
Scaling benchmarks
I've been looking at Spark implementations of Factorization Machines. I found that none of the existing open source implementations scale to a dataset with millions of features and hundreds of millions of examples. I'd be curious how this implementation is able to scale.
Hi @benmccann, I believe you should check https://github.com/dmlc/difacto -- from my point of view, it is the most scalable solution. Btw, libFFM (https://www.csie.ntu.edu.tw/~cjlin/libffm/) is a good pure C++ implementation which I've been able to run on my laptop on a dataset with ~10k features (~25 non-zeros per example) and ~30 million samples.
tffm is mostly for research purposes, so I don't expect really good scalability.
@geffy @benmccann I've recently been learning TensorFlow and developed a distributed factorization machine implementation. I wrote some custom operators so that it has performance comparable to difacto. Feel free to take a look and give some suggestions :) Thanks.
https://github.com/kopopt/fast_tffm
I might be able to test; I just need to convert Criteo into the tffm input format. Is there any reference for the input format?
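If it helps, here's a rough sketch of how I'd feed Criteo into tffm, assuming tffm accepts a scipy CSR matrix via `input_type='sparse'` (please double-check against the current README). Everything below is illustrative: the raw Criteo TSV layout (label, 13 numeric fields, 26 categorical fields), the hashing bucket count, the log transform, and the hyperparameters are my own assumptions, not anything documented in this thread.

```python
# Rough sketch (not from this thread): hashing a slice of the raw Criteo TSV
# (label \t 13 numeric \t 26 categorical) into a scipy CSR matrix for tffm.
# Bucket count, log transform, and hyperparameters are illustrative guesses.
import zlib
import numpy as np
from scipy.sparse import csr_matrix
from tffm import TFFMClassifier

N_NUMERIC = 13
N_CATEGORICAL = 26
HASH_BUCKETS = 2 ** 20  # assumed feature-hashing space for categorical values


def criteo_to_csr(path, max_rows=100_000):
    """Parse up to max_rows of a Criteo-format TSV into (X_csr, y)."""
    rows, cols, vals, labels = [], [], [], []
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= max_rows:
                break
            fields = line.rstrip("\n").split("\t")
            labels.append(int(fields[0]))
            # numeric features: first N_NUMERIC columns, log-scaled, empty values skipped
            for j, raw in enumerate(fields[1:1 + N_NUMERIC]):
                if raw:
                    rows.append(i)
                    cols.append(j)
                    vals.append(np.log1p(max(float(raw), 0.0)))
            # categorical features: hashed into a shared bucket space after the numeric block
            for j, raw in enumerate(fields[1 + N_NUMERIC:1 + N_NUMERIC + N_CATEGORICAL]):
                if raw:
                    bucket = zlib.crc32(f"{j}_{raw}".encode()) % HASH_BUCKETS
                    rows.append(i)
                    cols.append(N_NUMERIC + bucket)
                    vals.append(1.0)
    X = csr_matrix((vals, (rows, cols)),
                   shape=(len(labels), N_NUMERIC + HASH_BUCKETS))
    return X, np.array(labels)


X, y = criteo_to_csr("train.txt")  # placeholder path to the Criteo training file

# Labels are kept as {0, 1} here; depending on the tffm version, the classifier
# may expect {0, 1} or {-1, 1} -- adjust if fit() rejects the encoding.
model = TFFMClassifier(
    order=2,
    rank=8,
    n_epochs=3,
    batch_size=4096,
    input_type='sparse',  # feed CSR batches instead of dense arrays
)
model.fit(X, y, show_progress=True)
```

The hashing trick keeps the feature space at a fixed size (~1M columns here) no matter how many distinct categorical values Criteo has, which is also roughly the scale @benmccann asked about. Whether tffm itself trains on the full dataset at that size is exactly the open question in this issue.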