Scaling benchmarks

Open benmccann opened this issue 7 years ago • 3 comments

I've been looking at Spark implementations of Factorization Machines. I found that none of the existing open-source implementations scale to a dataset with millions of features and hundreds of millions of examples. I'd be curious how this implementation is able to scale.

benmccann, Sep 13 '16 23:09

Hi @benmccann, I believe you should check https://github.com/dmlc/difacto; from my point of view, it is the most scalable solution. By the way, FFM (https://www.csie.ntu.edu.tw/~cjlin/libffm/) is a good pure C++ implementation, which I've been able to run on my laptop on a dataset with ~10k features (25 non-zeros) and ~30M samples.

tffm is mostly intended for research purposes, so I don't expect it to scale really well.

geffy, Sep 14 '16 09:09

@geffy @benmccann I've been learning TensorFlow recently and developed a distributed factorization machine implementation. I wrote some custom operators so that its performance is comparable to difacto's. You're welcome to take a look and give some suggestions :) Thanks.

https://github.com/kopopt/fast_tffm

kopopt, Sep 28 '16 21:09

I might be able to test; I just need to convert the Criteo dataset into the tffm input format. Is there any reference for the input format?

arita37, Nov 11 '17 13:11
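
On the input-format question: as far as I recall from the tffm README, the model takes either dense NumPy arrays or SciPy CSR sparse matrices (selected with an `input_type` argument), so it is worth checking the README before relying on this. Below is a rough sketch, not an official converter, of turning Criteo-style lines into CSR components with the hashing trick. It assumes the Kaggle Criteo layout (tab-separated: label, 13 integer columns, 26 categorical columns, empty string for missing values). The names `hash_index`, `criteo_to_csr`, and `n_features` are hypothetical and are not part of tffm.

```python
import zlib

# Assumed Kaggle Criteo layout: label, 13 integer columns, 26 categorical columns.
NUM_INT, NUM_CAT = 13, 26


def hash_index(key, n_features):
    """Map a string key to a column index (hashing trick, deterministic CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % n_features


def criteo_to_csr(lines, n_features=2 ** 20):
    """Convert tab-separated Criteo-style lines to CSR components and labels.

    Returns (data, indices, indptr, labels) as plain Python lists.
    """
    data, indices, indptr, labels = [], [], [0], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        labels.append(int(fields[0]))
        row = {}
        for i, value in enumerate(fields[1:1 + NUM_INT + NUM_CAT]):
            if value == "":
                continue  # missing value: leave the entry as an implicit zero
            if i < NUM_INT:
                # Integer column: one hashed slot per column, raw numeric value.
                idx = hash_index("I%d" % (i + 1), n_features)
                row[idx] = row.get(idx, 0.0) + float(value)
            else:
                # Categorical column: one-hot entry keyed by column and category.
                idx = hash_index("C%d=%s" % (i - NUM_INT + 1, value), n_features)
                row[idx] = row.get(idx, 0.0) + 1.0
        for idx in sorted(row):  # sorted column indices within each row
            indices.append(idx)
            data.append(row[idx])
        indptr.append(len(indices))
    return data, indices, indptr, labels
```

The returned lists can be handed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(labels), n_features))` to get a sparse matrix. Hash collisions are summed here for simplicity, and in practice the integer columns would probably need scaling or a log transform first.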