h2o4gpu
[WIP] Field-aware factorization machines
Initial implementation of field-aware factorization machines.
Based on these 2 whitepapers:
- https://arxiv.org/pdf/1701.04099.pdf
- https://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf
And the following repositories:
- https://github.com/guestwalk/libffm (original impl)
- https://github.com/alexeygrigorev/libffm-python (Python interface for it)
- https://github.com/RTBHOUSE/cuda-ffm (CUDA implementation of a simplified method)
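For reference, the model in both papers combines every pair of features through latent vectors that are specific to the *other* feature's field. Below is a minimal, pure-Python sketch of that prediction (my own illustration of the formula from the papers, not the code in this PR):

```python
# Sketch of the FFM prediction for one row of (field, feature, value) triples.
# w has shape (n_features, n_fields, k): w[j, f] is the latent vector of
# feature j used when it interacts with a feature from field f.
import numpy as np

def ffm_phi(row, w):
    phi = 0.0
    for i in range(len(row)):
        for j in range(i + 1, len(row)):
            f1, j1, v1 = row[i]
            f2, j2, v2 = row[j]
            phi += np.dot(w[j1, f2], w[j2, f1]) * v1 * v2
    return phi  # feed through a sigmoid to get the predicted probability
```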
Currently this is only the initial GPU implementation; the CPU version will most probably just be a copy of the original implementation (without the SSE alignment for now).
No benchmarks so far, as there is still something wrong (we are getting different results).
Things still to be done:
- add validation set option and early stopping (FFM seems to need this a lot as it tends to overfit)
- add multi GPU support
- review the data structures used - the object-oriented Dataset/Row/Node hierarchy is convenient for development, but it probably adds a lot of overhead when copying data to the device; refactoring it into 3 (or more) contiguous arrays might give a significant speedup (see the sketch after this list)
- review the main method wTx (in trainer.cu) - it can probably be rewritten in a more GPU-friendly manner
- probably something else I'm forgetting
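As a rough idea of what the flattening mentioned above could look like: a CSR-like layout with one offset array plus one contiguous array per Node member, so each array can be copied to the device in a single transfer. The names and layout below are just a suggestion, not the current code:

```python
# Possible flattening of the Dataset/Row/Node hierarchy into contiguous arrays.
# row_ptr[i]:row_ptr[i+1] bounds the nodes that belong to row i.
import numpy as np

def flatten_rows(X):
    sizes = [len(row) for row in X]
    row_ptr = np.zeros(len(X) + 1, dtype=np.int32)
    row_ptr[1:] = np.cumsum(sizes)
    fields   = np.fromiter((f for row in X for f, _, _ in row), dtype=np.int32)
    features = np.fromiter((j for row in X for _, j, _ in row), dtype=np.int32)
    values   = np.fromiter((v for row in X for _, _, v in row), dtype=np.float32)
    return row_ptr, fields, features, values
```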
If anyone wants to take it for a spin:
>>> from h2o4gpu.solvers.ffm import FFMH2O
>>> import numpy as np
>>> X = [ [(1, 2, 1), (2, 3, 1), (3, 5, 1)],
... [(1, 0, 1), (2, 3, 1), (3, 7, 1)],
... [(1, 1, 1), (2, 3, 1), (3, 7, 1), (3, 9, 1)] ]
>>>
>>> y = [1, 1, 0]
>>> ffmh2o = FFMH2O(n_gpus=1)
>>> ffmh2o.fit(X,y)
<h2o4gpu.solvers.ffm.FFMH2O object at 0x7f2d30319fd0>
>>> ffmh2o.predict(X)
array([0.7611223 , 0.6475924 , 0.88890105], dtype=float32)
The input format is a list of rows, each row being a list of (fieldIdx, featureIdx, value) tuples, plus a corresponding list of labels (0 or 1), one per row.
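In case it helps, here is a hypothetical helper (not part of the h2o4gpu API) showing one way to build that format from raw categorical records; assigning feature indices globally as values are first seen is just an assumption for the example:

```python
# Hypothetical encoder: turns records of raw categorical values into the
# (fieldIdx, featureIdx, value) triples expected by FFMH2O.fit().
def encode_rows(records, fields):
    feature_ids = {}
    X = []
    for rec in records:
        row = []
        for field_idx, field in enumerate(fields):
            key = (field, rec[field])
            feat_idx = feature_ids.setdefault(key, len(feature_ids))
            row.append((field_idx, feat_idx, 1))  # value 1 for one-hot categoricals
        X.append(row)
    return X

records = [{"user": "u1", "ad": "a7"}, {"user": "u2", "ad": "a7"}]
X = encode_rows(records, fields=["user", "ad"])
```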
So both CPU and GPU implementations are now in place and working. The only remaining issue is that GPU batch mode gives slightly different results for the same number of iterations (or converges in a much larger number of iterations) than GPU mode with batch_size=1 and the CPU modes. My guess is that this is because we are using HOGWILD!, so the order of computations during the gradient update differs (and might not be 100% correct?).
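As a tiny illustration of that order sensitivity (not the actual kernel): float32 accumulation is not associative, so applying the same per-sample contributions in a different order already shifts the result slightly, and with lock-free HOGWILD!-style updates each thread also reads weights that other threads may or may not have written yet.

```python
# Summing the same float32 values in two different orders gives slightly
# different totals; in SGD the effect compounds, because each update reads
# weights written by earlier updates.
import numpy as np

grads = np.random.RandomState(0).rand(100_000).astype(np.float32)

def accumulate(xs):
    total = np.float32(0.0)
    for x in xs:
        total += x
    return total

print(accumulate(grads))         # sequential order
print(accumulate(grads[::-1]))   # reverse order: the last digits differ
```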
One more thing: this needs to be compared on bigger data (libffm_toy.zip) against the original C++ implementation (https://github.com/guestwalk/libffm - not the Python API). I think the GPU version was getting slightly different results, so it needs double-checking before merging.