Predict new data without training.

Open niknoproblems opened this issue 8 years ago • 14 comments

Can I predict on new data with an already trained model, or do I always have to call the "run" method?

niknoproblems avatar Apr 13 '16 11:04 niknoproblems

Do you mean using a previous training model (using the save_model flag)?

jfloff avatar Apr 13 '16 15:04 jfloff

exactly

niknoproblems avatar Apr 16 '16 12:04 niknoproblems

Sorry, I have yet to implement that function, since I personally never had a use for it.

Just to get a feeling: how do you envision such an interface? When I started looking into this problem, I felt that I would probably need to split pywFM.run into pywFM.train and pywFM.predict, and also add a pywFM.load_model that is able to load a trained model. The problem is that this would probably hurt performance, since we would need to run two different libfm commands: one with the save_model flag and another with the load_model flag.

Another alternative would be separate pywFM.train_model and pywFM.run_model methods that train and run a model, respectively.

jfloff avatar Apr 16 '16 18:04 jfloff

I think the first approach, with train and predict methods, is more standard and clean, like in sklearn. Without that feature, many ML techniques such as stacking and blending become much less trivial. About performance: yes, we would have to run two libfm commands, but this only hurts in the training phase. In the prediction stage you only need to load the model and predict.

niknoproblems avatar Apr 17 '16 12:04 niknoproblems
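
For readers following along, the two-command flow discussed above can be sketched as a pair of helpers that only build libFM argument lists. This is a minimal sketch, not pywFM code: the helper names, file paths and defaults are hypothetical, the `-save_model` / `-load_model` flags are the ones mentioned in this thread, and using `-iter 0` to skip retraining when a model is loaded is an assumption that should be checked against your libFM build.

```python
def libfm_train_cmd(libfm_bin, train_path, test_path, model_path, out_path,
                    dim="1,1,8", iterations=100):
    """Build a libFM command that trains once and saves the model (hypothetical helper)."""
    return [
        libfm_bin,
        "-task", "r",              # regression
        "-method", "als",          # ALS chosen here; the thread notes MCMC does not work with saved models
        "-train", train_path,
        "-test", test_path,
        "-dim", dim,
        "-iter", str(iterations),
        "-out", out_path,
        "-save_model", model_path,  # flag named in this thread
    ]


def libfm_predict_cmd(libfm_bin, train_path, new_data_path, model_path, out_path,
                      dim="1,1,8"):
    """Build a libFM command that loads a saved model and only predicts (hypothetical helper)."""
    return [
        libfm_bin,
        "-task", "r",
        "-method", "als",
        "-train", train_path,       # assumption: libFM still expects a train file here
        "-test", new_data_path,
        "-dim", dim,
        "-iter", "0",               # assumption: zero iterations means no retraining
        "-out", out_path,
        "-load_model", model_path,  # flag named in this thread
    ]


train_cmd = libfm_train_cmd("libFM", "train.libfm", "test.libfm", "fm.model", "pred.txt")
predict_cmd = libfm_predict_cmd("libFM", "train.libfm", "new.libfm", "fm.model", "new_pred.txt")
```

Either list could then be handed to `subprocess.run`; the training cost is paid once, and every later prediction only needs the second command.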

I'm also leaning towards that approach, since it meets one of my todo points:

Improve the save_model / load_model so we can have a more defined init-fit-predict cycle (perhaps we could inherit from sklearn.BaseEstimator)

This weekend I have a little bit of time, and I will start working on this in a branch (it will break backward compatibility, so I'll bump the version). Feel free to submit changes as well.

jfloff avatar Apr 18 '16 12:04 jfloff

Sorry, I had not seen your todo. Thank you very much for the future work.

niknoproblems avatar Apr 18 '16 15:04 niknoproblems

Hi @jfloff, any progress in that direction? I just realized the issue mentioned by @nickflamel, and it simply makes your (very cool) wrapper unusable in a production environment. Btw, I tried the example from Rendle that is also in your README, but the predictions are very bad. I guess this is because we don't have much data, but this kinda makes the example unsuitable ^^.

felixmaximilian avatar Jun 30 '16 09:06 felixmaximilian

I'm sorry, I haven't had time to dedicate to improving this. I realise that this feature would really improve running several different predictions, and I really want to improve it, but if I'm going to do it, I will inherit from sklearn.BaseEstimator right from the start (which takes a little bit more work).

I have a deadline for Monday. After that I'll dig into this, I promise! :)

The example is just to show how the API works, and what's the flow of libfm :)

jfloff avatar Jun 30 '16 13:06 jfloff

It seems that predicting without a new training run is not really supported at the moment, and the functionality is not at 100% (e.g. it does not work for MCMC). I've also taken a look at the libFM source code, but I haven't had much success. The documentation also lacks any description of the save_model and load_model options.

I'm going out on a limb here and pinging @thierry-silbermann, since he was responsible for save_model and load_model in libFM. Could you give us some insight into how we should proceed?

jfloff avatar Jul 04 '16 18:07 jfloff

Hi, here is how we could proceed to make a predict method:

https://github.com/jilljenn/TF-recomm/blob/master/forward.py#L22

where the pickled elements are these:

https://github.com/jilljenn/TF-recomm/blob/master/fm_mangaki.py#L39

jilljenn avatar Feb 19 '18 11:02 jilljenn
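
As an illustration of what predicting from stored parameters can look like, here is a minimal NumPy sketch of the standard second-order factorization machine equation. It is not the code from the links above. The function name is hypothetical, the parameter names mirror the `global_bias`, `weights` and `pairwise_interactions` values pywFM returns, and the `(n_features, k)` layout of the factor matrix is an assumption (transpose it if yours is stored the other way).

```python
import numpy as np


def fm_predict(X, global_bias, weights, pairwise_interactions):
    """Score rows of a dense matrix X with a second-order factorization machine.

    Assumes pairwise_interactions has shape (n_features, k).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float).ravel()
    V = np.asarray(pairwise_interactions, dtype=float)

    linear = X @ w
    # Pairwise term via the usual identity:
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((x V)_f^2 - (x^2 V^2)_f)
    xv = X @ V
    x2v2 = (X ** 2) @ (V ** 2)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2, axis=1)
    return global_bias + linear + pairwise


# Tiny usage example with random, made-up parameters.
rng = np.random.default_rng(0)
X_new = rng.random((4, 5))
w0, w, V = 0.7, rng.random(5), rng.random((5, 3))
scores = fm_predict(X_new, w0, w, V)
```

For a classification task the raw score would still need to be passed through a sigmoid; the sketch covers the regression case only.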

Want to try submitting a PR for this?

jfloff avatar Feb 19 '18 12:02 jfloff

Yes. It will look like this.

https://github.com/mangaki/mangaki/pull/549/files#diff-2b98b5dc82ffbac20dd8c88ce88d6b5cR65

I don't know why I sometimes had to use .A1 (matrix-to-ndarray conversion) and sometimes not.

Can you consider casting model.weights and model.pairwise_interactions to NumPy arrays?

jilljenn avatar Feb 19 '18 13:02 jilljenn
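
One plausible reason for the inconsistent `.A1` usage is that some values arrive as `np.matrix` objects (which are always 2-D) while others are already plain arrays. A small sketch of a normalizing helper follows; the name `as_1d` is hypothetical, and whether pywFM actually returns `np.matrix` anywhere is an assumption, not something confirmed in this thread.

```python
import numpy as np


def as_1d(values):
    """Return values as a flat ndarray, whether given a matrix, ndarray or list."""
    return np.asarray(values).ravel()


row_matrix = np.matrix([[1.0, 2.0, 3.0]])  # 2-D by construction, shape (1, 3)
plain_list = [1.0, 2.0, 3.0]

flat_from_matrix = as_1d(row_matrix)       # same values as row_matrix.A1
flat_from_list = as_1d(plain_list)
```

Applying such a helper once, when the model is loaded, means callers no longer need to know which of the two types they were handed.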

I don't see any problem with that.

jfloff avatar Feb 19 '18 15:02 jfloff

5 years later, I finally made a scikit-learn estimator: https://github.com/jilljenn/ktm/blob/master/fm.py#L25

It will be improved over the next few days, and then I can copy it into your repo.

jilljenn avatar Dec 27 '23 12:12 jilljenn