fastFM
Expressing the final weights learnt as a parametric equation
@ibayer After learning the model weights with the parameter names (e.g. with X_test.columns as the names), how do I express this as a functional relationship so it can be productionized? i.e. f(x1 * -1.29966466e-03 + x2 * 1.78455648e+01 + ...)
fm.w_ [ -1.29966466e-03 1.78455648e+01 -2.05648306e-01 -2.40578327e+00 4.44556106e+00 9.42411346e-02 1.82644589e+00 2.35087155e+00 -4.14614164e-01 1.52788247e+00 6.72193895e-01 -1.51634745e-01 1.96703805e+00 -7.19508942e-01 -3.00903099e-01 8.13209301e-01]
http://www.jmlr.org/papers/volume17/15-355/15-355.pdf See equation 1.
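For reference, equation 1 there is the standard second-order factorization machine model (Rendle's SIGIR paper writes the same form as eqn 2); in fastFM's terms, w0 is fm.w0_, the w_i are fm.w_, and the v_i are the columns of fm.V_ with k = rank:

```latex
\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i
 \;+\; \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle \;=\; \sum_{f=1}^{k} v_{i,f}\, v_{j,f}
```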
@ibayer - thanks for your prompt response. So, looking at eqn 1, I have X encoded as {1/0 along with some continuous variables, see sample below}; which of these calls will return the y as in eqn 1? I have fm.w0_, fm.w_ & fm.V_
fm.fit_predict(X_train, y_train, X_test)        # class "labels" 1/0
fm.fit_predict_proba(X_train, y_train, X_test)  # class probabilities
fm.predict(X_test)                              # a continuous number, e.g. 15.35047575
Additional note: my fm is initialised as fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True); a sample X_train row is [969.0, 1.0, 24.3618275, 1.0, 1.0, 0.4, 0.161803, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
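For reference, a minimal sketch of how these calls are wired together (the toy data, the sparse conversion and the -1/+1 label encoding are my assumptions for illustration, not taken from this thread):

```python
# Illustrative sketch only -- toy data; shapes and label encoding are assumptions.
import numpy as np
import scipy.sparse as sp
from fastFM import mcmc

rank, seed = 2, 42
X_train = sp.csc_matrix(np.array([[969.0, 1.0, 24.36, 1.0],
                                  [432.0, 0.0, 11.02, 0.0]]))
X_test = sp.csc_matrix(np.array([[512.0, 1.0, 18.50, 1.0]]))
y_train = np.array([1, -1])  # binary labels; -1/+1 encoding assumed here

fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank,
                           random_state=seed, copy_X=True)
y_labels = fm.fit_predict(X_train, y_train, X_test)       # class labels
y_proba = fm.fit_predict_proba(X_train, y_train, X_test)  # class probabilities
y_score = fm.predict(X_test)  # continuous score, based on the last parameter draw
```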
@ibayer @takuti @macks22 @chezou @bdaskalov - I got the y_pred from fm.predict(X_test)
- which I tested 1:1 against Bayer eqn 1 / Rendle eqn 2; it is a continuous number, e.g. [1.0584077976, 0.00105908767392]
- how do I get the class labels & predicted probabilities back?
My use case is storing the model params (w0_, w_, V_) over n iterations on different test-data sets (my test data is ~1.8 TB) and then computing the final y_pred (probability) at run time, so I can't call fm.fit_predict_proba(X_train, y_train, X_test) at runtime (I should therefore use an equivalent of the find_prediction function below, for computational efficiency at runtime).
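For concreteness, a minimal sketch of how I'd persist and reload these params between runs (np.savez and the file layout are just illustrative choices, not something fastFM prescribes):

```python
import numpy as np

# Sketch: persist the learned FM parameters per run; the .npz layout is illustrative.
def save_fm_params(path, fm):
    np.savez(path, w0_=fm.w0_, w_=fm.w_, V_=fm.V_)

def load_fm_params(path):
    params = np.load(path)
    return float(params["w0_"]), params["w_"], params["V_"]
```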
Pasting some quick code, in case someone has a similar question.
import numpy as np

def get_first_order_weights(x_test, w_):
    """First-order term: dot product of the feature vector with w_."""
    return float(np.dot(np.asarray(x_test, dtype=float), np.asarray(w_, dtype=float)))

def get_weight(V_, i, j, rank=2):
    """Interaction weight <v_i, v_j>; see also equation 2, Steffen Rendle (2011, SIGIR)."""
    weight = 0.0
    for rank_ in range(rank):
        weight += V_[rank_][i] * V_[rank_][j]
    return weight

def get_second_order_weights(x_test, V_, rank=2):
    """Second-order term: sum over pairs i < j of <v_i, v_j> * x_i * x_j."""
    second_order_w = 0.0
    for i in range(len(x_test)):
        for j in range(i + 1, len(x_test)):
            if x_test[i] != 0 and x_test[j] != 0:
                second_order_w += get_weight(V_, i=i, j=j, rank=rank) * x_test[i] * x_test[j]
    return second_order_w

def find_prediction(x_test, w0_, w_, V_, rank=2):
    """FM score (eqn 1): w0 + first-order term + second-order interactions."""
    return w0_ + get_first_order_weights(x_test, w_) + get_second_order_weights(x_test, V_, rank=rank)
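For runtime efficiency, the pairwise loop above can also be collapsed using the standard identity from Rendle's paper; a vectorized sketch, assuming V_ has shape (rank, n_features) as in the quick test below, where it should give the same 0.551379... value as find_prediction:

```python
import numpy as np

def find_prediction_fast(x, w0_, w_, V_):
    """Same FM score as find_prediction(), computed in O(rank * n_features)
    via sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [ (V_f . x)^2 - (V_f^2 . x^2) ]."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(V_, dtype=float)
    linear = float(np.dot(x, np.asarray(w_, dtype=float)))
    interactions = 0.5 * float(np.sum(np.dot(V, x) ** 2 - np.dot(V ** 2, x ** 2)))
    return w0_ + linear + interactions
```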
"""Quick test """
y_pred_proba=0.666498763558 # what we want. This is the output of fm.fit_predict_proba(X_train, y_train,X_test) for THIS x_test
y_pred=0.551379351199 # what find_prediction(x_test, w0_, w_, V_) gives
hyper_param_=[ 0.72957556, 1.09654054, 4.28621216, 0.80880482, -0.26278131,
0.17551987, -0.17272419]
x_test = [0, 0, 0, 1, 0, 1, 0, 0]
w0_ =0.44717137049755357
rank=2 # Rank : The rank of the factorization used for the second order interactions.
w_ = [ 0.13673361, -0.50175393, -0.43582785, 0.91480033, 0.64150534,
0.85911802, -0.20877941, -0.20461079]
V_ =[[-0.30315417, -0.01520948, 0.35000127, 0.54788385, -0.26731813,
-0.07202204, 0.74163199, 0.25263453],
[ 0.73052313, 0.93649875, 0.55294677, 1.23317741, -0.88026332,
-1.321992 , -0.44626548, 0.27878056]]
get_y_prediction = find_prediction(x_test, w0_, w_, V_, rank=rank)
print(get_y_prediction)  # 0.551379337337
get_y_prediction gives 0.551379337337, which corroborates the output of y_pred = fm.predict(X_test). I need the class probabilities, which should corroborate the output of y_pred = fm.fit_predict_proba(X_train, y_train, X_test). How do I get these?
What I tried
- sigmoid function: 1.0/(1 + math.exp(-1*float(y_pred))) # gives 0.6344555517817589, which is different from 0.666498763558 (what we want). In the paper @ibayer mentions that MCMC classification is modelled with a loss function of Probit(MAP), Probit or Sigmoid, but I couldn't see an option to specify the loss function (I was hoping I could map the model params back to the probability value if this were clear).
- Modelling as a Probit link function:
2a)
from scipy.stats import norm
x_beta = (np.matrix(x_test) * np.matrix(w_).transpose())
norm.cdf(x_beta)  # gives 0.96196167; Y = Φ(Xβ + ε), cumulative normal CDF
2b)
import math
def phi(x):
    # cumulative distribution function of the standard normal distribution
    return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0
x_beta = (np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
phi(x_beta)  # gives 0.9619616715011631; Y = Φ(Xβ + ε)
- Mapping as a Logit link function, which gives 0.6394991222434503:
import math
import numpy as np
x_beta = (np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
math.pow(1 + math.pow(float(x_beta), -1), -1)  # intended Pr(Y=1|X) = [1 + e^(-X'β)]^(-1); note this expression actually computes 1/(1 + 1/x_beta)
What link function can I use? For reference, fm is initialized as
fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True)
fm.fit_predict(X_train, y_train,X_test)
@ibayer - Here are the results of the prob. as returned by fm.fit_predict_proba (red), the sigmoid of y_pred_hat (the back-calculated real number; y_pred_hat itself maps to fm.predict(X_test)) (green), and the probability values corrected by the median of the % difference between red & green (yellow). Would be glad if you could suggest how to get the prob. to stack up ~1:1 with what fm.fit_predict_proba returns.
@ibayer - Also, from the documentation here http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver :
"Probit regression uses the Cumulative Distribution Function (CDF) of the standard normal Distribution as link function. Mainly because the CDF leads to an easier Gibbs solver then the sigmoid function used in the SGD classifier implementation. The results are in practice usually very similar."
But my back-calculated results are actually even further off from fm.fit_predict_proba when using a probit link function (both standard normal & normal):
def find_probit_normal(x, std, mean):
    # normal density (PDF) with the given mean/std, evaluated at x
    deno = std * math.sqrt(2 * math.pi)
    num = math.exp(-((x - mean) ** 2) / (2.0 * std * std))
    return num / deno

std, mean = np.std(y_train), np.mean(y_train)

def find_probit_std_normal(x):
    # standard normal density (PDF) evaluated at x
    return math.exp(-(x * x) / 2.0) / math.sqrt(2 * math.pi)
where I initialize x as y_pred (real valued). Recall that y_pred maps to eqn 2 (pg. 4), Rendle, 2011 / Bayer eqn 1 (2016).
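For completeness, applying the standard-normal CDF (the probit link described in the tutorial passage above) to the full FM score rather than to Xβ alone would look like the sketch below; since these are parameters from a single draw, I wouldn't expect it to reproduce the averaged fit_predict_proba values either:

```python
import numpy as np
from scipy.stats import norm

# Sketch: probit link applied to the full FM score (w0 + first-order + second-order)
# from a single parameter draw, using find_prediction() defined above.
y_score = find_prediction(x_test, w0_, w_, V_, rank=rank)  # e.g. 0.551379...
p_probit = norm.cdf(y_score)              # Phi(score): standard-normal CDF
p_logit = 1.0 / (1.0 + np.exp(-y_score))  # sigmoid, for comparison
```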
Can you summarize again what you are trying to achieve? I saw you use mcmc somewhere in your code. Please keep in mind:
It’s also possible to just call predict on a trained MCMC model but this returns predictions that are solely based on the last parameters draw. These predictions can be used for diagnostic purposes but are usually not as good as averaged predictions returned by fit_predict.
http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver
@ibayer - Simplifying the question:
- I have trained the model on some 1200+ files (~1.8 GB), with 50 files in each run.
- I store w0_, w_, V_ from each run and compute the predict value using the code above (which is a real-valued number).

Question: How do I get the probability values (y_hat) back for each X vector? I tried probit, logit and sigmoid as above, but even if these were based on just 1 draw I don't get values that are close (I test this by storing the prob. values and then reverse-engineering them with the functions above).
Looks to me like you use the mcmc solver. If that's the case then
"I store w0_, w_, V_ from each run"
doesn't make sense (I assume that by run you mean one call to fit_predict_proba()). In this case I recommend using a different solver.
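For illustration, with a non-MCMC solver a single set of stored parameters maps cleanly to a probability at runtime. A hedged sketch only: sgd.FMClassification and its parameter names (l2_reg_w, l2_reg_V, step_size) are assumptions based on fastFM's documented API, not something verified in this thread; the sigmoid link follows the tutorial passage quoted above about the SGD classifier.

```python
# Hedged sketch -- solver and parameter names are assumptions, see lead-in above.
import numpy as np
import scipy.sparse as sp
from fastFM import sgd

fm = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=2,
                          l2_reg_w=0.1, l2_reg_V=0.1, step_size=0.01)
fm.fit(sp.csc_matrix(X_train), y_train)

# One well-defined parameter set per training run can now be stored ...
np.savez("fm_params.npz", w0_=fm.w0_, w_=fm.w_, V_=fm.V_)

# ... and turned into a probability at runtime: raw FM score via find_prediction(),
# then a sigmoid link.
params = np.load("fm_params.npz")
score = find_prediction(x_test, float(params["w0_"]), params["w_"], params["V_"], rank=2)
prob = 1.0 / (1.0 + np.exp(-score))
```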