fastFM
Expressing the final weights learnt as a parametric equation
@ibayer After learning the model weights with the parameter names (e.g. with X_test.columns as the names), how do I express this as a functional relationship so it can be productionized? i.e. f(x1 * -1.29966466e-03 + x2 * 1.78455648e+01 + ...)
fm.w_ [ -1.29966466e-03 1.78455648e+01 -2.05648306e-01 -2.40578327e+00 4.44556106e+00 9.42411346e-02 1.82644589e+00 2.35087155e+00 -4.14614164e-01 1.52788247e+00 6.72193895e-01 -1.51634745e-01 1.96703805e+00 -7.19508942e-01 -3.00903099e-01 8.13209301e-01]
http://www.jmlr.org/papers/volume17/15-355/15-355.pdf See equation 1.
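For reference, equation 1 there is the standard second-order factorization machine model (Rendle's SIGIR paper writes the same form as eqn 2); in fastFM's terms, w0 is fm.w0_, the w_i are fm.w_, and the v_i are the columns of fm.V_ with k = rank:

```latex
\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i
 \;+\; \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle \;=\; \sum_{f=1}^{k} v_{i,f}\, v_{j,f}
```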
@ibayer - thanks for your prompt response. So, looking at eqn 1, I have X encoded as {1/0 along with some continuous variables, see sample below}; which of these calls will return the y as in eqn 1? I have fm.w0_, fm.w_ & fm.V_
fm.fit_predict(X_train, y_train, X_test)        # class "labels" 1/0
fm.fit_predict_proba(X_train, y_train, X_test)  # class probabilities
fm.predict(X_test)                              # a continuous number, e.g. 15.35047575
Additional note: my fm is initialised as fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True); a sample X_train row is [969.0, 1.0, 24.3618275, 1.0, 1.0, 0.4, 0.161803, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
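For reference, a minimal sketch of how these calls are wired together (the toy data, the sparse conversion and the -1/+1 label encoding are my assumptions for illustration, not taken from this thread):

```python
# Illustrative sketch only -- toy data; shapes and label encoding are assumptions.
import numpy as np
import scipy.sparse as sp
from fastFM import mcmc

rank, seed = 2, 42
X_train = sp.csc_matrix(np.array([[969.0, 1.0, 24.36, 1.0],
                                  [432.0, 0.0, 11.02, 0.0]]))
X_test = sp.csc_matrix(np.array([[512.0, 1.0, 18.50, 1.0]]))
y_train = np.array([1, -1])  # binary labels; -1/+1 encoding assumed here

fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank,
                           random_state=seed, copy_X=True)
y_labels = fm.fit_predict(X_train, y_train, X_test)       # class labels
y_proba = fm.fit_predict_proba(X_train, y_train, X_test)  # class probabilities
y_score = fm.predict(X_test)  # continuous score, based on the last parameter draw
```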
@ibayer @takuti @macks22 @chezou @bdaskalov - I got the y_pred from fm.predict(X_test)
- which I tested 1:1 against Bayer eqn 1 / Rendle eqn 2; it is a continuous number, e.g. [1.0584077976, 0.00105908767392]
- how do I get the class labels & predicted probabilities back?
My use case is storing the model params (w0_, w_, V_) over n iterations on different test-data sets (my test data is ~1.8 TB) and then computing the final y_pred (probability) at run time, so I can't call fm.fit_predict_proba(X_train, y_train, X_test) at runtime (I should therefore use an equivalent of the find_prediction function below, for computational efficiency at runtime).
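For concreteness, a minimal sketch of how I'd persist and reload these params between runs (np.savez and the file layout are just illustrative choices, not something fastFM prescribes):

```python
import numpy as np

# Sketch: persist the learned FM parameters per run; the .npz layout is illustrative.
def save_fm_params(path, fm):
    np.savez(path, w0_=fm.w0_, w_=fm.w_, V_=fm.V_)

def load_fm_params(path):
    params = np.load(path)
    return float(params["w0_"]), params["w_"], params["V_"]
```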
Pasting some quick code, in case someone has a similar question.
import numpy as np

def get_first_order_weights(x_test, w_):
    """First-order term: dot product of the feature vector with w_."""
    return float(np.dot(np.asarray(x_test, dtype=float), np.asarray(w_, dtype=float)))

def get_weight(V_, i, j, rank=2):
    """Interaction weight <v_i, v_j>; see also equation 2, Steffen Rendle (2011, SIGIR)."""
    weight = 0.0
    for rank_ in range(rank):
        weight += V_[rank_][i] * V_[rank_][j]
    return weight

def get_second_order_weights(x_test, V_, rank=2):
    """Second-order term: sum over pairs i < j of <v_i, v_j> * x_i * x_j."""
    second_order_w = 0.0
    for i in range(len(x_test)):
        for j in range(i + 1, len(x_test)):
            if x_test[i] != 0 and x_test[j] != 0:
                second_order_w += get_weight(V_, i=i, j=j, rank=rank) * x_test[i] * x_test[j]
    return second_order_w

def find_prediction(x_test, w0_, w_, V_, rank=2):
    """FM score (eqn 1): w0 + first-order term + second-order interactions."""
    return w0_ + get_first_order_weights(x_test, w_) + get_second_order_weights(x_test, V_, rank=rank)
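For runtime efficiency, the pairwise loop above can also be collapsed using the standard identity from Rendle's paper; a vectorized sketch, assuming V_ has shape (rank, n_features) as in the quick test below, where it should give the same 0.551379... value as find_prediction:

```python
import numpy as np

def find_prediction_fast(x, w0_, w_, V_):
    """Same FM score as find_prediction(), computed in O(rank * n_features)
    via sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [ (V_f . x)^2 - (V_f^2 . x^2) ]."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(V_, dtype=float)
    linear = float(np.dot(x, np.asarray(w_, dtype=float)))
    interactions = 0.5 * float(np.sum(np.dot(V, x) ** 2 - np.dot(V ** 2, x ** 2)))
    return w0_ + linear + interactions
```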
"""Quick test """
y_pred_proba=0.666498763558 # what we want. This is the output of fm.fit_predict_proba(X_train, y_train,X_test) for THIS x_test
y_pred=0.551379351199 # what find_prediction(x_test, w0_, w_, V_) gives
hyper_param_=[ 0.72957556, 1.09654054, 4.28621216, 0.80880482, -0.26278131,
0.17551987, -0.17272419]
x_test = [0, 0, 0, 1, 0, 1, 0, 0]
w0_ =0.44717137049755357
rank=2 # Rank : The rank of the factorization used for the second order interactions.
w_ = [ 0.13673361, -0.50175393, -0.43582785, 0.91480033, 0.64150534,
0.85911802, -0.20877941, -0.20461079]
V_ =[[-0.30315417, -0.01520948, 0.35000127, 0.54788385, -0.26731813,
-0.07202204, 0.74163199, 0.25263453],
[ 0.73052313, 0.93649875, 0.55294677, 1.23317741, -0.88026332,
-1.321992 , -0.44626548, 0.27878056]]
get_y_prediction = find_prediction(x_test, w0_, w_, V_, rank=rank)
print(get_y_prediction)  # 0.551379337337
get_y_prediction gives 0.551379337337, which corroborates the output of y_pred = fm.predict(X_test). I need the class probabilities, which should corroborate the output of y_pred = fm.fit_predict_proba(X_train, y_train, X_test). How do I get these?
What I tried
- sigmoid function: 1.0/(1 + math.exp(-1*float(y_pred))) # gives 0.6344555517817589, which is different from 0.666498763558 (what we want). In the paper @ibayer mentions that MCMC classification is modelled with a loss function of Probit(MAP), Probit or Sigmoid, but I couldn't see an option to specify the loss function (I was hoping I could map the model params back to the probability value if this were clear).
- Modelling as a Probit link function:
2a)
from scipy.stats import norm
x_beta = (np.matrix(x_test) * np.matrix(w_).transpose())
norm.cdf(x_beta)  # gives 0.96196167; Y = Φ(Xβ + ε), cumulative normal CDF
2b)
import math
def phi(x):
    # cumulative distribution function of the standard normal distribution
    return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0
x_beta = (np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
phi(x_beta)  # gives 0.9619616715011631; Y = Φ(Xβ + ε)
- Mapping as a Logit link function, which gives 0.6394991222434503:
import math
import numpy as np
x_beta = (np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
math.pow(1 + math.pow(float(x_beta), -1), -1)  # intended Pr(Y=1|X) = [1 + e^(-X'β)]^(-1); note this expression actually computes 1/(1 + 1/x_beta)
What link function can I use? For reference, fm is initialized as
fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True)
fm.fit_predict(X_train, y_train,X_test)
@ibayer - Here are the results of the prob. as returned by fm.fit_predict_proba (red), the sigmoid of y_pred_hat (the back-calculated real number; y_pred_hat itself maps to fm.predict(X_test)) (green), and the probability values corrected by the median of the % difference between red & green (yellow). Would be glad if you could suggest how to get the prob. to stack up ~1:1 with what fm.fit_predict_proba returns.
@ibayer - Also, from the documentation here http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver :
"Probit regression uses the Cumulative Distribution Function (CDF) of the standard normal Distribution as link function. Mainly because the CDF leads to an easier Gibbs solver then the sigmoid function used in the SGD classifier implementation. The results are in practice usually very similar."
But my back-calculated results are actually even further off from fm.fit_predict_proba when using a probit link function (both standard normal & normal):
def find_probit_normal(x, std, mean):
    # normal density (PDF) with the given mean/std, evaluated at x
    deno = std * math.sqrt(2 * math.pi)
    num = math.exp(-((x - mean) ** 2) / (2.0 * std * std))
    return num / deno

std, mean = np.std(y_train), np.mean(y_train)

def find_probit_std_normal(x):
    # standard normal density (PDF) evaluated at x
    return math.exp(-(x * x) / 2.0) / math.sqrt(2 * math.pi)
where I initialize x as y_pred (real valued). Recall that y_pred maps to eqn 2 (pg. 4), Rendle, 2011 / Bayer eqn 1 (2016).
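For completeness, applying the standard-normal CDF (the probit link described in the tutorial passage above) to the full FM score rather than to Xβ alone would look like the sketch below; since these are parameters from a single draw, I wouldn't expect it to reproduce the averaged fit_predict_proba values either:

```python
import numpy as np
from scipy.stats import norm

# Sketch: probit link applied to the full FM score (w0 + first-order + second-order)
# from a single parameter draw, using find_prediction() defined above.
y_score = find_prediction(x_test, w0_, w_, V_, rank=rank)  # e.g. 0.551379...
p_probit = norm.cdf(y_score)              # Phi(score): standard-normal CDF
p_logit = 1.0 / (1.0 + np.exp(-y_score))  # sigmoid, for comparison
```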
Can you summarize again what you are trying to achieve? I saw you use mcmc somewhere in your code. Please keep in mind:
It’s also possible to just call predict on a trained MCMC model but this returns predictions that are solely based on the last parameters draw. These predictions can be used for diagnostic purposes but are usually not as good as averaged predictions returned by fit_predict.
http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver
@ibayer - Simplifying the question:
- I have trained the model on some 1200+ files (~1.8 GB), with 50 files in each run.
- I store w0_, w_, V_ from each run and compute the predict value using the code above (which is a real-valued number).

Question: How do I get the probability values (y_hat) back for each X vector? I tried probit, logit and sigmoid as above, but even if these were based on just 1 draw I don't get values that are close (I test this by storing the prob. values and then reverse-engineering them with the functions above).
Looks to me like you use the mcmc solver. If that's the case then
"I store w0_, w_, V_ from each run"
doesn't make sense (I assume that by run you mean one call to fit_predict_proba()). In this case I recommend using a different solver.
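For illustration, with a non-MCMC solver a single set of stored parameters maps cleanly to a probability at runtime. A hedged sketch only: sgd.FMClassification and its parameter names (l2_reg_w, l2_reg_V, step_size) are assumptions based on fastFM's documented API, not something verified in this thread; the sigmoid link follows the tutorial passage quoted above about the SGD classifier.

```python
# Hedged sketch -- solver and parameter names are assumptions, see lead-in above.
import numpy as np
import scipy.sparse as sp
from fastFM import sgd

fm = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=2,
                          l2_reg_w=0.1, l2_reg_V=0.1, step_size=0.01)
fm.fit(sp.csc_matrix(X_train), y_train)

# One well-defined parameter set per training run can now be stored ...
np.savez("fm_params.npz", w0_=fm.w0_, w_=fm.w_, V_=fm.V_)

# ... and turned into a probability at runtime: raw FM score via find_prediction(),
# then a sigmoid link.
params = np.load("fm_params.npz")
score = find_prediction(x_test, float(params["w0_"]), params["w_"], params["V_"], rank=2)
prob = 1.0 / (1.0 + np.exp(-score))
```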