cross validation and grid search
I would like to use FM_FTRL in an sklearn cross-validation pipeline, e.g.,
from sklearn.model_selection import cross_val_score
from wordbatch.models import FM_FTRL
modelF = FM_FTRL(
    alpha=0.01,  # learning rate
)
cv_scores = cross_val_score(modelF, X_train.tocsc(), y_train_fm.target.values, scoring='roc_auc', cv=time_split)
This throws
TypeError: Cannot clone object '<wordbatch.models.fm_ftrl.FM_FTRL object at 0x557056cfbfa0>' (type <class 'wordbatch.models.fm_ftrl.FM_FTRL'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
The same error is thrown when passing an FM_FTRL model to GridSearchCV.
Can you provide some guidance on how to make this work?
I can see in this thread that you tuned hyperparameters with random search. Can you provide guidance on that?
Thank you!
I'll have a look at fixing this. Basically you'd just need to add these to the model .pyx files:
def get_params(self): return self.__getstate__()
def set_params(self, params): self.__setstate__(params)
This could work. I fixed most of the incompatibilities with sklearn earlier, but it seems I missed this.
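For reference, sklearn calls these two methods as `get_params(deep=...)` and `set_params(**params)`, and expects `set_params` to return the estimator itself. A minimal pure-Python sketch of that convention follows; `ToyModel` is a hypothetical stand-in used only for illustration, not a wordbatch class.

```python
class ToyModel:
    """Hypothetical stand-in for a model class; not part of wordbatch."""

    _param_names = ("alpha", "beta")

    def __init__(self, alpha=0.1, beta=1.0):
        # Store constructor arguments unchanged, under the same names.
        self.alpha = alpha
        self.beta = beta

    def get_params(self, deep=True):
        # sklearn passes `deep`; a model without nested estimators can ignore it.
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **params):
        # sklearn passes parameters as keyword arguments and expects `self` back.
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


model = ToyModel(alpha=0.01)
returned = model.set_params(beta=0.5)
print(model.get_params())
```

The one-liners above would need the same signatures before sklearn could call them.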
On model hyperparameter estimation, I've used a Gaussian random search algorithm described in Section 6.1.4 of my thesis: https://arxiv.org/pdf/1602.02332.pdf . You can use any of the Python hyperparameter packages, such as cmaes, hyperopt, skopt, Ray Tune, etc. There's a lot of them by now. I'll try to add hyperparameter optimization into the package when I get the chance, since this is useful with the supported models and makes good use of the supported distributed computing backends.
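As a rough illustration of the idea, here is a generic random search that perturbs the best parameters found so far with Gaussian noise. It is only a sketch and not necessarily the variant described in the thesis; the function names, the toy objective, and the log-scale parameter names are assumptions made for the example.

```python
import random


def gaussian_random_search(objective, start, step=0.5, iters=200, seed=0):
    """Maximize `objective` by Gaussian perturbations around the current best point."""
    rng = random.Random(seed)
    best = dict(start)
    best_score = objective(best)
    for _ in range(iters):
        # Propose a candidate by adding Gaussian noise to every parameter.
        candidate = {k: v + rng.gauss(0.0, step) for k, v in best.items()}
        score = objective(candidate)
        # Keep the candidate only if it improves on the best score so far.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_objective(params):
    # Hypothetical objective with its maximum at log_alpha = -2, log_L2 = 0.
    return -((params["log_alpha"] + 2.0) ** 2 + params["log_L2"] ** 2)


start = {"log_alpha": 0.0, "log_L2": 1.0}
best, score = gaussian_random_search(toy_objective, start)
print(best, score)
```

In practice the objective would train the model with the candidate parameters and return a validation score, which also sidesteps the need for sklearn to clone the estimator.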
This is a bit more complicated to fix. It seems sklearn would need a fix.
For ftrl.pyx, get_params can be defined:
def get_params(self, deep=False):
    param_names = ["alpha", "beta", "L1", "L2", "e_clip", "D", "init", "seed", "iters", "w", "z", "n", "inv_link",
                   "threads", "bias_term", "verbose"]
    params = {x: y for x, y in zip(param_names, self.__getstate__())}
    if params['inv_link'] == 1: params['inv_link'] = "sigmoid"
    else: params['inv_link'] = "identity"
    return params
Also, the estimator's __init__ function needs to accept the model parameters w, z, n as optional arguments (np.ndarray w= None, np.ndarray z= None, np.ndarray n= None).
After this you'll still get the following validation error from sklearn:
RuntimeError: Cannot clone object <wordbatch.models.ftrl.FTRL object at 0x56245d510ee0>, as the constructor either does not set or modifies parameter alpha
You can debug sklearn base.py clone() to print the variables:
print(name, param1, param2, type(param1), type(param2), param1 is param2, param1==param2)
Which prints out this:
alpha 0.1 0.1 <class 'float'> <class 'float'> False True
So the first "param1 is param2" comparison fails, whereas the "param1 == param2" comparison works. This is due to the difference in how the Python "is" and "==" comparisons work: "is" checks whether both names refer to the same object, while "==" compares values. Here any float or numpy array will fail the "is" comparison, so the validation raises an error. I'm not sure why they use "is" instead of "==" in the clone() validation, since this validation should be comparing values of different objects, not their references.
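The behaviour can be reproduced without sklearn. The sketch below assumes the model keeps `alpha` as a C double, as a Cython `cdef double` attribute would; `array.array` is used only as a stand-in for that storage, and both classes are hypothetical. On CPython, every read from C storage builds a new float object, so the value matches but the identity does not.

```python
from array import array


class CStoredParam:
    """Hypothetical stand-in for an extension type that keeps `alpha` as a C double."""

    def __init__(self, alpha=0.1):
        self._store = array("d", [alpha])  # raw C double, no Python object kept

    def get_params(self, deep=False):
        # Each read converts the C double into a fresh Python float.
        return {"alpha": self._store[0]}


class PyStoredParam:
    """Hypothetical stand-in that keeps the original Python object."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def get_params(self, deep=False):
        return {"alpha": self.alpha}


original = 0.1
c_param = CStoredParam(alpha=original).get_params()["alpha"]
py_param = PyStoredParam(alpha=original).get_params()["alpha"]

# Same value in both cases, but only the Python-stored one is the same object.
print("C storage:     ", c_param == original, c_param is original)
print("Python storage:", py_param == original, py_param is original)
```

An estimator that returns the very objects it was constructed with passes an identity check of this kind, which is how plain Python estimators usually avoid the error.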
I'll open a ticket with the sklearn developers about the above. If that gets fixed, there aren't many other changes needed to resolve this issue.
Thank you for looking into this!