py-earth
Unexplained behaviour of Stopping condition 0: Reached maximum number of terms
Hello, colleagues,
I have the following problem: I am using PyEarth for a classification task on a dataset with 300,000 rows and more than 500 features. I set max_terms to a sufficiently high number (100), but after two iterations everything stops and "Stopping Condition 0: Reached maximum number of terms" appears.
import numpy
from pyearth import Earth
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

model = Pipeline([('earth', Earth(max_degree=4, max_terms=100, verbose=True, enable_pruning=False)),
                  ('enet', ElasticNet(l1_ratio=0.0, alpha=1.0))])

X_t = StandardScaler().fit_transform(X_t)
model.fit(X_t, Y_t*100)
Beginning forward pass
iter parent var knot mse terms gcv rsq grsq
0 - - - 34.148441 1 34.149 0.000 0.000
1 0 180 114453 34.135289 3 34.137 0.000 0.000
Stopping Condition 0: Reached maximum number of terms
Maybe I am just doing something wrong? From the metrics I got, I can see that the model is pretty robust but underfitted.
Nikita
@nikrepp I don't see any obvious problems with what you're doing. That seems like a pretty severe issue, though, so I'm surprised to be seeing it for the first time now. Here are a few questions that might help me:
- Is the code you included above the complete program that produces the error?
- Does the issue seem to depend on your data set, or does it happen with any data you use?
- Can you tell me what your operating system, python version, numpy, scipy, and scikit-learn versions are?
- How did you install py-earth, and what is pyearth.__version__?
Hello Jason,
See the answers to your questions below.
- The complete program is below. The target is very low (0.0035).
import pandas as pd
import numpy as np

## Read target
dataset = pd.read_csv('....csv', sep=',', encoding='cp1251')
dataset = dataset.head(10000)
# The column name below means "Refinancing flag"
y = dataset[u'Флаг рефинансирования']
X = dataset.drop(dataset.columns[[0,1,2,3,6]], axis=1)

import pyearth
import scipy
import sklearn
import numpy
print(pyearth.__version__)
print(numpy.__version__)
print(scipy.__version__)
print(sklearn.__version__)

from pyearth.earth import Earth
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

model = Pipeline([('earth', Earth(max_degree=4, max_terms=10, minspan_alpha=10, verbose=True, enable_pruning=False)),
                  ('enet', ElasticNet(l1_ratio=0.0, alpha=1.0))])

X = StandardScaler().fit_transform(X)
model.fit(X, y)
Beginning forward pass
iter parent var knot mse terms gcv rsq grsq
0 - - - 0.002394 1 0.002 0.000 0.000
1 0 304 5228 0.002354 3 0.002 0.017 0.016
2 1 344 7108 0.002295 5 0.002 0.042 0.040
3 2 160 3478 0.002273 7 0.002 0.051 0.048
4 5 573 1411 0.002195 9 0.002 0.083 0.080
5 6 450 4536 0.002195 11 0.002 0.083 0.079
Stopping Condition 0: Reached maximum number of terms
C:\Users\I304909\AppData\Local\Continuum\Miniconda2\envs\tensorflow\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems. ConvergenceWarning)
Out[4]:
Pipeline(memory=None, steps=[('earth', Earth(allow_linear=None, allow_missing=False, check_every=None, enable_pruning=False, endspan=None, endspan_alpha=None, fast_K=None, fast_h=None, feature_importance_type=None, max_degree=4, max_terms=10, min_search_points=None, minspan=None, minspan_alpha=10, penalty=None, ...alse, precompute=False, random_state=None, selection='cyclic', tol=0.0001, warm_start=False))])
- I've tested it on the Adult dataset from UCI, and it works!
import pandas as pd
import numpy as np

## Read target
dataset = pd.read_csv('C:/.../Census01.csv', sep=';', encoding='utf8')
dataset = dataset

for i in dataset.columns:
    dataset[i] = dataset[i].factorize()[0].astype(np.int32)

y = dataset['age']
X = dataset.drop(dataset.columns[[0]], axis=1)
model2 = Pipeline([('earth', Earth(max_degree=4, max_terms=10, verbose=True, enable_pruning=False)),
                   ('enet', ElasticNet(l1_ratio=0.0, alpha=1.0))])

X = StandardScaler().fit_transform(dataset)
model2.fit(X, y)
Beginning forward pass
iter parent var knot mse terms gcv rsq grsq
0 - - - 241.716883 1 241.727 0.000 0.000
1 0 4 -1 238.942474 2 238.977 0.011 0.011
2 1 4 -1 235.893861 3 235.952 0.024 0.024
3 0 1 -1 234.005053 4 234.087 0.032 0.032
4 1 6 -1 232.915885 5 233.021 0.036 0.036
5 0 11 -1 231.898621 6 232.027 0.041 0.040
6 0 9 19353 231.112850 8 231.288 0.044 0.043
7 0 0 -1 230.395323 9 230.594 0.047 0.046
8 8 5 -1 229.583339 10 229.804 0.050 0.049
9 0 2 -1 229.275825 11 229.520 0.051 0.050
Stopping Condition 0: Reached maximum number of terms
- Windows 10. Python: I've tested 2.7 and 3 (same behavior). PyEarth, NumPy, SciPy, scikit-learn versions: 0.1.0, 1.13.3, 1.0.0, 0.19.1.
- I tried different ways: first building from source, most recently through Conda (same behavior).
Thanks! I am also very interested in what is causing this.
@nikrepp Thanks for all the info. In the code you pasted above, you set max_terms to 10, and the forward pass terminated after 5 iterations. That is expected behavior as each iteration produces 2 terms (assuming it finds a knot that is superior to the linear term). Is that the problem you are observing, or is there other worse behavior you're seeing? The reason it goes to iteration 9 on the UCI data set is that it is picking linear basis functions (knot = -1), which only add one term each.
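To make the term arithmetic concrete, here is a small sketch that replays the knot column from the two logs above. It is not py-earth code: the helper name is made up for illustration, and it only assumes what is described above, namely that an iteration with a real knot adds two terms and an iteration with knot = -1 (a linear basis function) adds one.

```python
def replay_term_counts(knots):
    """Return the running number of terms after each forward-pass iteration.

    Hypothetical helper, not part of py-earth. Assumption: a real knot adds
    a pair of hinge terms (+2), knot == -1 adds one linear term (+1), and
    the count starts at 1 for the intercept (iteration 0 in the log).
    """
    terms = 1  # iteration 0: intercept only
    counts = []
    for knot in knots:
        terms += 1 if knot == -1 else 2
        counts.append(terms)
    return counts


# Knot columns copied from the two verbose logs in this thread.
private_data_knots = [5228, 7108, 3478, 1411, 4536]
adult_data_knots = [-1, -1, -1, -1, -1, 19353, -1, -1, -1]

print(replay_term_counts(private_data_knots))  # [3, 5, 7, 9, 11]
print(replay_term_counts(adult_data_knots))    # [2, 3, 4, 5, 6, 8, 9, 10, 11]
```

Both sequences match the terms column in the logs, which is why max_terms=10 is reached after 5 iterations in the first run and only after 9 in the second.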
Hello Jason,
Fortunately, I cannot reproduce the weird behaviour anymore, so I assume it was a corrupted install from source under Python 2 on Windows.
Thank you for all the details. I am looking forward to further development of this framework: objectives for classification problems, better support for categorical predictors, and interpretation of fitted relationships.
Thanks!
P.S. I would be glad to have the opportunity to contribute to one of these topics.
Regards, Nikita