imodels
lin_trim_quantile not updated when using set_params on RuleFitClassifier
Hi,
I noticed unexpected behavior when setting the lin_trim_quantile parameter of RuleFitClassifier via set_params(). The value takes effect when it is passed at initialization, but after calling set_params() the internal winsorizer still uses the default (0.025), even though get_params() reports the new value.
Example: setting during initialization
>>> params={
... 'random_state': 42,
... 'n_estimators': 150,
... 'lin_trim_quantile': 0.04
... }
>>> model = RuleFitClassifier(
... **params
... )
>>> model.fit(X, y)
RuleFitClassifier(lin_trim_quantile=0.04, n_estimators=150, random_state=42)
>>> preds = model.predict(X_test)
>>> print(model.get_params(deep=True))
{'alpha': None, 'cv': True, 'exp_rand_tree_size': True, 'include_linear': True, 'lin_standardise': True, 'lin_trim_quantile': 0.04, 'max_rules': 30, 'memory_par': 0.01, 'n_estimators': 150, 'random_state': 42, 'sample_fract': 'default', 'tree_generator': None, 'tree_size': 4}
>>> print(model.friedscale.winsorizer.trim_quantile)
0.04
>>> print(classification_report(
... y_true=y_test,
... y_pred=preds))
              precision    recall  f1-score   support
           0       0.60      0.90      0.72       100
           1       0.80      0.41      0.54       100
    accuracy                           0.66       200
   macro avg       0.70      0.66      0.63       200
weighted avg       0.70      0.66      0.63       200
Example: setting after initialization using set_params
>>> model = RuleFitClassifier()
>>> model.set_params(
... **params
... )
RuleFitClassifier(lin_trim_quantile=0.04, n_estimators=150, random_state=42)
>>> model.fit(X, y)
RuleFitClassifier(lin_trim_quantile=0.04, n_estimators=150, random_state=42)
>>> preds = model.predict(X_test)
>>> print(model.get_params(deep=True))
{'alpha': None, 'cv': True, 'exp_rand_tree_size': True, 'include_linear': True, 'lin_standardise': True, 'lin_trim_quantile': 0.04, 'max_rules': 30, 'memory_par': 0.01, 'n_estimators': 150, 'random_state': 42, 'sample_fract': 'default', 'tree_generator': None, 'tree_size': 4}
>>> print(model.friedscale.winsorizer.trim_quantile)
0.025
>>> print(classification_report(
... y_true=y_test,
... y_pred=preds))
              precision    recall  f1-score   support
           0       0.63      0.90      0.74       100
           1       0.82      0.47      0.60       100
    accuracy                           0.69       200
   macro avg       0.73      0.69      0.67       200
weighted avg       0.73      0.69      0.67       200
Both results should be identical, but they are not.
Root cause
The issue occurs because RuleFitClassifier initializes its internal objects in the constructor:
https://github.com/csinva/imodels/blob/81764775453862e1f12d1f66441c99df8f81e67e/imodels/rule_set/rule_fit.py#L99-L100
- When you set lin_trim_quantile via set_params, the internal winsorizer and friedscale have already been created with the default lin_trim_quantile.
- Updating the parameter afterwards does not update the existing winsorizer or friedscale, so the change has no effect during fit.
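The pattern can be illustrated without imodels. The following is a minimal sketch, not the library's actual code: `Winsorizer`, `ParamsMixin`, `EagerModel`, and `LazyModel` are hypothetical stand-ins, and `set_params` is simplified to a plain `setattr` loop, which I assume matches what scikit-learn's `BaseEstimator.set_params` does for simple parameters. It contrasts building the helper in the constructor with building it in `fit`.

```python
class Winsorizer:
    """Hypothetical stand-in for the internal winsorizer."""

    def __init__(self, trim_quantile=0.0):
        self.trim_quantile = trim_quantile


class ParamsMixin:
    """Simplified set_params: only sets attributes on the estimator."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class EagerModel(ParamsMixin):
    """Builds the helper in __init__, as described in the root cause."""

    def __init__(self, lin_trim_quantile=0.025):
        self.lin_trim_quantile = lin_trim_quantile
        # Captures the value seen at construction time only.
        self.winsorizer = Winsorizer(trim_quantile=lin_trim_quantile)

    def fit(self):
        return self


class LazyModel(ParamsMixin):
    """Builds the helper in fit, so it reads the current parameter value."""

    def __init__(self, lin_trim_quantile=0.025):
        self.lin_trim_quantile = lin_trim_quantile

    def fit(self):
        self.winsorizer_ = Winsorizer(trim_quantile=self.lin_trim_quantile)
        return self


eager = EagerModel().set_params(lin_trim_quantile=0.04).fit()
lazy = LazyModel().set_params(lin_trim_quantile=0.04).fit()

print(eager.lin_trim_quantile, eager.winsorizer.trim_quantile)  # 0.04 0.025
print(lazy.lin_trim_quantile, lazy.winsorizer_.trim_quantile)   # 0.04 0.04
```

If the same mechanism applies in RuleFitClassifier, moving the creation of winsorizer and friedscale into fit would be one possible fix. Until then, rebuilding the estimator from its parameters, e.g. RuleFitClassifier(**model.get_params()) after set_params, should rerun the constructor with the updated value.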
Reproducible example:
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from imodels import RuleFitClassifier
# Parameters
n_samples = 5000
n_features = 5
np.random.seed(42)
# Features for class 0: Normal distribution centered at 0
X0 = np.random.normal(loc=0, scale=1, size=(n_samples//2, n_features))
# Features for class 1: Normal distribution centered at .5
X1 = np.random.normal(loc=0.5, scale=1, size=(n_samples//2, n_features))
# Combine features and labels
X = np.vstack([X0, X1])
y = np.array([0]*(n_samples//2) + [1]*(n_samples//2))
# Shuffle the dataset
indices = np.arange(n_samples)
np.random.shuffle(indices)
X = X[indices]
y = y[indices]
# Train-test split
X, X_test, y, y_test = train_test_split(
X, y, test_size=200, random_state=42, stratify=y
)
print("features:", X)
print("targets:", y)
params={
'random_state': 42,
'n_estimators': 150,
'lin_trim_quantile': 0.04
}
model = RuleFitClassifier(
**params
)
model.fit(X, y)
preds = model.predict(X_test)
print(model.get_params(deep=True))
print(model.friedscale.winsorizer.trim_quantile)
print(classification_report(
y_true=y_test,
y_pred=preds))
model = RuleFitClassifier()
model.set_params(
**params
)
model.fit(X, y)
preds = model.predict(X_test)
print(model.get_params(deep=True))
print(model.friedscale.winsorizer.trim_quantile)
print(classification_report(
y_true=y_test,
y_pred=preds))
Thanks!