
lin_trim_quantile not updated when using set_params on RuleFitClassifier

man-c opened this issue 1 month ago · 0 comments

Hi,

I noticed unexpected behavior when setting the lin_trim_quantile parameter of RuleFitClassifier via set_params(). The parameter is applied correctly when passed during initialization, but setting it afterwards with set_params has no effect during fit: get_params reports the new value, yet the model still uses the default.

Example: setting during initialization

>>> params={
...     'random_state': 42,
...     'n_estimators': 150,
...     'lin_trim_quantile': 0.04
... }
>>> model = RuleFitClassifier(
...     **params
... )
>>> model.fit(X, y)
RuleFitClassifier(lin_trim_quantile=0.04, n_estimators=150, random_state=42)
>>> preds = model.predict(X_test)
>>> print(model.get_params(deep=True))
{'alpha': None, 'cv': True, 'exp_rand_tree_size': True, 'include_linear': True, 'lin_standardise': True, 'lin_trim_quantile': 0.04, 'max_rules': 30, 'memory_par': 0.01, 'n_estimators': 150, 'random_state': 42, 'sample_fract': 'default', 'tree_generator': None, 'tree_size': 4}
>>> print(model.friedscale.winsorizer.trim_quantile)
0.04
>>> print(classification_report(
...     y_true=y_test,
...     y_pred=preds))
              precision    recall  f1-score   support

           0       0.60      0.90      0.72       100
           1       0.80      0.41      0.54       100

    accuracy                           0.66       200
   macro avg       0.70      0.66      0.63       200
weighted avg       0.70      0.66      0.63       200

Example: setting after initialization using set_params

>>> model = RuleFitClassifier()
>>> model.set_params(
...     **params
... )
RuleFitClassifier(lin_trim_quantile=0.04, n_estimators=150, random_state=42)
>>> model.fit(X, y)
RuleFitClassifier(lin_trim_quantile=0.04, n_estimators=150, random_state=42)
>>> preds = model.predict(X_test)
>>> print(model.get_params(deep=True))
{'alpha': None, 'cv': True, 'exp_rand_tree_size': True, 'include_linear': True, 'lin_standardise': True, 'lin_trim_quantile': 0.04, 'max_rules': 30, 'memory_par': 0.01, 'n_estimators': 150, 'random_state': 42, 'sample_fract': 'default', 'tree_generator': None, 'tree_size': 4}
>>> print(model.friedscale.winsorizer.trim_quantile)
0.025
>>> print(classification_report(
...     y_true=y_test,
...     y_pred=preds))
              precision    recall  f1-score   support

           0       0.63      0.90      0.74       100
           1       0.82      0.47      0.60       100

    accuracy                           0.69       200
   macro avg       0.73      0.69      0.67       200
weighted avg       0.73      0.69      0.67       200

Both runs should produce identical results, but they do not: after set_params, model.friedscale.winsorizer.trim_quantile is still the default 0.025 instead of 0.04.

Root cause

The issue occurs because RuleFitClassifier initializes its internal objects in the constructor: https://github.com/csinva/imodels/blob/81764775453862e1f12d1f66441c99df8f81e67e/imodels/rule_set/rule_fit.py#L99-L100

  • By the time you set lin_trim_quantile via set_params, the internal winsorizer and friedscale objects have already been created with the default lin_trim_quantile.
  • set_params only updates the attribute on the estimator; it does not rebuild the existing winsorizer or friedscale, so the change has no effect during fit.
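The mechanism can be reproduced without imodels at all. Below is a minimal toy sketch (class and attribute names are hypothetical, chosen only to mirror the real ones): the sub-object is built once in __init__, and a setattr-style set_params, which is all sklearn's BaseEstimator.set_params amounts to for top-level parameters, never rebuilds it:

```python
class Winsorizer:
    def __init__(self, trim_quantile):
        self.trim_quantile = trim_quantile


class ToyEstimator:
    # Mirrors the bug: the derived object is built once, in __init__.
    def __init__(self, lin_trim_quantile=0.025):
        self.lin_trim_quantile = lin_trim_quantile
        self.winsorizer = Winsorizer(trim_quantile=lin_trim_quantile)

    def set_params(self, **params):
        # Only sets attributes on the estimator; it never rebuilds
        # objects that were derived from the parameters.
        for k, v in params.items():
            setattr(self, k, v)
        return self


m = ToyEstimator()
m.set_params(lin_trim_quantile=0.04)
print(m.lin_trim_quantile)         # 0.04  -- the public parameter updates
print(m.winsorizer.trim_quantile)  # 0.025 -- the internal object is stale
```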

Reproducible example:

import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

from imodels import RuleFitClassifier

# Parameters
n_samples = 5000
n_features = 5
np.random.seed(42)

# Features for class 0: Normal distribution centered at 0
X0 = np.random.normal(loc=0, scale=1, size=(n_samples//2, n_features))
# Features for class 1: Normal distribution centered at .5
X1 = np.random.normal(loc=0.5, scale=1, size=(n_samples//2, n_features))

# Combine features and labels
X = np.vstack([X0, X1])
y = np.array([0]*(n_samples//2) + [1]*(n_samples//2))

# Shuffle the dataset
indices = np.arange(n_samples)
np.random.shuffle(indices)
X = X[indices]
y = y[indices]

# Train-test split
X, X_test, y, y_test = train_test_split(
    X, y, test_size=200, random_state=42, stratify=y
)
print("features:", X)
print("targets:", y)

params={
    'random_state': 42,
    'n_estimators': 150,
    'lin_trim_quantile': 0.04
}
model = RuleFitClassifier(
    **params
)
model.fit(X, y)
preds = model.predict(X_test)
print(model.get_params(deep=True))
print(model.friedscale.winsorizer.trim_quantile)
print(classification_report(
    y_true=y_test, 
    y_pred=preds))


model = RuleFitClassifier()
model.set_params(
    **params
)
model.fit(X, y)
preds = model.predict(X_test)
print(model.get_params(deep=True))
print(model.friedscale.winsorizer.trim_quantile)
print(classification_report(
    y_true=y_test, 
    y_pred=preds))
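
Regarding a possible fix: the usual scikit-learn convention sidesteps this class of bug by having __init__ store parameters verbatim and deferring construction of derived objects to fit(). A minimal sketch of that pattern (hypothetical names, not a patch against the imodels source):

```python
class Winsorizer:
    def __init__(self, trim_quantile):
        self.trim_quantile = trim_quantile


class FixedEstimator:
    def __init__(self, lin_trim_quantile=0.025):
        # __init__ only records parameters, per the sklearn convention.
        self.lin_trim_quantile = lin_trim_quantile

    def set_params(self, **params):
        for k, v in params.items():
            setattr(self, k, v)
        return self

    def fit(self, X, y):
        # Derived objects are (re)built here, so they always see the
        # current parameter values, however they were set.
        self.winsorizer_ = Winsorizer(trim_quantile=self.lin_trim_quantile)
        return self


# fit ignores X and y in this sketch, so placeholders are fine.
m = FixedEstimator().set_params(lin_trim_quantile=0.04).fit(None, None)
print(m.winsorizer_.trim_quantile)  # 0.04 -- set_params now takes effect
```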

Thanks!

man-c · Nov 20 '25