
Using different Regression models is not working properly

Open Ogofo opened this issue 6 years ago • 7 comments

Hey, when I run the default auto_ml pipeline I get errors along the way. It seems that only certain subsets of models can be run in the same pipeline. For example, model_names = ['SGDRegressor', 'LGBMRegressor', 'XGBRegressor'] throws an error, while model_names = ['LGBMRegressor', 'XGBRegressor'] and model_names = ['SGDRegressor'] both work fine.

Additionally, compare_all_models=True combines models that do not work together, so it throws an error as well.

This is the script I'm running:

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output'
    , 'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train, compare_all_models=True)

ml_predictor.score(df_test, df_test.MEDV)

And this is the error I get. The error seems to be the same for every "not working" combination of models.

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
Running basic data cleaning
Performing feature scaling
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}


********************************************************************************************
About to run GridSearchCV on the pipeline for several models to predict MEDV
Fitting 2 folds for each of 6 candidates, totalling 12 fits
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-46-da4fdd716f78> in <module>()
     11 ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
     12 
---> 13 ml_predictor.train(df_train, compare_all_models=True)
     14 
     15 ml_predictor.score(df_test, df_test.MEDV)

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in train(***failed resolving arguments***)
    668 
    669         # This is our main logic for how we train the final model
--> 670         self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)
    671 
    672         if self.ensemble_config is not None and len(self.ensemble_config) > 0:

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning, prediction_interval)
   1247             self.grid_search_params = grid_search_params
   1248 
-> 1249             gscv_results = self.fit_grid_search(X_df, y, grid_search_params, refit=True)
   1250 
   1251             trained_final_model = gscv_results.best_estimator_

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in fit_grid_search(self, X_df, y, gs_params, feature_learning, refit)
   1192                 # Note that we will only report analytics results on the final model that ultimately gets selected, and trained on the entire dataset
   1193 
-> 1194         gs.fit(X_df, y)
   1195 
   1196         if self.verbose:

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
    640 
    641         # if one choose to see train score, "out" will contain train score info

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\parallel.py in retrieve(self)
    697             try:
    698                 if getattr(self._backend, 'supports_timeout', False):
--> 699                     self._output.extend(job.get(timeout=self.timeout))
    700                 else:
    701                     self._output.extend(job.get())

~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    422                         break
    423                     try:
--> 424                         put(task)
    425                     except Exception as e:
    426                         job, idx = task[:2]

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\pool.py in send(obj)
    369             def send(obj):
    370                 buffer = BytesIO()
--> 371                 CustomizablePickler(buffer, self._reducers).dump(obj)
    372                 self._writer.send_bytes(buffer.getvalue())
    373             self._send = send

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in _pickle_method(m)
     47 # For handling parallelism edge cases
     48 def _pickle_method(m):
---> 49     if m.im_self is None:
     50         return getattr, (m.im_class, m.im_func.func_name)
     51     else:

AttributeError: 'function' object has no attribute 'im_self'
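For reference, the last frame can be reproduced without auto_ml. This is only a minimal sketch, assuming Python 3; the Model class and its fit method are hypothetical names used for illustration. On Python 3 a bound method passes unknown attribute lookups on to its underlying function, which would explain why the message talks about a 'function' object:

```python
class Model:  # hypothetical stand-in for any estimator class
    def fit(self):
        return "fitted"


bound = Model().fit  # a bound method, like the ones being pickled in the traceback

try:
    bound.im_self  # Python 2 attribute name; it does not exist on Python 3
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)

# The Python 3 spellings of the same information are present.
has_py3_attrs = hasattr(bound, "__self__") and hasattr(bound, "__func__")
print(has_py3_attrs)
```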

Ogofo • Apr 30 '18 10:04

I am having the same issue as Ogofo. I'm running the same script, but with: ml_predictor.train(df_boston_train, model_names=['ElasticNet', 'LinearSVR'])

I get the following error message:

File "C:\Users\Anaconda3\lib\site-packages\auto_ml\predictor.py", line 670, in train self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)

File "C:\Users\Anaconda3\lib\site-packages\auto_ml\predictor.py", line 1249, in train_ml_estimator gscv_results = self.fit_grid_search(X_df, y, grid_search_params, refit=True)

File "C:\Users\Anaconda3\lib\site-packages\auto_ml\predictor.py", line 1194, in fit_grid_search gs.fit(X_df, y)

File "C:\Users\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 639, in fit cv.split(X, y, groups)))

File "C:\Users\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__ self.retrieve()

File "C:\Users\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 699, in retrieve self._output.extend(job.get(timeout=self.timeout))

File "C:\Users\Anaconda3\lib\multiprocessing\pool.py", line 608, in get raise self._value

File "C:\Users\Anaconda3\lib\multiprocessing\pool.py", line 385, in _handle_tasks put(task)

File "C:\Users\Anaconda3\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send CustomizablePickler(buffer, self._reducers).dump(obj)

File "C:\Users\Anaconda3\lib\site-packages\auto_ml\predictor.py", line 49, in _pickle_method if m.im_self is None:

AttributeError: 'function' object has no attribute 'im_self'

avdusen • May 24 '18 17:05

Hi,

I am also having the same issue. If I use multiple models in model_names, such as LinearRegression and DeeplearningRegressor, it throws the error below:

AttributeError: 'function' object has no attribute 'im_self'

Suganth10 • May 27 '18 14:05

I barely have any grasp on how to use GitHub and don't know how to do a pull request, but the problem is that predictor.py contains a function called _pickle_method which uses Python 2 method attributes (im_self, im_class, im_func), and all of us are trying to run Python 3.

If you go to auto_ml\predictor.py and edit the function to this, it should work.

def _pickle_method(m):
    if m.__self__ is None:
        return getattr, (m.__self__.__class__, m.__func__.__name__)
    else:
        return getattr, (m.__self__, m.__func__.__name__)

At least, I think that should fix it. No promises.
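The same idea can be tried in isolation on Python 3. This is a minimal sketch, not the auto_ml source. It assumes _pickle_method is hooked in through the standard-library copyreg module for types.MethodType, and the names _reduce_bound_method and reduced_names are made up for illustration. Since types.MethodType only covers bound methods on Python 3, the sketch leaves out the `is None` branch:

```python
import copyreg
import pickle
import types
from fractions import Fraction

reduced_names = []  # records which methods went through the custom reducer


def _reduce_bound_method(m):
    # Python 3 spellings: __self__ (was im_self), __func__ (was im_func),
    # __name__ (was func_name).
    name = m.__func__.__name__
    reduced_names.append(name)
    # On load, pickle calls getattr(instance, name) to rebuild the method.
    return getattr, (m.__self__, name)


# Register the reducer for bound methods (assumed to mirror how auto_ml
# hooks in _pickle_method; that registration is not shown in this thread).
copyreg.pickle(types.MethodType, _reduce_bound_method)

# Fraction is a pure-Python, picklable class, so its methods are bound methods.
method = Fraction(3, 4).limit_denominator
restored = pickle.loads(pickle.dumps(method))

print(reduced_names)
print(restored(10) == method(10))
```

If the round trip works here but training still fails, the cause is probably somewhere other than this function.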

above-c-level • Jun 04 '18 20:06

I tried the solution mentioned by @above-c-level, but it didn't work for me :( Any suggestions, folks?

Tagging @ClimbsRocks for a request to look into this.

abhishekvij • Aug 01 '18 17:08

Hi all. I also have this problem. Here is a gist with my way to reproduce it: https://gist.github.com/QuantumDamage/d80a49fc7cf963ce214885057ac70448/51ae358b888577d8f5ef21cef215360f09f17e86

QuantumDamage • Oct 23 '18 16:10

I applied the change to _pickle_method in auto_ml\predictor.py that @above-c-level suggested above. I'm using Python 3, and it fixed the problem!

asvany • Nov 03 '18 14:11

Hey there! It appears that this repository is no longer being actively maintained. However, I still think it's a great idea, so I've started working on my own version of it, which you can find here. It's worth pointing out that I've dropped Python 2.7 support, so you'll have to upgrade if you haven't already.

above-c-level • Dec 05 '18 02:12