evalml
AutoMLSearch execution leads to Segmentation Fault
PROBLEM:
Running AutoMLSearch leads to a segmentation fault:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
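Exit code 139 follows the shell convention of reporting a process killed by a signal as 128 plus the signal number. SIGSEGV is signal 11 on Linux and macOS, so 128 + 11 = 139. Python's subprocess module reports the same event as a negative return code (-11). Below is a minimal sketch for decoding both forms. The describe_exit helper is hypothetical, and it treats any code above 128 as a signal death, which is only a heuristic.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Decode a process exit status into a human-readable description.

    Assumes the two common conventions on Linux/macOS:
    - subprocess reports a signal death as a negative code (-11 for SIGSEGV);
    - shells and IDEs report it as 128 + signal number (139 for SIGSEGV).
    Treating every code above 128 as a signal death is only a heuristic.
    """
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"killed by {name} (signal {signum})"


print(describe_exit(139))  # killed by SIGSEGV (signal 11)
print(describe_exit(-11))  # killed by SIGSEGV (signal 11)
print(describe_exit(0))    # exited with code 0
```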
Code Sample, a copy-pastable example to reproduce your bug.
Environment:
(serverless-machine-learning) akram@ISHERIFF-M-RBNA models % uname -a
Darwin ISHERIFF-M-RBNA 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64 x86_64
(serverless-machine-learning) akram@ISHERIFF-M-RBNA models % python3 -V
Python 3.9.7
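The environment above records the OS build and Python version but not the versions of evalml or its numeric dependencies, which would help others reproduce the crash. Here is a sketch that uses only the standard library. The environment_report name and the default package list are assumptions, so adjust them to what is actually installed.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("evalml", "scikit-learn", "pandas", "numpy")):
    """Collect OS, CPU architecture, Python and package versions for a bug report."""
    report = {
        "platform": platform.platform(),
        "machine": platform.machine(),  # e.g. x86_64 (Intel) vs arm64 (Apple M1)
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```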
# Your code here
## Evaluating Different Models by using the Auto-ML framework ""EVALML"" in this module.
print("\nImporting to Auto-ML based Training ...##")
import evalml ## AutoML technique to be used here This package is required only if you are doing automatic Data cleaning and Pre-processing without any Manual steps.
from PreProcess_Data import Xtrain,Xtest,Ytrain,Ytest
from evalml import AutoMLSearch
evalml.problem_types.ProblemTypes.all_problem_types
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
X_train, X_test, y_train, y_test = Xtrain,Xtest,Ytrain,Ytest
print("\n\n\tRunning Auto ML based training\n")
automl = AutoMLSearch(X_train=Xtrain, y_train=Ytrain, problem_type='binary')
print(automl.search())
automl.rankings
print(automl.best_pipeline)
best_pipeline=automl.best_pipeline
print(best_pipeline)
#GeneratedPipeline(parameters={'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean',
# 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Logistic Regression Classifier'
#:{'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'},})
automl.describe_pipeline(automl.rankings.iloc[0]["id"])
### Evaluate on hold out of the data samples
best_pipeline.score(X_test, y_test, objectives=["auc","f1","Precision","Recall"])
automl_auc.rankings
automl_auc.describe_pipeline(automl_auc.rankings.iloc[0]["id"])
best_pipeline_auc = automl_auc.best_pipeline
# get the score on holdout data
best_pipeline_auc.score(X_test, y_test, objectives=["auc"])
## Pickling the trained model
best_pipeline.save("AutomML_Eval_model.pkl")
check_model=automl.load('model.pkl')
check_model.predict_proba(X_test).to_dataframe()
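The end of the snippet saves the pipeline to AutomML_Eval_model.pkl, but it loads model.pkl, and it does so through the AutoMLSearch object. Below is a minimal sketch of a consistent round trip. It assumes a recent evalml release in which pipelines have save() and a static load(), and in which predict_proba() already returns a pandas DataFrame, so .to_dataframe() is dropped. The helper name, MODEL_PATH and HOLDOUT_OBJECTIVES are hypothetical.

```python
from pathlib import Path

# Hypothetical constants: defining the path once keeps save() and load() in sync
# (the original saves "AutomML_Eval_model.pkl" but loads "model.pkl").
MODEL_PATH = Path("AutomML_Eval_model.pkl")
HOLDOUT_OBJECTIVES = ["AUC", "F1", "Precision", "Recall"]


def evaluate_and_persist(automl, X_test, y_test, path=MODEL_PATH):
    """Score the best pipeline on holdout data, save it, reload it and predict."""
    best_pipeline = automl.best_pipeline
    scores = best_pipeline.score(X_test, y_test, objectives=HOLDOUT_OBJECTIVES)

    best_pipeline.save(str(path))
    # Assumption: load() is a static method on the pipeline class, so the file
    # is read back through the pipeline's own class rather than AutoMLSearch.
    reloaded = type(best_pipeline).load(str(path))

    # Assumption: in recent evalml releases predict_proba() returns a pandas
    # DataFrame directly, so no .to_dataframe() conversion is applied here.
    probabilities = reloaded.predict_proba(X_test)
    return scores, probabilities
```

With a finished search this would be called as scores, probabilities = evaluate_and_persist(automl, X_test, y_test).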
I also debugged it with pdb, breakpoints, and print statements.
OUTPUT:
/Users/akram/opt/anaconda3/envs/serverless-machine-learning/bin/python /Users/akram/AKRAM_CODE_FOLDER/ML/Washington_ML/serverless-machine-learning/ML_Proj_Template/ml1/models/Auto_Eval_Training.py
Importing to Auto-ML based Training ...##
::Reading of Input Data is Sucessfull::
      MI_dir_L5_weight  MI_dir_L5_mean  ...  HpHp_L0.01_covariance  HpHp_L0.01_pcc
0             1.000000       60.000000  ...           0.000000e+00    0.000000e+00
1             1.000000       60.000000  ...           0.000000e+00    0.000000e+00
2             1.000000       60.000000  ...           0.000000e+00    0.000000e+00
3             1.000000      590.000000  ...           0.000000e+00    0.000000e+00
4             1.927179      590.000000  ...           0.000000e+00    0.000000e+00
...                ...             ...  ...                    ...             ...
9994          1.000000      330.000000  ...           4.240000e-29    0.000000e+00
9995          1.998594      330.000000  ...          -1.110000e-28   -3.820000e-18
9996          1.000000       60.000016  ...           1.240000e-28    1.110000e-16
9997          1.000000      330.000000  ...           2.530000e-29    1.740000e-18
9998          1.999917      330.000000  ...          -6.640000e-29   -4.560000e-18

[9999 rows x 115 columns]
   MI_dir_L5_weight  MI_dir_L5_mean  ...  HpHp_L0.01_covariance  HpHp_L0.01_pcc
0          1.000000            60.0  ...                    0.0             0.0
1          1.000000            60.0  ...                    0.0             0.0
2          1.000000            60.0  ...                    0.0             0.0
3          1.000000           590.0  ...                    0.0             0.0
4          1.927179           590.0  ...                    0.0             0.0

[5 rows x 115 columns]
The shape of Input dataset is : (9999, 115)
The shape of Input malicious dataset is : (9999, 115)
Clean/ Benign Traffic is 0       1
1       1
2       1
3       1
4       1
       ..
9994    1
9995    1
9996    1
9997    1
9998    1
Name: Out, Length: 9999, dtype: int64
Malicious Traffic is 0       0
1       0
2       0
3       0
4       0
       ..
9994    0
9995    0
9996    0
9997    0
9998    0
Name: Out, Length: 9999, dtype: int64
Concatenated Data Shape is (19998, 116)
combined1 shape is (19998, 116)
After remove: (19998, 114)
The OUTPUT is : [0 1 1 ... 0 0 1]
OUTPUT SHAPE : (19998,)
Running Auto ML based training
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
To confirm, you're running on a Mac (non-M1), correct?
It seems that the code snippet that you provided is incomplete. With a few minor corrections, I can run it with no errors.
- How is automl_auc assigned?
- What is model.pkl?
- Is plt called at some point?
- Have you tried running it with a different dataset?
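As written, automl_auc is never assigned in the snippet, so a line such as automl_auc.rankings raises a NameError before any AUC scoring happens. One plausible reading, offered only as an assumption, is a second search that ranks pipelines by AUC. The sketch below uses AutoMLSearch's objective argument. The helper name and the search_cls hook, which lets it be exercised without evalml installed, are hypothetical.

```python
def run_auc_search(X_train, y_train, search_cls=None):
    """Run a binary AutoML search that ranks pipelines by AUC."""
    if search_cls is None:
        # Imported lazily so the helper can be tested without evalml installed.
        from evalml.automl import AutoMLSearch as search_cls
    automl_auc = search_cls(
        X_train=X_train,
        y_train=y_train,
        problem_type="binary",
        objective="AUC",  # rank candidate pipelines by AUC
    )
    automl_auc.search()
    return automl_auc
```

In the original script this would be automl_auc = run_auc_search(X_train, y_train), placed before the automl_auc lines.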
Yes, I am on a Mac.
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro16,1
Processor Name: 6-Core Intel Core i7
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 12 MB
Hyper-Threading Technology: Enabled
Memory: 16 GB
System Firmware Version: 1731.140.2.0.0 (iBridge: 19.16.16064.0.0,0)
OS Loader Version: 540.120.3~19
Serial Number (system): C02FRBNAMD6M
Hardware UUID: B5D170ED-BA36-541D-81D0-2CB5FD5B0A39
Provisioning UDID: B5D170ED-BA36-541D-81D0-2CB5FD5B0A39
Activation Lock Status: Disabled
- By using the confusion matrix API call below, on code lines #220-222:
from sklearn import metrics
import seaborn as sns
from matplotlib import pyplot as plt
confusion_matrix = metrics.confusion_matrix(max_test, max_predictions)
plt.figure(figsize=(16, 14))
sns.heatmap(confusion_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d");
- The trained model is saved (pickled) with the model.pkl API call, and later this same model is loaded back into memory to make predictions.
- Yes, plt is called on code line #203 to plot the results.
- Yes, I tried with a different dummy dataset from the SKlearn.load_dataset API, but got the same error.
Thanks for the info! A few things:
- Have you tried running your code with the plot operations commented out?
- I don't see any code with the line numbers you're referencing; there are only about 50 lines in the snippet.
- Could you enable faulthandler in your module? It might show which code is triggering the segfault.
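pdb and print statements cannot catch a segfault, because the interpreter itself dies. faulthandler, from the standard library, dumps the Python traceback when SIGSEGV arrives. Below is a minimal sketch, assuming the search is launched from a script. It enables faulthandler at the top of the module (running python -X faulthandler script.py has the same effect), and a hypothetical run_isolated helper runs one suspect step in a child interpreter so that the parent can report how it died.

```python
import faulthandler
import subprocess
import sys

# Enable as early as possible (first lines of the script) so that a crash in a
# native extension still prints the Python-level traceback of all threads.
faulthandler.enable(all_threads=True)


def run_isolated(code: str, timeout: float = 600.0):
    """Run a snippet in a fresh interpreter with faulthandler on.

    Returns (returncode, stderr). A negative return code means the child was
    killed by that signal, e.g. -11 for SIGSEGV. The timeout is arbitrary.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr
```

For example, run_isolated("import evalml") checks whether the import alone crashes, before moving on to constructing AutoMLSearch and calling search().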
- Yes, I tried running with the plot operations commented out, but I still see the same issue.
- Please find the code below.
# Your code here
## Evaluating Different Models by using the Auto-ML framework ""EVALML"" in this module.
print("\nImporting to Auto-ML based Training ...##")
import evalml ## AutoML technique to be used here This package is required only if you are doing automatic Data cleaning and Pre-processing without any Manual steps.
from PreProcess_Data import Xtrain,Xtest,Ytrain,Ytest
from evalml import AutoMLSearch
evalml.problem_types.ProblemTypes.all_problem_types
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
X_train, X_test, y_train, y_test = Xtrain,Xtest,Ytrain,Ytest
print("\n\n\tRunning Auto ML based training\n")
automl = AutoMLSearch(X_train=Xtrain, y_train=Ytrain, problem_type='binary')
print(automl.search())
automl.rankings
print(automl.best_pipeline)
best_pipeline=automl.best_pipeline
print(best_pipeline)
#GeneratedPipeline(parameters={'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean',
# 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Logistic Regression Classifier'
#:{'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'},})
automl.describe_pipeline(automl.rankings.iloc[0]["id"])
### Evaluate on hold out of the data samples
best_pipeline.score(X_test, y_test, objectives=["auc","f1","Precision","Recall"])
automl_auc.rankings
automl_auc.describe_pipeline(automl_auc.rankings.iloc[0]["id"])
best_pipeline_auc = automl_auc.best_pipeline
# get the score on holdout data
best_pipeline_auc.score(X_test, y_test, objectives=["auc"])
## Pickling the trained model
best_pipeline.save("AutomML_Eval_model.pkl")
check_model=automl.load('model.pkl')
check_model.predict_proba(X_test).to_dataframe()
That looks like the same code as your original snippet. There is no line 203 or 220-222 in it.