amazon-sagemaker-examples

Notebook xgboost_customer_churn fails with XGBoost 1.5-1 and 1.3-1 Images

Open garystafford opened this issue 2 years ago • 7 comments

Link to the notebook xgboost_customer_churn

Describe the bug Notebook xgboost_customer_churn fails during training, specifically in the cell that starts with the line sess = sagemaker.Session(). This looks like a code incompatibility with the latest container version(s).

Fails with v1.5-1 and v1.3-1 - same error (see below). Running with the older v1.2-1 gets through this issue, but may result in other issues.

To reproduce Run the notebook to the training cell which fails, starting with sess = sagemaker.Session().

Logs

[2022-08-25 15:23:59.038 ip-10-0-198-79.ec2.internal:1 INFO hook.py:200] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2022-08-25 15:23:59.031 ip-10-0-198-79.ec2.internal:1 INFO profiler_config_parser.py:102] User has disabled profiler.
[2022-08-25 15:23:59.032 ip-10-0-198-79.ec2.internal:1 INFO hook.py:255] Saving to /opt/ml/output/tensors
[2022-08-25 15:23:59.032 ip-10-0-198-79.ec2.internal:1 INFO state_store.py:77] The checkpoint config file /opt/ml/input/config/checkpointconfig.json does not exist.
[2022-08-25:15:23:59:INFO] Debug hook created from config
[2022-08-25:15:23:59:ERROR] Reporting training FAILURE
[2022-08-25:15:23:59:ERROR] framework error:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 238, in train_job
    bst = xgb.train(train_cfg, train_dmatrix, num_boost_round=num_round-iteration, evals=watchlist,
  File "/miniconda3/lib/python3.8/site-packages/xgboost/training.py", line 188, in train
    bst = _train_internal(params, dtrain,
  File "/miniconda3/lib/python3.8/site-packages/xgboost/training.py", line 61, in _train_internal
    assert all(isinstance(c, callback.TrainingCallback)
AssertionError: You can't mix new and old callback styles.

During handling of the above exception, another exception occurred:

garystafford avatar Aug 25 '22 16:08 garystafford
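
For readers unfamiliar with the AssertionError in the traceback above: XGBoost 1.3 introduced class-based callbacks, and the training loop refuses a callback list that mixes them with the older function-style callbacks. The following is a simplified, self-contained model of that check, not the library's actual code. The names TrainingCallback, EarlyStop, legacy_print_callback and check_callback_styles are stand-ins defined here purely for illustration.

```python
class TrainingCallback:
    """Stand-in for a class-based ("new style") callback base class.

    Hypothetical: defined here for illustration, not imported from xgboost.
    """

    def after_iteration(self, model, epoch, evals_log):
        # Returning False means "keep training".
        return False


class EarlyStop(TrainingCallback):
    """A new-style callback: it subclasses the callback base class."""


def legacy_print_callback(env):
    """An old-style callback: a plain function that receives an env object."""
    print(env)


def check_callback_styles(callbacks):
    """Return "new" or "old", or raise if the list mixes both styles.

    Assumption: this mirrors the spirit of the assertion in the traceback,
    where every callback must be a TrainingCallback instance once any one is.
    """
    uses_new_style = any(isinstance(c, TrainingCallback) for c in callbacks)
    if uses_new_style:
        assert all(
            isinstance(c, TrainingCallback) for c in callbacks
        ), "You can't mix new and old callback styles."
        return "new"
    return "old"


if __name__ == "__main__":
    print(check_callback_styles([EarlyStop()]))            # all new style
    print(check_callback_styles([legacy_print_callback]))  # all old style
    try:
        check_callback_styles([EarlyStop(), legacy_print_callback])
    except AssertionError as err:
        print("mixed list rejected:", err)
```

Under this model, a training script that passes both kinds of callback to the same training call would fail in the way the log shows, which is consistent with the error appearing only on the 1.3-1 and 1.5-1 images.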

@garystafford were you able to solve it? I am also facing the same issue.

karan6190 avatar Aug 26 '22 13:08 karan6190

v1.2-1 worked for that cell, but I also got odd issues with a null result set later in the notebook. Might be unrelated.

garystafford avatar Aug 26 '22 22:08 garystafford
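
As a rough sketch of the v1.2-1 workaround mentioned above, the container version can be pinned when looking up the training image. This assumes the SageMaker Python SDK is installed and that the notebook resolves its image through sagemaker.image_uris.retrieve; the helper name xgboost_image_uri and its default version are illustrative choices, not part of the notebook.

```python
def xgboost_image_uri(region: str, version: str = "1.2-1") -> str:
    """Look up the built-in XGBoost training image for a pinned version.

    Hypothetical helper: the default of "1.2-1" reflects the version
    reported to get past the failing cell in this thread.
    """
    # Imported lazily so this module can be loaded without the SDK installed.
    from sagemaker import image_uris

    return image_uris.retrieve(framework="xgboost", region=region, version=version)


# Example usage (requires the SageMaker Python SDK):
# container = xgboost_image_uri("us-east-1")
```

Since v1.2-1 was reported to cause other problems later in the notebook, this is at best a stopgap until the fixed 1.3-1 and 1.5-1 images are available in the region being used.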

We've isolated this down to XGB versions from 1.3.0 onwards. See: https://github.com/aws/sagemaker-xgboost-container/pull/301

The above PR should fix the issue but it'll be another week or so before the change is fully deployed to all regions. If you let me know which region you're operating in I can update you when the fix is deployed.

mabunday avatar Aug 27 '22 00:08 mabunday

@mabunday thank you. We were running a large corporate hackathon in us-east-1 this last week when we ran into the issue.

garystafford avatar Aug 27 '22 15:08 garystafford

Hi, this change is being deployed and will be available in us-east-1 next Wednesday, September 7th. I'll write again on that day to confirm. Thanks for your patience.

mabunday avatar Sep 01 '22 17:09 mabunday

Hi @mabunday! Is this change ready?

JoseAJob avatar Sep 08 '22 12:09 JoseAJob

@JoseAJob Hi, the change was just deployed to us-east-1. Thanks for your patience!

mabunday avatar Sep 08 '22 15:09 mabunday