amazon-sagemaker-examples
Notebook xgboost_customer_churn fails with XGBoost 1.5-1 and 1.3-1 Images
Link to the notebook xgboost_customer_churn
Describe the bug
Notebook xgboost_customer_churn fails during training, specifically in the cell starting with the line sess = sagemaker.Session(). It looks like a coding incompatibility with the latest version(s).
It fails with the v1.5-1 and v1.3-1 images with the same error (see below). Running with the older v1.2-1 image gets past this cell, but may result in other issues.
To reproduce
Run the notebook up to the training cell, which starts with sess = sagemaker.Session(). That cell fails.
Logs
[2022-08-25 15:23:59.038 ip-10-0-198-79.ec2.internal:1 INFO hook.py:200] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2022-08-25 15:23:59.031 ip-10-0-198-79.ec2.internal:1 INFO profiler_config_parser.py:102] User has disabled profiler.
[2022-08-25 15:23:59.032 ip-10-0-198-79.ec2.internal:1 INFO hook.py:255] Saving to /opt/ml/output/tensors
[2022-08-25 15:23:59.032 ip-10-0-198-79.ec2.internal:1 INFO state_store.py:77] The checkpoint config file /opt/ml/input/config/checkpointconfig.json does not exist.
[2022-08-25:15:23:59:INFO] Debug hook created from config
[2022-08-25:15:23:59:ERROR] Reporting training FAILURE
[2022-08-25:15:23:59:ERROR] framework error:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 238, in train_job
    bst = xgb.train(train_cfg, train_dmatrix, num_boost_round=num_round-iteration, evals=watchlist,
  File "/miniconda3/lib/python3.8/site-packages/xgboost/training.py", line 188, in train
    bst = _train_internal(params, dtrain,
  File "/miniconda3/lib/python3.8/site-packages/xgboost/training.py", line 61, in _train_internal
    assert all(isinstance(c, callback.TrainingCallback)
AssertionError: You can't mix new and old callback styles.

During handling of the above exception, another exception occurred:
@garystafford were you able to solve it? I am also facing the same issue.
v1.2-1 worked for that cell, but I also got odd issues with a null result set later in the notebook. Might be unrelated.
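For anyone who wants to try the same workaround, here is a minimal sketch of pinning the container version. The helper names (choose_xgboost_version, xgboost_image_uri) and the default argument are made up for illustration; the only SageMaker SDK call is image_uris.retrieve. The version lists simply restate what was reported in this thread, so treat them as a snapshot, not a permanent compatibility table.

```python
# Hypothetical helpers; the version tags below only restate what this thread reports.
AFFECTED_VERSIONS = {"1.3-1", "1.5-1"}  # reported failing with the callback assertion
FALLBACK_VERSION = "1.2-1"              # reported to get past the training cell


def choose_xgboost_version(requested: str) -> str:
    """Return the fallback tag when the requested tag is one reported as failing."""
    return FALLBACK_VERSION if requested in AFFECTED_VERSIONS else requested


def xgboost_image_uri(region: str, requested: str = "1.5-1") -> str:
    """Look up the built-in XGBoost image URI for the chosen version tag."""
    # Imported lazily so choose_xgboost_version works without the SDK installed.
    from sagemaker import image_uris

    return image_uris.retrieve(
        "xgboost", region, version=choose_xgboost_version(requested)
    )
```

In the notebook, the returned URI would replace whatever container image the training cell currently passes to the estimator, for example xgboost_image_uri(sess.boto_region_name).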
We've isolated this down to XGB versions from 1.3.0 onwards. See: https://github.com/aws/sagemaker-xgboost-container/pull/301
The above PR should fix the issue but it'll be another week or so before the change is fully deployed to all regions. If you let me know which region you're operating in I can update you when the fix is deployed.
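For context on the assertion in the log: XGBoost 1.3.0 introduced a class-based callback interface (xgboost.callback.TrainingCallback) alongside the older function-style callbacks, and training refuses a list that contains both kinds. The sketch below is a simplified stand-in written for illustration, not XGBoost's or the container's actual code; TrainingCallback here is a local placeholder class and check_callback_styles is a hypothetical name.

```python
class TrainingCallback:
    """Local stand-in for xgboost.callback.TrainingCallback (new style)."""


def check_callback_styles(callbacks):
    """Raise AssertionError if new-style and old-style callbacks are mixed.

    Returns True when the list uses new-style callbacks, False otherwise.
    """
    callbacks = [] if callbacks is None else list(callbacks)
    uses_new_style = any(isinstance(c, TrainingCallback) for c in callbacks)
    if uses_new_style:
        # Once one new-style callback is present, all of them must be new style.
        assert all(
            isinstance(c, TrainingCallback) for c in callbacks
        ), "You can't mix new and old callback styles."
    return uses_new_style


def old_style_callback(env):
    """Old style: a plain function that receives a callback environment."""


class NewStyleCallback(TrainingCallback):
    """New style: a subclass of the callback base class."""


if __name__ == "__main__":
    check_callback_styles([old_style_callback])   # all old style: accepted
    check_callback_styles([NewStyleCallback()])   # all new style: accepted
    try:
        check_callback_styles([NewStyleCallback(), old_style_callback])
    except AssertionError as err:
        print(err)
```

The traceback suggests the container handed xgb.train a list of this mixed kind, which would explain why the same error appears on every image built on XGBoost 1.3.0 or later.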
@mabunday thank you. We were running a large corporate hackathon in us-east-1 this past week when we ran into the issue.
Hi, this change is being deployed and will be available in us-east-1 next Wednesday, September 7th. I'll write again on that day to confirm. Thanks for your patience.
Hi @mabunday! Is this change ready?
@JoseAJob Hi, the change was just deployed to us-east-1. Thanks for your patience!