sagemaker-101-workshop
MNIST model CPU training broken in TF v2.7 (conda_tensorflow2_p38 kernel on NBI ALv2 JLv3)
The current `conda_tensorflow2_p38` kernel on the latest SageMaker Notebook Instance platform (`notebook-al2-v2`, as used in the CFn template) seems to break local CPU-only training for the MNIST migration challenge.

In this environment (TF v2.7.1, TF.Keras v2.7.0), `tensorflow.keras.backend.image_data_format()` returns `channels_first`, but training fails because `MaxPoolingOp` only supports `channels_last` on CPU - per the error message below:
```
InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
	 [[node sequential/max_pooling2d/MaxPool
 (defined at /home/ec2-user/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/keras/layers/pooling.py:357)
]] [Op:__inference_train_function_862]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/max_pooling2d/MaxPool:
 In[0] sequential/conv2d_1/Relu (defined at /home/ec2-user/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/keras/backend.py:4867)
```
Overriding the `image_data_format()` check (in "Pre-Process the Data for our CNN") to prepare the data in a different shape does not work either, because the model then becomes incompatible with the input (raising a `ValueError` in `conv2d_2`).
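A possible workaround (a sketch only, not tested against the workshop notebook itself) would be to force the Keras backend default to `channels_last` *before* building the model, so that both the data preparation and the CNN agree on the NHWC layout that CPU `MaxPoolingOp` supports. The model below is a stand-in, not the workshop's actual architecture:

```python
# Hedged workaround sketch: force channels_last (NHWC) globally before
# building the model, since CPU MaxPoolingOp only supports NHWC.
import numpy as np
import tensorflow as tf

tf.keras.backend.set_image_data_format("channels_last")
assert tf.keras.backend.image_data_format() == "channels_last"

# MNIST images arrive as (N, 28, 28); append a channel axis for NHWC.
x_train = np.zeros((4, 28, 28), dtype="float32")  # stand-in for real MNIST data
x_train = np.expand_dims(x_train, -1)             # -> (4, 28, 28, 1)

# A model built *after* the override picks up channels_last by default,
# so pooling runs on CPU without the NHWC error.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
])
out = model(x_train)
print(out.shape)  # (4, 13, 13, 16)
```

Since `set_image_data_format` changes a process-wide default, it only helps if it runs before any layers are instantiated; layers already built under `channels_first` keep that setting.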
Training still seems to work fine in the current SMStudio kernel (TensorFlow v2.3.2, TF.Keras v2.4.0).
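For anyone triaging this across kernels, a quick check of the versions and the backend's default layout in the running kernel might look like:

```python
# Quick environment check: print the TF / tf.keras versions and the
# backend's default image data format reported by the current kernel.
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("tf.keras:", tf.keras.__version__)
print("image_data_format:", tf.keras.backend.image_data_format())
```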