sagemaker-debugger
tf.keras saves step at end of batch
Running the following script with tensorflow==1.15.0:
import tensorflow.compat.v2 as tf
import smdebug.tensorflow as smd
from tempfile import TemporaryDirectory
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])
with TemporaryDirectory() as dirpath:
    hook = smd.KerasHook(out_dir=dirpath)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, callbacks=[hook])
    trial = smd.create_trial(path=dirpath)
    print(hook)
    print(trial)
gives the following output:
<smdebug.tensorflow.keras.KerasHook object at 0x1025aaed0>:(
    out_dir=/var/folders/r1/mgxfss8d45jbs_vl464bbsg906jznv/T/tmpdzybvlqg,
    tensorboard_dir=None,
    step=9374,
    mode=ModeKeys.TRAIN,
    mode_steps={<ModeKeys.GLOBAL: 4>: 9374, <ModeKeys.TRAIN: 1>: 9374},
    include_collections=['metrics', 'losses', 'sm_metrics'],
    writer=None,
    save_config=<class SaveConfig: {<ModeKeys.TRAIN: 1>: <class SaveConfig: save_interval=500, save_steps=[], start_step=0, end_step=None>, <ModeKeys.EVAL: 2>: <class SaveConfig: save_interval=500, save_steps=[], sta ...>,
    reduction_config=<class ReductionConfig: reductions=[], abs_reductions=[], norms=[], abs_norms=[]>,
    save_all=False,
    dry_run=False,
)
<smdebug.trials.local_trial.LocalTrial object at 0x1025b0f50>:(
    name=tmpdzybvlqg,
    path=/var/folders/r1/mgxfss8d45jbs_vl464bbsg906jznv/T/tmpdzybvlqg,
    steps=[0, 500, 1000, 1500, 1874, 2000, 2500, 3000, 3500, 3749, 4000, 4500, 5000, 5500, 5624, 6000, 6500, 7000, 7499, 7500, 8000, 8500, 9000, 9374],
    collections=['default', 'weights', 'biases', 'gradients', 'losses', 'metrics', 'inputs', 'outputs', 'all', 'sm_metrics'],
    tensor_names=['acc', 'batch', 'loss', 'size'],
)
It appears to be saving at every 1875th step (steps 1874, 3749, 5624, 7499, and 9374), in addition to every 500th. Is this the desired behavior?
Can you check the mode and mode step of the saved global steps?
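One way to run that check is sketched below. It assumes the trial object exposes `steps()`, `mode(global_step)`, and `mode_step(global_step)` as described in the smdebug analysis documentation; the helper name `describe_saved_steps` is hypothetical and not part of the library.

```python
def describe_saved_steps(trial):
    """Return a (global_step, mode, mode_step) tuple for each saved step.

    `trial` is assumed to be an smdebug trial, e.g. the object returned by
    smd.create_trial(path=dirpath) in the script above.
    """
    rows = []
    for global_step in trial.steps():
        mode = trial.mode(global_step)            # e.g. ModeKeys.TRAIN
        mode_step = trial.mode_step(global_step)  # step counter within that mode
        rows.append((global_step, mode, mode_step))
    return rows
```

Calling `describe_saved_steps(trial)` inside the `with` block of the script and printing each row should show whether the extra steps were recorded in TRAIN mode and what their mode steps are.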
This is probably the last step in an epoch. At that point we save additional metrics, which Keras only gives us at the end of an epoch.
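The arithmetic behind that explanation can be checked with a short sketch. It assumes the 60,000-sample MNIST training set and the Keras default `batch_size=32` (the script does not set a batch size), together with the `save_interval=500` shown in the hook output; the variable names are illustrative only.

```python
import math

# Assumptions: MNIST training set size and the Keras default batch size.
num_samples = 60_000
batch_size = 32
epochs = 5
save_interval = 500  # taken from the SaveConfig printed by the hook

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs  # global steps are numbered from 0

# Steps saved because of the regular save interval.
interval_steps = set(range(0, total_steps, save_interval))
# Last (zero-indexed) global step of each epoch.
epoch_end_steps = {steps_per_epoch * (epoch + 1) - 1 for epoch in range(epochs)}

expected_steps = sorted(interval_steps | epoch_end_steps)
print(sorted(epoch_end_steps))
print(expected_steps)
```

Under these assumptions the epoch-end steps are 1874, 3749, 5624, 7499, and 9374, and the combined list matches the `steps` reported by the trial above.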