
tf.keras saves step at end of batch

Open jarednielsen opened this issue 5 years ago • 2 comments

Running the following script with tensorflow==1.15.0:

import tensorflow.compat.v2 as tf
import smdebug.tensorflow as smd
from tempfile import TemporaryDirectory

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])

with TemporaryDirectory() as dirpath:
    hook = smd.KerasHook(out_dir=dirpath)

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, callbacks=[hook])

    trial = smd.create_trial(path=dirpath)
    print(hook)
    print(trial)

gives the following output:

<smdebug.tensorflow.keras.KerasHook object at 0x1025aaed0>:(
    out_dir=/var/folders/r1/mgxfss8d45jbs_vl464bbsg906jznv/T/tmpdzybvlqg,
    tensorboard_dir=None,
    step=9374,
    mode=ModeKeys.TRAIN,
    mode_steps={<ModeKeys.GLOBAL: 4>: 9374, <ModeKeys.TRAIN: 1>: 9374},
    include_collections=['metrics', 'losses', 'sm_metrics'],
    writer=None,
    save_config=<class SaveConfig: {<ModeKeys.TRAIN: 1>: <class SaveConfig: save_interval=500, save_steps=[], start_step=0, end_step=None>, <ModeKeys.EVAL: 2>: <class SaveConfig: save_interval=500, save_steps=[], sta ...>,
    reduction_config=<class ReductionConfig: reductions=[], abs_reductions=[], norms=[], abs_norms=[]>,
    save_all=False,
    dry_run=False,
)
<smdebug.trials.local_trial.LocalTrial object at 0x1025b0f50>:(
    name=tmpdzybvlqg,
    path=/var/folders/r1/mgxfss8d45jbs_vl464bbsg906jznv/T/tmpdzybvlqg,
    steps=[0, 500, 1000, 1500, 1874, 2000, 2500, 3000, 3500, 3749, 4000, 4500, 5000, 5500, 5624, 6000, 6500, 7000, 7499, 7500, 8000, 8500, 9000, 9374],
    collections=['default', 'weights', 'biases', 'gradients', 'losses', 'metrics', 'inputs', 'outputs', 'all', 'sm_metrics'],
    tensor_names=['acc', 'batch', 'loss', 'size'],
)

It appears to be saving at steps 1874, 3749, 5624, 7499, and 9374 (every 1875 steps), in addition to every 500th step. Is this desired behavior?

jarednielsen, Jan 11 '20 00:01

Can you check the mode and mode step of the saved global steps?

Vikas-kum, Jan 11 '20 02:01
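One way to run that check is sketched below. It assumes the trial object exposes `steps()`, `mode(global_step)`, and `mode_step(global_step)`, as described in smdebug's analysis documentation. The helper name `describe_saved_steps` is made up for illustration and is not part of smdebug.

```python
def describe_saved_steps(trial):
    """Collect (global_step, mode, mode_step) for every step saved in a trial.

    `trial` is assumed to behave like an smdebug trial: steps() lists the
    saved global steps, and mode()/mode_step() map a global step to its
    mode and to its step count within that mode.
    """
    rows = []
    for global_step in trial.steps():
        rows.append((global_step, trial.mode(global_step), trial.mode_step(global_step)))
    return rows
```

Inside the `with` block of the script, after `smd.create_trial(...)`, something like `for row in describe_saved_steps(trial): print(row)` would list the mode and mode step next to each saved global step.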

This is probably the last step in an epoch. At that point we save additional metrics that Keras only gives us at the end of an epoch.

rahul003, Jan 17 '20 01:01
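The end-of-epoch explanation can be checked against the step list in the trial output with a little arithmetic. This is only a sketch: it assumes the 60,000-image MNIST training set and the Keras default `batch_size=32` (the script does not set a batch size), and it does not call smdebug at all.

```python
# Assumptions: MNIST training set size and Keras' default batch size for fit().
num_samples = 60_000
batch_size = 32
epochs = 5
save_interval = 500  # from the SaveConfig printed by the hook

# Ceiling division: number of batches (steps) in one epoch.
steps_per_epoch = -(-num_samples // batch_size)
total_steps = steps_per_epoch * epochs

# Steps saved because of save_interval, starting at step 0.
interval_steps = set(range(0, total_steps, save_interval))

# Zero-indexed last step of each epoch.
epoch_end_steps = {steps_per_epoch * (epoch + 1) - 1 for epoch in range(epochs)}

expected_steps = sorted(interval_steps | epoch_end_steps)
print(steps_per_epoch)
print(expected_steps)
```

With these assumptions there are 1875 steps per epoch, so the last step of each epoch falls on 1874, 3749, 5624, 7499, and 9374. Merged with the multiples of 500, this reproduces the `steps=[...]` list printed by the trial.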