efficient_densenet_tensorflow
`recompute_grad` Does Not Work
The method you propose for using recompute_grad does not work except in the simplest case, where every layer in the model is recomputed except the input and output layers. All other cases (e.g. when every other layer is recomputed) raise the following error:
ValueError: The variables used on recompute were different than the variables originally
used. The function wrapped with @recompute_grad likley creates its own variable
scope with a default name and has been called twice in the same enclosing scope.
To fix, ensure each call to the function happens in its own unique variable
scope.
Can you please advise how to fix this error?
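For reference, my reading of "ensure each call to the function happens in its own unique variable scope" is a toy pattern like the one below (block_fn is only a stand-in here, not one of my real layers), so please correct me if that reading is wrong:
import tensorflow as tf

def block_fn( t ):
    # stand-in for a real layer; creates its variables when called
    return tf.layers.dense( t, 64, activation = tf.nn.relu )

x = tf.placeholder( tf.float32, [None, 64] )
out = x
for i in range(3):
    # every recompute_grad-wrapped call gets its own uniquely named variable scope
    with tf.variable_scope( 'block_{}'.format(i), use_resource = True ):
        out = tf.contrib.layers.recompute_grad( block_fn )( out )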
My current method is to (1) create a memory efficient layer, e.g.:
def Conv2D_mem_eff( input_tensor,
                    filters,
                    kernel_size,
                    kernel_regularizer,
                    bias_regularizer,
                    padding,
                    name ):
    with tf.variable_scope( name,
                            use_resource = True ):
        def _x( inner_input_tensor ):
            x = Conv2D( filters = filters,
                        kernel_size = kernel_size,
                        padding = padding,
                        kernel_regularizer = kernel_regularizer,
                        bias_regularizer = bias_regularizer,
                        name = name )(inner_input_tensor)
            return x
        _x = tf.contrib.layers.recompute_grad( _x )
        return _x( input_tensor )
then (2) use this within a Lambda layer when defining my model:
x = Lambda( Conv2D_mem_eff,
            arguments = {'filters' : 24,
                         'kernel_size' : (5,5),
                         'kernel_regularizer' : l2,
                         'bias_regularizer' : l2,
                         'padding' : 'same',
                         'name' : 'conv02'},
            name = 'conv02' )(x)
I give unique names for each layer I use.
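For concreteness, two consecutive memory-efficient layers would look like this (the filter and kernel sizes here are just placeholders):
x = Lambda( Conv2D_mem_eff,
            arguments = {'filters' : 24, 'kernel_size' : (5,5),
                         'kernel_regularizer' : l2, 'bias_regularizer' : l2,
                         'padding' : 'same', 'name' : 'conv02'},
            name = 'conv02' )(x)
x = Lambda( Conv2D_mem_eff,
            arguments = {'filters' : 32, 'kernel_size' : (3,3),
                         'kernel_regularizer' : l2, 'bias_regularizer' : l2,
                         'padding' : 'same', 'name' : 'conv03'},
            name = 'conv03' )(x)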
Could you try defining the class outside the function and only using the instance inside the function? I.e. move the Conv2D class instantiation outside of the _x func.
And if that doesn't work, set reuse=tf.AUTO_REUSE in the variable scope.
@joeyearsley Thanks for your help. So you mean doing this, yes:
def Conv2D_mem_eff( input_tensor,
                    filters,
                    kernel_size,
                    kernel_regularizer,
                    bias_regularizer,
                    padding,
                    name ):
    with tf.variable_scope( name,
                            use_resource = True ):
        lyr_fn = Conv2D( filters = filters,
                         kernel_size = kernel_size,
                         padding = padding,
                         kernel_regularizer = kernel_regularizer,
                         bias_regularizer = bias_regularizer,
                         name = name )
        def _x( inner_input_tensor ):
            x = lyr_fn(inner_input_tensor)
            return x
        _x = tf.contrib.layers.recompute_grad( _x )
        return _x( input_tensor )
The first option (instantiating Conv2D outside _x, as shown above) still fails; it produces the same _XlaCompile complaint, but now ends in an AssertionError:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2639, in get_attr
c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'Hidden_Layers/FullyConnectedLayer_01/fc01/fc01/IdentityN' has no attr named '_XlaCompile'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 5652, in get_controller
yield g
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 398, in _MaybeCompile
xla_compile = op.get_attr("_XlaCompile")
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2643, in get_attr
raise ValueError(str(e))
ValueError: Operation 'Hidden_Layers/FullyConnectedLayer_01/fc01/fc01/IdentityN' has no attr named '_XlaCompile'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main_run_0006.py", line 1854, in <module>
main()
File "main_run_0006.py", line 1785, in main
initial_epoch = initial_epoch)
File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "C:\Program Files\Python36\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
model._make_train_function()
File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
loss=self.total_loss)
File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 475, in get_updates
grads = self.get_gradients(loss, params)
File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 89, in get_gradients
grads = K.gradients(loss, params)
File "C:\Program Files\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2757, in gradients
return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 403, in _MaybeCompile
return grad_fn() # Exit early
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in <lambda>
lambda: grad_fn(op, *out_grads))
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 236, in internal_grad_fn
return tape_grad_fn(*result_grads)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 219, in tape_grad_fn
input_grads, variable_grads = grad_fn(*result_grads, variables=variables)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 629, in grad_fn
return _grad_fn(output_grads, kwargs["variables"])
File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 622, in _grad_fn
has_is_recompute_kwarg=has_is_recompute_kwarg)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 553, in _recomputing_grad_fn
outputs = compute_fn(*inputs, **fn_kwargs)
File "main_run_0006.py", line 1094, in _x
x = lyr_fn(inner_input_tensor)
File "C:\Program Files\Python36\lib\site-packages\keras\engine\base_layer.py", line 474, in __call__
output_shape = self.compute_output_shape(input_shape)
File "C:\Program Files\Python36\lib\site-packages\keras\layers\core.py", line 888, in compute_output_shape
assert input_shape[-1]
AssertionError
Keeping everything else the same and adding the argument reuse=tf.AUTO_REUSE to tf.variable_scope gives the original error from the beginning of the post:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2639, in get_attr
c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'Hidden_Layers/FullyConnectedLayer_03/fc03/fc03/IdentityN' has no attr named '_XlaCompile'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 5652, in get_controller
yield g
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 398, in _MaybeCompile
xla_compile = op.get_attr("_XlaCompile")
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2643, in get_attr
raise ValueError(str(e))
ValueError: Operation 'Hidden_Layers/FullyConnectedLayer_03/fc03/fc03/IdentityN' has no attr named '_XlaCompile'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main_run_0006.py", line 1850, in <module>
main()
File "main_run_0006.py", line 1781, in main
initial_epoch = initial_epoch)
File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "C:\Program Files\Python36\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
model._make_train_function()
File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
loss=self.total_loss)
File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 475, in get_updates
grads = self.get_gradients(loss, params)
File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 89, in get_gradients
grads = K.gradients(loss, params)
File "C:\Program Files\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2757, in gradients
return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 403, in _MaybeCompile
return grad_fn() # Exit early
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in <lambda>
lambda: grad_fn(op, *out_grads))
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 236, in internal_grad_fn
return tape_grad_fn(*result_grads)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 219, in tape_grad_fn
input_grads, variable_grads = grad_fn(*result_grads, variables=variables)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 629, in grad_fn
return _grad_fn(output_grads, kwargs["variables"])
File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 622, in _grad_fn
has_is_recompute_kwarg=has_is_recompute_kwarg)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 556, in _recomputing_grad_fn
raise ValueError(_WRONG_VARS_ERR)
ValueError: The variables used on recompute were different than the variables originally
used. The function wrapped with @recompute_grad likley creates its own variable
scope with a default name and has been called twice in the same enclosing scope.
To fix, ensure each call to the function happens in its own unique variable
scope.
To my knowledge, albeit limited, I once read that the _XlaCompile error has something to do with input and output tensor shapes not matching.
Can you share your script?
Yes, of course. Can I email it to you? It's rather large, and I'd prefer not to post it directly yet as it's for a class.
@joeyearsley I haven't heard back from you, so I'll assume you want me to post things here. I think part of my error was in trying to wrap the calls in Lambda layers: Lambda layers are inherently stateless, so naturally they have no trainable weights, and that was my mistake. Trying instead to wrap tf.contrib.layers.recompute_grad in a custom Keras layer, I wrote the following, but it did not work:
class Conv2D_mem_eff(Conv2D):
    def __init__(self,
                 filters,
                 kernel_size,
                 strides=1,
                 padding='valid',
                 data_format=None,
                 dilation_rate=1,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 **kwargs):
        super(Conv2D_mem_eff, self).__init__(
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding=padding,
            data_format=data_format,
            dilation_rate=dilation_rate,
            activation=activation,
            use_bias=use_bias,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            kernel_regularizer=kernel_regularizer,
            bias_regularizer=bias_regularizer,
            activity_regularizer=activity_regularizer,
            kernel_constraint=kernel_constraint,
            bias_constraint=bias_constraint,
            **kwargs)

    def call(self, inputs):
        with tf.variable_scope( super(Conv2D_mem_eff, self).name,
                                use_resource = True ):
            def _x( inner_input_tensor ):
                x = super(Conv2D_mem_eff, self).call( inner_input_tensor )
                return x
            _x = tf.contrib.layers.recompute_grad( _x )
            return _x( inputs )
I don't know what I'm doing wrong and how others are getting this to work. Can you help me?
@joeyearsley Do you have a minimal working example I can try to run to get it working? Also, what versions of tensorflow and keras did you use when testing your code? TF 1.9? Keras 2.0? I know I need to use the tensorflow implementation of the keras backend (i.e. from tensorflow.python.keras import backend as K) instead of the backend that ships with keras (i.e. from keras import backend as K), but beyond that I don't know.
@Sirius083 or @joeyearsley can you help me?
@Sirius083 @joeyearsley I used tensorflow.layers instead of tensorflow.keras.layers to import core layers (e.g. Conv2D and Dense), and I get the warning:
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
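My guess, and it is only a guess, is that this warning means the layer kernels were not created as ResourceVariables, which recompute_grad needs in order to track variables during the recompute. In TF 1.x I believe the usual way to force that is use_resource=True on the enclosing variable scope, roughly:
# rough, untested sketch of forcing resource variables around a recomputed op
with tf.variable_scope( 'conv02', use_resource = True ):
    out = tf.contrib.layers.recompute_grad(
        lambda t: tf.layers.conv2d( t, filters = 24, kernel_size = 5, padding = 'same' ) )( x )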
Is that correct?
CAN SOMEONE PLEASE HELP?!!!
@Sirius083 or @joeyearsley can you help me?
I use another efficient densenet implementation, https://github.com/cybertronai/gradient-checkpointing. It is easy to use: just add a few lines at the beginning of your code.
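The "few lines" are essentially the gradients override from that repo's README; from memory it looks roughly like this (my exact snippet is further down in this thread):
import tensorflow as tf
import memory_saving_gradients
from tensorflow.python.ops import gradients as tf_gradients

# route all tf.gradients calls through the memory-saving implementation
tf_gradients.__dict__["gradients"] = memory_saving_gradients.gradients_memory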
@Sirius083 That was the first one I tried, but it didn't work. What version of tensorflow and keras did you use?
@Sirius083 Did you see that your gpu memory went down and training time per second went up when you used yaroslav's memory_saving_gradients? Also, are you using Windows or Linux?
- Yes, I cannot train densenet-100-36 without yaroslav's memory_saving_gradients, since I only have one 1080 Ti GPU.
- It works on both Windows and Linux; I checked.
Note: I think TensorFlow has some intrinsic memory-saving behaviour of its own, since it reports a CUDA memory allocation failure but can still train the model. However, if the model is too big, like densenet-bc-190-40, it cannot be trained under this method.
@Sirius083 I've tried downgrading from tf-1.8 to 1.5 and still can't get it to work. I'm on Windows 10, and my task manager doesn't show any less memory being utilized when I use memory_saving_gradients.
Right now, I am on tensorflow 1.5 with keras 2.1.6 using python 3.5 x64-bit. I make sure to use the tensorflow implementation of the keras backend (from tensorflow.python.keras._impl.keras import backend as K) as well as the tensorflow implementations of the keras layers.
I define my model, add gradient checkpointing for several convolutional and fully-connected layers, then compile the model in a function called get_model.
Here is the meat of my code. I haven't included a bunch of my pandas functions for dataset manipulation, but if for some reason you think they'd be important, let me know and I'll post them here. I don't feel like I'm doing anything too out of the ordinary. Can you take a look?
# standard-library modules used further down (argparse/os in main(), time/datetime/gc for saving and cleanup)
import argparse
import datetime
import gc
import os
import time

import tensorflow as tf
from tensorflow.python.keras._impl.keras import backend as K
from tensorflow.contrib.data.python.ops.shuffle_ops import shuffle_and_repeat
from tensorflow.contrib.data.python.ops.batching import map_and_batch
import memory_saving_gradients
Dataset = tf.data.Dataset
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.python.keras.models import Sequential, Model, load_model, model_from_yaml
from tensorflow.python.keras.callbacks import LearningRateScheduler, ModelCheckpoint, EarlyStopping, History, TensorBoard
from tensorflow.python.keras import regularizers, optimizers
from tensorflow.python.keras.layers import Conv2D, Dense, Flatten, Dropout, Input, Lambda, Activation

##################
#GLOBAL VARIABLES
##################
img_shape_raw = (3, 160, 320)
batch_size = 32
num_epochs = 1
crop_top = 70
crop_btm = 25
img_format = 'channels_first'
K.set_image_data_format(img_format)
img_shape_input = (img_shape_raw[0],
                   img_shape_raw[1] - crop_top - crop_btm,
                   img_shape_raw[2])  # (3, 65, 320)
################
#DATA GENERATOR
################
def generator_from_df( df, batch_size, shuffle = True ):
    def read( img_pth, angle ):
        im_fl = tf.read_file( img_pth )
        im = tf.image.decode_image( im_fl, channels = 3 )
        im = tf.transpose( im, [2, 0, 1] )  # Make image channels first
        return Dataset.from_tensors( (im, angle) )

    img_pths = tf.convert_to_tensor( df['Image_Path'].values )
    angs = tf.convert_to_tensor( df['Angle'].values )
    ds = Dataset.from_tensor_slices( (img_pths, angs) )
    ds = ds.apply( tf.contrib.data.parallel_interleave( read, cycle_length = batch_size, sloppy = True ) )
    if shuffle:
        ds = ds.apply( shuffle_and_repeat( buffer_size = 2*batch_size, count = num_epochs ) )
    else:
        ds = ds.repeat( num_epochs )
    ds = ds.apply( map_and_batch(
        lambda img_pth, ang: (img_pth, ang),
        batch_size,
        num_parallel_batches = max_procs ) )
    ds = ds.prefetch( max_procs )
    iterator = ds.make_one_shot_iterator()
    sess = K.get_session()
    next_element = iterator.get_next()
    while True:
        try:
            yield sess.run(next_element)
        except tf.errors.OutOfRangeError:
            break
###########
#GET MODEL
###########
def get_model( lr, is_training = True ):
    keep_prob = 0.5
    rate = keep_prob
    l2 = regularizers.l2(0.001)

    with tf.name_scope('Input'):
        inputs = Input( shape = img_shape_input, name = 'input' )
        x = Lambda(lambda x: x / 255. - 0.5,
                   input_shape = img_shape_input, name = 'norm_-0.5_to_0.5')(inputs)

    with tf.name_scope('Hidden_Layers'):
        with K.name_scope('ConvLayer_01'):
            x = Conv2D(4, (5,5),
                       kernel_regularizer = l2,
                       bias_regularizer = l2,
                       padding = 'same',
                       name = 'conv01')(x)
        with tf.name_scope('ConvLayer_02'):
            x = Conv2D(12, (5,5),
                       kernel_regularizer = l2,
                       bias_regularizer = l2,
                       padding = 'same',
                       name = 'conv02')(x)
        with tf.name_scope('ConvLayer_03'):
            x = Conv2D(24, (5,5),
                       kernel_regularizer = l2,
                       bias_regularizer = l2,
                       padding = 'same',
                       name = 'conv03')(x)
        with tf.name_scope('ConvLayer_04'):
            x = Conv2D(24, (3,3),
                       kernel_regularizer = l2,
                       bias_regularizer = l2,
                       padding = 'same',
                       name = 'conv04')(x)
        with tf.name_scope('ConvLayer_05'):
            x = Conv2D(32, (3,3),
                       kernel_regularizer = l2,
                       bias_regularizer = l2,
                       padding = 'same',
                       name = 'conv05')(x)
        with tf.name_scope('Flatten'):
            x = Flatten(name = 'flatten')(x)
        with tf.name_scope('FullyConnectedLayer_01'):
            x = Dense(100,
                      kernel_regularizer = l2,
                      bias_regularizer = l2,
                      name = 'fc01')(x)
        with tf.name_scope('FullyConnectedLayer_02'):
            x = Dense(50,
                      kernel_regularizer = l2,
                      bias_regularizer = l2,
                      name = 'fc02')(x)
        with tf.name_scope('FullyConnectedLayer_03'):
            x = Dense(25,
                      kernel_regularizer = l2,
                      bias_regularizer = l2,
                      name = 'fc03')(x)
        with tf.name_scope('FullyConnectedLayer_04'):
            x = Dense(10,
                      kernel_regularizer = l2,
                      bias_regularizer = l2,
                      name = 'fc04')(x)

    with tf.name_scope('Output'):
        outputs = Dense(1,
                        name = 'output')(x)

    # Create Model
    model = Model( inputs = inputs, outputs = outputs )
    adam = optimizers.Adam( lr = lr, decay = 0.001 )  # Learning rate and decay set in LearningRateScheduler

    # Memory Saving Gradients
    layer_names = [ 'conv02', 'conv04', 'fc01', 'fc03' ]
    [tf.add_to_collection('checkpoints', model.get_layer(l).get_output_at(0))
     for l in layer_names]
    K.__dict__['gradients'] = memory_saving_gradients.gradients_collection

    # Compile Model
    model.compile(loss = 'mean_squared_error', optimizer = adam, metrics = ['mse'])
    return model
class CumulativeHistory( History ):
    '''
    History does not allow resuming training history, but this class does.
    '''
    def on_train_begin( self, logs = None ):
        if not hasattr(self, 'epoch'):
            super(CumulativeHistory, self).on_train_begin( logs )
def main(*args, **kargs):
    """ Behavioral Cloning Project
    """
    parser = argparse.ArgumentParser(description='Behavioral Cloning Project')
    parser.add_argument('-c', '--checkpoint', type=str, help='Checkpoint (`.h5` file)')
    parser.add_argument('-e', '--epoch', type=int, help='Initial epoch')
    args = parser.parse_args()

    model_type = 'new'
    train_model = None
    initial_epoch = 0
    if args.checkpoint is not None:
        train_model = load_model( args.checkpoint )
        initial_epoch = args.epoch
        model_type = 'loaded'

    # Set Configuration
    config = tf.ConfigProto( intra_op_parallelism_threads = max_procs,
                             inter_op_parallelism_threads = 0)  # set automatically to number of logical cores
    config.gpu_options.allow_growth = True

    # Get Data
    df_train, df_val, df_test, bins = get_data( keep_ptl = 60 )
    ntrain, nval, ntest = df_train.shape[0], df_val.shape[0], df_test.shape[0]

    # Training
    train_graph = tf.Graph()
    train_generator = generator_from_df( df_train, batch_size )
    val_generator = generator_from_df( df_val, batch_size, shuffle=False )
    nbatches_train = ntrain // batch_size
    nbatches_val = nval // batch_size

    history = CumulativeHistory()
    early_stop = EarlyStopping( monitor='val_mean_squared_error',
                                min_delta=1e-4,
                                patience=50,
                                verbose=0,
                                mode='min')
    model_ckpt = ModelCheckpoint( fl_fmt_wt_ckpt,
                                  monitor='val_mean_squared_error',
                                  verbose=0,
                                  save_best_only=True,
                                  save_weights_only=True,
                                  period=1)
    callbacks = [history, early_stop, model_ckpt]

    for i in range(len(lr)):
        train_sess = tf.Session( config = config, graph = train_graph )
        K.set_session( train_sess )
        if model_type == 'new':
            with train_graph.as_default():
                # Print model summary
                summary_fl_pth = os.path.join( fldr_summary, 'model_summary_run_{:04d}_'.format(run[0]) + r'.txt' )
                train_model = get_model( lr[i], is_training = True )
                with open(summary_fl_pth, 'w') as summary_file:
                    train_model.summary( print_fn=lambda x: summary_file.write(x + '\n') )
        with train_graph.as_default():
            with train_sess.as_default():
                if K.backend() == 'tensorflow':
                    board = TensorBoard( log_dir = fldr_log,
                                         histogram_freq = 0,
                                         write_graph = True,
                                         write_images = True )
                    callbacks.append( board )
                    writer = tf.summary.FileWriter( fldr_log, train_graph )
                ts = time.time()
                ts = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d_%H-%M-%S')
                arch_yaml = train_model.to_yaml()
                arch_fl_pth = os.path.join( fldr_arch, 'arch_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.yaml' )
                with open(arch_fl_pth, 'w') as arch_file:
                    arch_file.write( arch_yaml )
                train_model.save( os.path.join( fldr_mdl,
                                  'model_init_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.h5') )
                train_model.save_weights( os.path.join( fldr_wt,
                                          'weights_init_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.h5' ) )
                train_model.fit_generator(
                    generator = train_generator,
                    steps_per_epoch = nbatches_train,
                    epochs = num_epochs,
                    max_queue_size = max_q_size,
                    validation_data = val_generator,
                    validation_steps = nbatches_val,
                    workers = 0,
                    callbacks = callbacks,
                    initial_epoch = initial_epoch)
                ts = time.time()
                ts = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d_%H-%M-%S')
                train_model.save( os.path.join( fldr_mdl,
                                  'model_final_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.h5') )
                train_model.save_weights( os.path.join( fldr_wt,
                                          'weights_final_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.h5' ) )
        if K.backend() == 'tensorflow':
            K.clear_session()
        del train_model
        gc.collect()


if __name__ == '__main__':
    """ Entry point to the program
    """
    main()
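One sanity check I can add on my side (just my own guess at a diagnostic, not something from the gradient-checkpointing docs) is to confirm, right before calling fit_generator and using the imports already in the script above, that the backend is actually pointing at the memory-saving function:
# hypothetical check that the override inside get_model() actually took effect
print( K.gradients is memory_saving_gradients.gradients_collection )  # expect True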
@gitrdonator Sorry, I did not use keras; I use tensorflow (1.9.0 on Windows) and python 3.6. I just add these lines before the model definition and training part. I think the problem may be that you should first import tensorflow, then override the gradients function with the memory-saving version, as below:
import sys
import os
import numpy as np
import tensorflow as tf
import memory_saving_gradients
from tensorflow.python.ops import gradients

def gradients_memory(ys, xs, grad_ys=None, **kwargs):
    return memory_saving_gradients.gradients(ys, xs, grad_ys, checkpoints='memory', **kwargs)

gradients.__dict__["gradients"] = gradients_memory

import argparse
import os
import math
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import *
from tensorpack.tfutils.summary import *
@Sirius083 I tried that before as well, but that didn't work. However, I tried just now using your method with gradient checkpointing to see if it would work, and it still didn't. I modified my code as follows:
In the imports, ensure memory_saving_gradients comes after tensorflow (which it already did), and add from tensorflow.python.ops import gradients after memory_saving_gradients:
...
import memory_saving_gradients
from tensorflow.python.ops import gradients
...
Then modify the end of my get_model function as follows:
...
layer_names = [ 'conv02', 'conv04', 'fc01', 'fc03' ]
[tf.add_to_collection('checkpoints', model.get_layer(l).get_output_at(0))
 for l in layer_names]

def gradients_collection(ys, xs, grad_ys=None, **kwargs):
    return memory_saving_gradients.gradients(ys, xs, grad_ys, checkpoints='collection', **kwargs)

gradients.__dict__["gradients"] = gradients_collection
...
but this still didn't work.
Using gradients_memory normally with the keras backend (K.__dict__) instead of the tensorflow ops gradients (gradients.__dict__) does tell me it can't find a bottleneck and that I should use checkpointing, whereas it does not with gradients.__dict__.
Do you by chance have any other ideas?
@joeyearsley Since this does not work, particularly with Keras as far as I can tell, can you please update your README.md to state that this does not work with Keras?
@Sirius083 Can you share your tensorflow code with me? I desperately need to get memory saving to work on Windows, and I can't get it to work using keras.
@gitrdonator I just added the few lines above to cifar10-densenet.py from https://github.com/YixuanLi/densenet-tensorflow. I did not add anything else in particular, which means gradient checkpointing is applied to all the convolutional layers, not just a few specified layers. Maybe you can open an issue under this repository: https://github.com/cybertronai/gradient-checkpointing
@Sirius083 I've opened many. You're more than welcome to look.
@Sirius083 So, to be clear, what you're telling me is that you don't in fact have any working code?
@joeyearsley @Sirius083 Can you please help me get memory_saving_gradients working in tensorflow-gpu 1.5? You had said previously that you got it to work in tensorflow 1.5.
Would you mind taking a look at Issue #42, which I created on cybertronai/gradient-checkpointing? Thank you.
@gitrdonator I said it works on tensorflow 1.9 on Windows; I never tried it on tensorflow 1.5.
@Sirius083 Sorry, that was for @joeyearsley. He had said he had it "successfully working" in an issue post on cybertronai/gradient-checkpointing.
@Sirius083 But while I have you here, would you mind taking a look at the issue and letting me know if you see anything that I could do?
@Sirius083 You did not even import memory_saving_gradients in your code. In fact, if you did get it to work, you should let @joeyearsley know. He opened a giant issue, with many others showing that they couldn't get it to work past tensorflow 1.8 (see Issue #29).