
[Feature request] adding support for "iter_size" like hyperparameters (Caffe)

Open qinglintian opened this issue 8 years ago • 4 comments

Hi, thanks a lot for sharing this awesome project.

I wonder if the code currently supports a Caffe-style "iter_size" hyperparameter, i.e. accumulating gradients over "iter_size" batches and only then applying them. With this hyperparameter one can emulate training with a larger batch size without distributed training: if batch_size is set to, say, 64 and iter_size is set to ITER_SIZE, the effective batch size becomes 64*ITER_SIZE, since the gradients of all ITER_SIZE batches are accumulated.
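For concreteness, the kind of thing I mean is roughly the sketch below (TF 1.x graph mode; the toy model, the ITER_SIZE value, and the `accum_vars`/`accum_op`/`reset_op` names are just for illustration, not something that exists in this repo):

```python
import tensorflow as tf

ITER_SIZE = 4  # accumulate gradients over this many mini-batches

# Toy model, only so the snippet is self-contained.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)

# One non-trainable accumulator per trainable variable.
accum_vars = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
              for _, v in grads_and_vars]

# Run once per mini-batch: adds the current gradients into the accumulators.
accum_op = tf.group(*[a.assign_add(g)
                      for a, (g, _) in zip(accum_vars, grads_and_vars)])

# Run every ITER_SIZE mini-batches: applies the averaged accumulated gradients,
# then clears the accumulators via the control dependency.
apply_op = opt.apply_gradients(
    [(a / ITER_SIZE, v) for a, (_, v) in zip(accum_vars, grads_and_vars)])
with tf.control_dependencies([apply_op]):
    reset_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum_vars])
```

In the training loop you would `sess.run(accum_op)` on each of the ITER_SIZE mini-batches and then `sess.run(reset_op)` once (which also runs `apply_op` through the control dependency), giving an effective batch size of batch_size*ITER_SIZE.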

Is this doable with the current code? Is there any plan to support this feature?

Thank you.

qinglintian avatar Sep 04 '17 06:09 qinglintian

That is a cool trick. I am not aware of a plan right now, but as we start working on FP16 convergence this might be something we do to test larger batch sizes. I will leave this open so everyone on the perf team can see it. If we do something, I will try to update the ticket, and if not, close it in 30-60 days.

tfboyd avatar Sep 05 '17 15:09 tfboyd

This can be done by writing a new tf.train.Optimizer. I have one here that at least works for common cases (and that you can probably copy-paste and use).
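For reference, the rough shape of such an optimizer is something like the sketch below. This is only a simplified illustration of the idea, not the implementation linked above; the class name `AccumGradOptimizer` and everything inside it are made up, and it assumes none of the gradients are `None`:

```python
import tensorflow as tf

class AccumGradOptimizer(tf.train.Optimizer):
    """Wraps another optimizer and only applies gradients every `niter` calls;
    gradients from the intermediate calls are accumulated and averaged."""

    def __init__(self, opt, niter, name="AccumGrad"):
        super(AccumGradOptimizer, self).__init__(use_locking=False, name=name)
        self._opt = opt
        self._niter = int(niter)

    def compute_gradients(self, *args, **kwargs):
        return self._opt.compute_gradients(*args, **kwargs)

    def apply_gradients(self, grads_and_vars, global_step=None, name=None):
        counter = tf.Variable(0, dtype=tf.int32, trainable=False, name="accum_counter")
        accums = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
                  for _, v in grads_and_vars]

        # Always add the current gradients into the accumulators, then bump the counter.
        add_ops = [a.assign_add(g) for a, (g, _) in zip(accums, grads_and_vars)]
        with tf.control_dependencies(add_ops):
            step = counter.assign_add(1)

        def _apply_and_reset():
            avg_gv = [(a / self._niter, v) for a, (_, v) in zip(accums, grads_and_vars)]
            apply_op = self._opt.apply_gradients(avg_gv, global_step=global_step)
            with tf.control_dependencies([apply_op]):
                clear_ops = [a.assign(tf.zeros_like(a)) for a in accums]
                clear_ops.append(counter.assign(0))
                with tf.control_dependencies(clear_ops):
                    return tf.constant(True)

        def _skip():
            return tf.constant(False)

        # Only every `niter`-th call actually updates the weights.
        return tf.cond(tf.equal(step, self._niter), _apply_and_reset, _skip, name=name)
```

You would use it like `opt = AccumGradOptimizer(tf.train.GradientDescentOptimizer(0.1), niter=4)` and then call `opt.minimize(loss)` as usual; the weights only move every 4 training steps.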

ppwwyyxx avatar Sep 05 '17 16:09 ppwwyyxx

@ppwwyyxx TensorPack has everything. :-) I like looking through your code and examples.

tfboyd avatar Sep 05 '17 16:09 tfboyd

@tfboyd thanks for your reply. If supporting this is easy with what @ppwwyyxx already has, I hope the feature can be added soon. I'm also looking at the Optimizer provided above, but I'm not sure I can manage it myself: I'm quite a newbie in TF and Python, and the code in this repo is an excellent example for me to dig into. Thanks again!

qinglintian avatar Sep 06 '17 01:09 qinglintian