[Feature request] adding support for "iter_size" like hyperparameters (Caffe)
Hi, thanks a lot for sharing this awesome project.
I wonder if the code currently supports a Caffe-style "iter_size" hyperparameter, i.e. accumulating gradients over "iter_size" batches and only then applying the update. With this hyperparameter one can emulate training with a larger batch size without distributed training: if batch_size is set to, say, 64 and iter_size to ITER_SIZE, the effective batch size becomes 64*ITER_SIZE, since the gradients of ITER_SIZE batches are accumulated before each update.
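For concreteness, here is a minimal graph-mode sketch of what such accumulation could look like with the stock TF 1.x API; the tiny linear model, the variable names, and the ITER_SIZE value are placeholders for illustration only, not anything taken from this repo:

```python
import numpy as np
import tensorflow as tf

ITER_SIZE = 4  # number of batches whose gradients are accumulated per update

# Tiny linear model, purely for illustration.
x = tf.placeholder(tf.float32, [None, 8])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([8, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss, [w])

# One non-trainable accumulator per trainable variable.
accums = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
          for _, v in grads_and_vars]

# Add the current batch's gradients into the accumulators.
accumulate_op = tf.group(*[a.assign_add(g)
                           for a, (g, _) in zip(accums, grads_and_vars)])

# Apply the averaged accumulated gradients, then zero the accumulators.
apply_op = opt.apply_gradients([(a / ITER_SIZE, v)
                                for a, (_, v) in zip(accums, grads_and_vars)])
with tf.control_dependencies([apply_op]):
    reset_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accums])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        xb = np.random.rand(64, 8).astype(np.float32)
        yb = np.random.rand(64, 1).astype(np.float32)
        sess.run(accumulate_op, {x: xb, y: yb})   # accumulate this batch's gradient
        if (step + 1) % ITER_SIZE == 0:
            sess.run(reset_op)                    # applies the update, then clears accumulators
```

Running `reset_op` first triggers `apply_op` (via the control dependency) and then clears the accumulators, so each parameter update sees the average gradient of ITER_SIZE batches.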
Is this doable with the current code? Is there any plan to support this feature?
Thank you.
That is a cool trick. I am not aware of a plan right now, but as we start working on FP16 convergence this might be something we do to test larger batch sizes. I will leave this open so everyone on the perf team can see it. If we do something, I will try to update the ticket; if not, I will close it in 30-60 days.
This can be done by writing a new tf.train.Optimizer.
I have one here which at least works for common cases (and you can probably copy-paste and use it).
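As a rough sketch of the wrapper-optimizer idea (this is not the code linked above; the class name and structure are illustrative, and it assumes no gradient is None):

```python
import tensorflow as tf

class GradAccumOptimizer(tf.train.Optimizer):
    """Illustrative wrapper: applies the wrapped optimizer only once every
    `accum_steps` calls, using gradients averaged over the steps in between."""

    def __init__(self, opt, accum_steps, name="GradAccumOptimizer"):
        super(GradAccumOptimizer, self).__init__(use_locking=False, name=name)
        self._opt = opt
        self._accum_steps = accum_steps

    def compute_gradients(self, *args, **kwargs):
        # Delegate gradient computation to the wrapped optimizer.
        return self._opt.compute_gradients(*args, **kwargs)

    def apply_gradients(self, grads_and_vars, global_step=None, name=None):
        # One non-trainable accumulator per variable, plus a step counter.
        accum = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
                 for _, v in grads_and_vars]
        counter = tf.Variable(0, trainable=False, name="accum_counter")

        # Add this step's gradients into the accumulators, then bump the counter.
        add_ops = [a.assign_add(g) for a, (g, _) in zip(accum, grads_and_vars)]
        with tf.control_dependencies(add_ops):
            count = counter.assign_add(1)

        def _apply_and_reset():
            # Apply the averaged gradients with the wrapped optimizer, then clear.
            apply_op = self._opt.apply_gradients(
                [(a / self._accum_steps, v) for a, (_, v) in zip(accum, grads_and_vars)],
                global_step=global_step)
            with tf.control_dependencies([apply_op]):
                return tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])

        # Every `accum_steps` calls, apply and reset; otherwise only accumulate.
        return tf.cond(tf.equal(count % self._accum_steps, 0),
                       _apply_and_reset, tf.no_op, name=name)
```

With a wrapper like this, something such as `GradAccumOptimizer(tf.train.GradientDescentOptimizer(0.1), accum_steps=4).minimize(loss)` can be run every step as usual, and the variables only update once every `accum_steps` session runs.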
@ppwwyyxx TensorPack has everything. :-) I like looking through your code and examples.
@tfboyd thanks for your reply. If supporting this case is easy with what @ppwwyyxx has now, I hope this feature will be supported soon. I'm also looking at the Optimizer provided above, but I'm not sure I can manage it myself. I'm quite a newbie in TF and Python, and the code in this repo is an excellent example for me to dig into. Thanks again!