AdaBound-Tensorflow
Simple Tensorflow implementation of "Adaptive Gradient Methods with Dynamic Bound of Learning Rate" (ICLR 2019)
I have seen a performance boost switching from Adam to AdaBound. I have tuned my model and found that a learning-rate range of 2e-4 to 2e-2 works best. I am interested...
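For context, a minimal sketch of wiring such bounds into training, assuming the AdaBoundOptimizer constructor signature shown in this repo's README (learning_rate, final_lr, beta1, beta2, amsbound), and assuming the reported 2e-4 to 2e-2 range maps to the initial rate and the final SGD-like rate respectively; the model itself is a hypothetical placeholder:

```python
import tensorflow as tf
from AdaBound import AdaBoundOptimizer  # optimizer shipped in this repo

# Hypothetical stand-in model: one dense layer on placeholder inputs.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.square(pred - y))

# Assumption: 2e-4 is the initial Adam-style rate and 2e-2 is the
# final SGD-like rate that the dynamic bounds converge to.
train_op = AdaBoundOptimizer(learning_rate=2e-4, final_lr=2e-2,
                             beta1=0.9, beta2=0.999,
                             amsbound=False).minimize(loss)
```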
With no configuration changes on my end, the line beta1_power = math_ops.cast(self._get_non_slot_variable("beta1_power", graph=graph), var.dtype.base_dtype) throws an error that beta1_power is None.
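For reference, this is not the repo's code, only a minimal sketch of how a TF 1.x optimizer normally pairs _create_non_slot_variable with _get_non_slot_variable; beta1_power comes back as None when the creation step never ran in the graph being queried (for example, slots were created in a different graph, or _create_slots was skipped):

```python
import tensorflow as tf
from tensorflow.python.ops import math_ops
from tensorflow.python.training import optimizer

class SketchOptimizer(optimizer.Optimizer):
    """Sketch: only the pieces relevant to the beta1_power lookup."""

    def __init__(self, beta1=0.9, use_locking=False, name="Sketch"):
        super(SketchOptimizer, self).__init__(use_locking, name)
        self._beta1 = beta1

    def _create_slots(self, var_list):
        # The accumulator must be created here, colocated with the first
        # variable; if this never runs in the graph queried below,
        # _get_non_slot_variable returns None and the cast fails.
        first_var = min(var_list, key=lambda v: v.name)
        self._create_non_slot_variable(initial_value=self._beta1,
                                       name="beta1_power",
                                       colocate_with=first_var)

    def _apply_dense(self, grad, var):
        graph = None if tf.executing_eagerly() else tf.get_default_graph()
        beta1_power = math_ops.cast(
            self._get_non_slot_variable("beta1_power", graph=graph),
            var.dtype.base_dtype)
        # Toy update just to keep the sketch runnable.
        return var.assign_sub(0.001 * beta1_power * grad)
```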
Hi, your work helps me a lot and I have a question about your code! I would like to know how to get the learning rate in your code. I'm...
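Note that the learning rate AdaBound actually applies is per parameter and per step, so there is no single scalar to read out. A minimal NumPy sketch of recovering it from the paper's clipping rule, assuming the paper's bound schedule with gamma and that the implementation bias-corrects the second moment the way Adam does:

```python
import numpy as np

def adabound_effective_lr(step, v, lr=1e-3, final_lr=0.1,
                          gamma=1e-3, beta2=0.999, eps=1e-8):
    """Per-parameter rate applied at `step` (paper's clipping rule).

    v is the running average of squared gradients. The Adam-style rate
    lr / (sqrt(v_hat) + eps) is clipped into dynamic bounds that both
    converge to final_lr as step grows.
    """
    v_hat = v / (1.0 - beta2 ** step)          # Adam-style bias correction
    adam_lr = lr / (np.sqrt(v_hat) + eps)
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return np.clip(adam_lr, lower, upper)
```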
Thanks for your great work. I found it doesn't work for large-scale models with huge sparse embedding parameters; the _apply_sparse function may not be as efficient as AdamOptimizer...
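A common remedy (the approach tf.contrib.opt.LazyAdamOptimizer takes for Adam) is to update only the rows indexed by the sparse gradient instead of densifying the full moment tensors. A minimal sketch of such a lazy moment update in TF 1.x, with hypothetical argument names:

```python
import tensorflow as tf

def lazy_sparse_moment_update(m, grad_values, grad_indices, beta1=0.9):
    # Touch only the embedding rows that received gradients, rather than
    # running a dense update over the entire (huge) momentum tensor.
    # Note: duplicate indices should be deduplicated first (e.g. via
    # tf.unique plus a segment sum over grad_values).
    m_rows = tf.gather(m, grad_indices)                # read touched rows
    m_new = beta1 * m_rows + (1.0 - beta1) * grad_values
    return tf.scatter_update(m, grad_indices, m_new)   # write them back
```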
Version: TensorFlow 1.12 stable. When I run the code, I get this error: > x and y must have the same dtype, got tf.float32 != tf.resource in "AdaBound.py", line 132,...
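That tf.resource dtype typically means a resource-variable handle reached an op that expected the variable's value. A minimal sketch of a guard that reads the value first (a hypothetical helper, not the repo's actual fix):

```python
import tensorflow as tf

def value_of(x):
    # Resource variables expose a handle whose dtype is tf.resource;
    # arithmetic ops need the read value, not the handle itself.
    if isinstance(x, tf.Variable):
        return x.read_value()
    return x
```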