AdaBound-Tensorflow
Does not work for sparse embedding models
Thanks for your great work. I found that it does not work for large-scale models with huge sparse embedding parameters, because the apply_sparse function may not be as efficient as the one in TensorFlow's AdamOptimizer. Do you plan to re-implement the sparse version of AdaBound?
I compared this version with AdamOptimizer in TF.
I think the key efficiency difference is that TF's AdamOptimizer calls the function training_ops.sparse_apply_adam(), which accelerates the update of sparse parameters.
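To make the efficiency point concrete, here is a minimal pure-Python sketch of a row-wise ("lazy") sparse AdaBound step that only touches the embedding rows present in the gradient. It is an illustration under stated assumptions, not the code of this repository or of TensorFlow's kernels: the function name `sparse_adabound_step`, its parameters, and the default hyperparameters are hypothetical, and `indices` is assumed to contain no duplicates.

```python
import math

def sparse_adabound_step(params, m, v, indices, grads, t,
                         lr=1e-3, final_lr=0.1, beta1=0.9, beta2=0.999,
                         gamma=1e-3, eps=1e-8):
    """Hypothetical helper: update only the rows listed in `indices`.

    params, m, v: lists of rows (lists of floats) with the same shape.
    indices:      row ids that received a gradient (assumed unique).
    grads:        one gradient row per entry of `indices`.
    t:            1-based step counter.
    """
    # Bias-corrected base step size, as in Adam.
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # AdaBound's dynamic bounds on the per-element learning rate.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    for row, g_row in zip(indices, grads):
        for j, g in enumerate(g_row):
            # Moment estimates are updated for touched rows only.
            m[row][j] = beta1 * m[row][j] + (1 - beta1) * g
            v[row][j] = beta2 * v[row][j] + (1 - beta2) * g * g
            rate = step_size / (math.sqrt(v[row][j]) + eps)
            rate = min(max(rate, lower), upper)  # clip to [lower, upper]
            params[row][j] -= rate * m[row][j]
    return params

# Usage: a 4-row "embedding table" where only row 2 receives a gradient.
table = [[1.0, 1.0] for _ in range(4)]
m_state = [[0.0, 0.0] for _ in range(4)]
v_state = [[0.0, 0.0] for _ in range(4)]
sparse_adabound_step(table, m_state, v_state, indices=[2],
                     grads=[[0.5, -0.5]], t=1)
```

The cost of this step scales with the number of touched rows rather than with the full table size, which is what matters for huge embedding tables. The trade-off is that the moment estimates of untouched rows are not decayed, so the result is not numerically identical to a dense update.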