
Trying to accelerate LBFGS

muammar opened this issue 7 years ago • 1 comment

Thanks for this implementation. Recently, I've been working on a package that uses machine learning for chemistry problems, where I use PyTorch to train some models. I have been able to perform distributed training using a library called Dask, which accelerated the training phase.

When I use first-order optimization algorithms such as Adam, I can get up to 3 optimization steps per second (though those algorithms converge more slowly than second-order ones). When using L-BFGS, I get only one optimization step every 7 seconds for the same number of parameters. I am interested in using a Dask client to make some parts of your L-BFGS implementation work in a distributed manner, so that each optimization step is faster. I started reading the code, and I have only a rough idea of the L-BFGS algorithm (the core two-loop recursion is sketched below). However, I wondered if you could give me some hints about which parts of the module could be computed independently and therefore distributed?
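For context, the search direction in L-BFGS comes from the "two-loop recursion", a short, inherently sequential chain of vector operations over the stored curvature pairs. A minimal sketch (with hypothetical history lists `s_list`/`y_list`; this is the textbook recursion, not necessarily this repository's exact code) looks like:

```python
import torch

def two_loop_recursion(grad, s_list, y_list):
    """Compute the L-BFGS direction -H_k @ grad from stored curvature pairs.

    s_list[i] = x_{i+1} - x_i and y_list[i] = g_{i+1} - g_i, newest last.
    """
    q = grad.detach().clone()
    rhos = [1.0 / torch.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * torch.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    # Scale by gamma = s'y / y'y (initial Hessian approximation).
    s, y = s_list[-1], y_list[-1]
    r = (torch.dot(s, y) / torch.dot(y, y)) * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * torch.dot(y, r)
        r += (alpha - beta) * s
    return -r  # search direction
```

Since each pass is just a handful of dot products and vector updates per stored pair, the recursion itself is rarely the bottleneck; the expensive part is the function/gradient evaluation feeding it.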

I would appreciate your thoughts on this.

muammar • Apr 06 '19 18:04

Thanks for your question! I'm not too familiar with Dask, and I'm not sure about your problem setting. Can you clarify what problem you are looking at? Is it a finite-sum problem?

Typically, SGD/Adam are distributed in a data-parallel fashion: each node evaluates the function/gradient over a subset of the dataset, and the results are then aggregated. Something similar can be done for L-BFGS, although there are various possible approaches for handling the two-loop recursion, line search, etc. However, this approach makes sense only if function/gradient evaluations are the primary bottleneck in the computation (as they are in deep learning). If you can share some additional details about your problem, I may be able to give better ideas for distributing the algorithm.
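To make the data-parallel idea concrete, here is a minimal sketch using `torch.distributed` (the same pattern would apply with a Dask client gathering per-worker gradients). The names `make_closure`, `loss_fn`, `local_inputs`, `local_targets`, and `world_size` are hypothetical, and it assumes `dist.init_process_group` has already been called on every rank:

```python
import torch
import torch.distributed as dist

def make_closure(model, loss_fn, local_inputs, local_targets, world_size):
    """Build an L-BFGS closure where each rank evaluates its own data shard
    and gradients/losses are averaged across ranks with all_reduce."""
    def closure():
        model.zero_grad()
        loss = loss_fn(model(local_inputs), local_targets)
        loss.backward()
        with torch.no_grad():
            # Sum per-shard gradients so every rank sees the full-batch gradient.
            for p in model.parameters():
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size
            total = loss.detach().clone()
            dist.all_reduce(total, op=dist.ReduceOp.SUM)
        return total / world_size
    return closure

# Each rank then runs the same (serial) L-BFGS update on identical gradients:
# optimizer = torch.optim.LBFGS(model.parameters())
# optimizer.step(make_closure(model, loss_fn, local_inputs, local_targets, world_size))
```

Because every rank ends up with the identical full-batch gradient, each one can run the same serial L-BFGS update (two-loop recursion, line search) and stay in sync without ever communicating the curvature history.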

hjmshi • Apr 09 '19 18:04