POT icon indicating copy to clipboard operation
POT copied to clipboard

Removing linear search as a step in GW descent

Open patrick-nicodemus opened this issue 11 months ago • 2 comments

TL; DR - I would like the GW function to provide an option to skip the line search in favor of the naive update step, because in my experiments line search is unambiguously worse.

In the currently implemented gradient descent algorithm for GW, the current implementation is something like the following. At time $t$, we have an existing coupling matrix, $C_t$. We apply optimal transport to solve for $C$ minimizing the expression $\langle AC_tB,C\rangle$ subject to the marginal constraints.

We write $\Delta C = C - C_t$; this is the gradient vector of steepest descent. One can show that the squared GW loss associated to the matrix $C_t + \alpha \Delta C$ is a quadratic function of the real parameter $\alpha$. Solving for the optimal value of $\alpha$ using the ordinary quadratic formula, we then coerce it to lie in the interval $[0,1]$. If $\alpha=0$ the algorithm terminates.

This cleverness with the line search goes beyond the theory developed in Peyre et. al., Gromov-Wasserstein Averaging of Kernel and Distance Matrices, ICML 2016. I am concerned that that this is somewhat premature optimization which is not worth it. In order to do this line search, one must solve for the coefficients of the quadratic formula, which involves four matrix multiplications by my count. Compare this to directly computing $\langle ACB,C\rangle$ which only involves two matrix multiplications. Therefore, there is a tradeoff involved; two more matrix multiplications for a possibly better next step.

I have experimented with the behavior of this algorithm on a real world data set, computing a number of Gromov-Wasserstein distances between point clouds equipped with the uniform distribution. In this experiment, the line search never found a solution $\alpha$ which fell strictly within the open interval $(0,1)$, so there was no tradeoff at all, it was just uniformly slower. I have rewritten the algorithm to remove the line search and it is approximately 23% faster and gives identical results.

I suggest the devs do their own experiments and see if there is a dataset in which the line search performs better; I would be interested in seeing such a dataset, perhaps one not using the uniform distribution on points. I request that there be an option for the user to disable the line search in favor of the more naive update step.

patrick-nicodemus avatar Jul 18 '23 18:07 patrick-nicodemus