(H)omotopy CGM
This is a first step towards implementing Homotopy CGMs in copt.
Based closely on the existing Frank-Wolfe implementation, this brings additional linear constraints into the picture via homotopy smoothing.
This commit supports element-wise and row-wise constraints.
The next step is to provide randomized versions of this algorithm.
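For intuition, here is a minimal sketch of what a homotopy-smoothed CGM iteration looks like, assuming a hypothetical objective gradient `f_grad`, a linear minimization oracle `lmo`, and an affine constraint `A x = b`. This is not the PR's actual code, just the general scheme: the constraint is handled by a quadratic penalty whose smoothing parameter vanishes over the iterations.

```python
import numpy as np

def homotopy_cgm_sketch(f_grad, lmo, A, b, x0, max_iter=100, beta0=1.0):
    """Sketch of a homotopy CGM loop (hypothetical names throughout).

    The constraint A x = b is enforced only asymptotically, through the
    smoothed penalty (1 / (2 * beta_t)) * ||A x - b||^2 with beta_t -> 0.
    """
    x = x0.copy()
    for t in range(max_iter):
        beta_t = beta0 / np.sqrt(t + 1)  # smoothing parameter, vanishing
        # Gradient of f(x) + (1 / (2 * beta_t)) * ||A x - b||^2.
        grad = f_grad(x) + A.T @ (A @ x - b) / beta_t
        s = lmo(grad)                    # vertex returned by the LMO
        step = 2.0 / (t + 2)             # standard CGM step size
        x = (1 - step) * x + step * s
    return x
```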
I should add some unit tests for this as well, but I thought that now might be a good time to get feedback. I can convert this to a GitHub draft if you like.
Coverage increased (+0.03%) to 88.76% when pulling 583846a0e37f06331d1a08d2cf9f88659f595e52 on homotopy_pr into e5cb995065e26acd543ef13a330351eb1c381bdd on master.
Note to self: are there places where I could make the code faster, specifically by using numba / `utils.njit`? There aren't any tight loops where I'm going over an array. The basic Frank-Wolfe implementation doesn't use numba, so maybe we can't either, but I can't help thinking that there might be a way to accelerate the main for-loop of the algorithm.
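For reference, if a tight loop does show up later, the usual pattern would be to factor it into a small kernel and decorate it. A hypothetical sketch, assuming `copt.utils.njit` behaves like numba's `njit` (with a no-op fallback when numba is absent); the kernel name and its role are made up for illustration:

```python
import numpy as np
from copt import utils

@utils.njit
def _elementwise_residual(x, lo, hi):
    # Hypothetical kernel: distance of each entry of x from the box
    # [lo, hi]. An explicit loop like this is exactly where numba's
    # nopython compilation pays off.
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        if x[i] < lo:
            out[i] = x[i] - lo
        elif x[i] > hi:
            out[i] = x[i] - hi
        else:
            out[i] = 0.0
    return out
```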
Super, thank you! I'll do these things and update.
Rather than keep the clumsy `use_eigs` option that was there before, I replaced it with a new LMO, `TraceSpectrahedron`, which optimizes over the PSD matrices with bounded trace norm. This is actually what I originally wanted, but I was a bit confused.
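For concreteness, the LMO over the trace-bounded spectrahedron reduces to an extreme eigenvector computation, which is why an iterative solver suffices even for large matrices. A minimal sketch (the actual `TraceSpectrahedron` class in the PR may be organized differently):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def spectrahedron_lmo_sketch(grad, alpha):
    """LMO for {S : S PSD, trace(S) <= alpha}: minimize <grad, S>.

    The minimizer is alpha * u u^T, where u is the eigenvector for the
    smallest eigenvalue of grad, or 0 if that eigenvalue is nonnegative
    (since 0 is feasible and <grad, S> >= 0 for all feasible S).
    """
    # Smallest algebraic eigenpair of the (symmetric) gradient.
    w, v = eigsh(grad, k=1, which="SA")
    if w[0] >= 0:
        return np.zeros_like(grad)
    u = v[:, 0]
    return alpha * np.outer(u, u)
```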
Other than that, I added the comments, changed the names, etc. as you requested. @fabianp
Still to do:
- [unit tests] I don't need feedback on this (yet). I just need to do it.
- [figure out how dataset downloading should work] I'm not sure how to proceed or whether you have an opinion. I can add the dataset to the Google storage, but I need access. Or I can download it directly from the source. The task is clustering, so I'm not sure how much of the `_load_dataset` function should be reused. @fabianp
- Minor point: I was a bit optimistic and thought that I could make a clean single-commit PR. I'm not sure if you care about the history, but let me know and I'll squash and open a new PR with a single commit. Or perhaps there's a better way?
No worries about the history, it's fine if there are multiple commits
Another option for the dataset is to use something more standard that already includes MNIST, like tensorflow-datasets: https://www.tensorflow.org/datasets/overview
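For example, loading MNIST through tensorflow-datasets is only a couple of lines (a generic tfds usage sketch, not code from this PR):

```python
import tensorflow_datasets as tfds

# Downloads MNIST on first use and yields (image, label) pairs.
ds = tfds.load("mnist", split="train", as_supervised=True)
for image, label in tfds.as_numpy(ds.take(1)):
    print(image.shape)  # (28, 28, 1), with an integer class label
```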
The only problem with that is that the dataset we are using consists of the outputs of a softmax. We could train the simple model dynamically when the dataset is downloaded, or download directly what was used in [MVW16]. For now, I decided to download it directly.
Ok, I think it's finally ready for merge! What do you think? @fabianp
[MVW16] D. G. Mixon, S. Villar, and R. Ward, “Clustering subgaussian mixtures by semidefinite programming,” May 2016. http://arxiv.org/abs/1602.06612, GitHub Repo: https://github.com/solevillar/kmeans_sdp
Fixed a small bug and added a unit test to catch it next time. Apparently, this unit test also pushes the code coverage above the threshold for coveralls (finally).
thanks @gideonite! It all looks good to me.
@GeoffNN do you mind taking a look at this? Thanks
I'll look at it ASAP!
Does anyone else have feedback? This has been in purgatory for a while and I'd like to get it merged. @GeoffNN @fabianp
@gideonite quick ping on this, if you want to wrap it up!
Hi @GeoffNN, sorry for the delay. I've been caught up with other things and this has fallen by the wayside. I'll need some more time to get back to this, but if there's still interest in getting it merged, I can probably address most of these issues easily.
The remaining issues should be straightforward / it would also be fine to leave TODO items for a subsequent push :)