
Road to POT 1.0

kilianFatras opened this issue 4 years ago • 29 comments

Hello to all contributors,

The last POT 0.6 release brought new features to the library, and we now have 25 papers implemented in POT. It was discussed that before making the 1.0 release, we should work on some fundamental changes inside the library. In my humble opinion, we should work on the most urgent changes before adding new features: if we keep adding new features, it will be even more complicated to make the fundamental changes afterwards. I am starting this issue to discuss these matters.

I copy-paste here what was discussed before. The list is non-exhaustive, and I invite you to complete it if you have ideas/wishes:


Reform changes

  • Naming convention (clearer and more consistent)

  • Duplicated code (bregman module)

  • Clean up commented-out code

  • A two-letter package name -- ot -- can cause multiple headaches

  • The emd functions should live in a dedicated module, not in the __init__ file

  • In some functions, the transport plan is computed (which can be heavy to store on GPUs) even when it is not needed. I think there should be a function that explicitly computes the transport plan from the dual variables, so that the user makes this call explicitly (see the sketch after this list).

  • sinkhorn returns the distance or the plan depending on the second dimension of the input distribution b

  • Make sure we have the CI infrastructure in place to support this release (and future ones)

  • Domain adaptation name

  • Torch backend
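
To make the transport-plan point above concrete, here is a minimal sketch of such a helper (plan_from_duals is a hypothetical name, assuming the standard entropic parametrization where the marginal weights are absorbed into the dual potentials):

import numpy as np

def plan_from_duals(f, g, M, reg):
    # Hypothetical helper: recover the entropic transport plan
    # P_ij = exp((f_i + g_j - M_ij) / reg) from the dual potentials f, g,
    # so the (potentially huge) n x m plan is only materialized when the
    # user explicitly asks for it.
    return np.exp((f[:, None] + g[None, :] - M) / reg)

Solvers would then only need to return the dual variables, which are cheap to store even on GPUs.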

I would say that the most urgent task before adding features is the naming convention, because we can't add new functions with the old names (ot.sinkhorn2 ...).


Name Shifting

This list will be updated each time we converge on a new name.

------------------documentation/examples------------------

  • n -> n_source_samples
  • xs/xt -> x_source/x_target
  • G0/Gs -> Gamma_emd / Gamma_sinkhorn (maybe?)
  • reg parameter (entropic) -> epsilon (blur?)
  • d (dimension parameter) -> n_features
  • N (barycenter example) -> N_distributions
  • X1 -> X_source (color transfer)

------------------variable names------------------

  • numItermax -> num_iter_max
  • numInnerItermax -> num_iter_max_{function name}
  • stopThr -> stop_threshold
  • (reg -> blur ?)
  • log (variable, not bool) -> log_{function name}

Assignments

  • Variable names (Kilian)

kilianFatras avatar Dec 05 '19 09:12 kilianFatras

I fully support this initiative.

Also, I would try to hunt down variable names that are not explicit enough: m, n vs n_samples, n_bins, n_features.

I think that ot as the import name vs a pypi package called pot is maybe confusing.

agramfort avatar Dec 05 '19 09:12 agramfort

Thanks for reviving this discussion @kilianFatras :)

About the name, the least disruptive change would be pot or pyot (unless the pun was intended originally :))?

On Sinkhorn related topics, some pragmatic first steps:

  • I agree with @agramfort; I suggest n_bins since all sinkhorn funcs work on pre-specified grids.
  • Sinkhorn functions do not return the objective value but the approximation <Plan, cost> (without entropy); this should be made clearer, or the choice left to the user
  • Merge sinkhorn and sinkhorn2 into one sinkhorn function with a return_ot_plan arg (see the sketch after this list): if multiple measures are given, either return a list of OT losses or a list of transport plans
  • Make the different solvers (log_stabilized, sinkhorn, greenkhorn ...) private (just like the different solvers of scipy's minimize) and have each private solver always return the dual variables in log-domain, then compute either the loss or the plan.
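
A rough sketch of the merged entry point (sinkhorn_unified is a placeholder name; this dispatches to the current public functions rather than to the private log-domain solvers described above):

import ot

def sinkhorn_unified(a, b, M, reg, return_ot_plan=False, **kwargs):
    # Placeholder front-end: one entry point, explicit output choice.
    if return_ot_plan:
        return ot.sinkhorn(a, b, M, reg, **kwargs)   # transport plan
    return ot.sinkhorn2(a, b, M, reg, **kwargs)      # loss <Plan, cost>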

I can work on bregman and unbalanced modules; the ones I'm most familiar with.

I would also like to point out that I'm using POT less and less recently, because I have re-implemented all my OT code in pytorch to be CPU/GPU agnostic and to have the ability to autodiff easily. Having a torch backend is also worth discussing.

hichamjanati avatar Dec 05 '19 10:12 hichamjanati

Hi all,

Yes good idea to revive this discussion. Regarding the module import name, I would call for pot. 'import pot' seems more consistent to me with the library name.

Regarding the torch backend, it is really something we should discuss. We are also starting to have a lot of torch functions on our side that implement several versions of transport; sometimes they are also based on POT. I guess making POT torch-compliant would benefit a lot of people in the community.

Nicolas

ncourty avatar Dec 05 '19 10:12 ncourty

I have added the torch backend to the list. I agree that it should be one of the next 'big' modules to create (balanced & unbalanced solvers). I know @rflamary has started working on it. I would also add some ground cost functions, such as the TL^p cost.

I agree to merge sinkhorn and sinkhorn2. We can work on ot.bregman together @hichamjanati (it will be easier). But we should be careful because ot.gromov and ot.optim depend a lot on sinkhorn... That's why making a first prototype on ot.unbalanced seems better to me. As ot.unbalanced has a lot of similarities with bregman but fewer dependencies (just ot.da?), we would have fewer problems modifying it.

I can take care of ot.stochastic; this module will be easy to handle as it depends on nothing and nothing depends on it.

kilianFatras avatar Dec 05 '19 12:12 kilianFatras

This is the dreaded but important discussion ;)

We indeed need to do a big variable cleanup and replace numItermax (which makes me cry a little every time) with num_iter_max, among other things. This is a gigantic pile of work, but I see that a lot of people really want it, so maybe we can spread the work ;).

About naming: emd/emd2 is indeed not clear. I think we should go for explicit names such as ot_solve_plan or ot_solve_value, which would allow entropic_ot_solve_* for sinkhorn. I don't really like a parameter that changes the output (I know we already do that with log).

I'm OK with renaming the module to pot (pyot I really don't like) as long as we keep the ot module in parallel for a while (I have a ton of code that would break in a very sad way).

I also agree with Nicolas that a torch backend would really make POT more interesting. For instance, we have a torch-compatible emd function with implemented (sub)gradients w.r.t. the weights and the distance matrix. I'm for removing the gpu module, which nobody really wants to handle, and proposing an ot.torch module with several losses and autodiff functions.

Note that we have to refuse any PR that does not provide continuity (as in: the current tests still pass with the old naming, with proper deprecation warnings). This also means updating the documentation, the examples, and the quickstart guide.
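
For the deprecation warnings, here is a minimal sketch of the kind of shim such a PR could ship (deprecated_alias is illustrative, not an existing POT utility):

import functools
import warnings

def deprecated_alias(old_name, new_func):
    # Keep the old name importable while nudging users toward the new one.
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is deprecated; use %s instead" % (old_name, new_func.__name__),
            DeprecationWarning, stacklevel=2)
        return new_func(*args, **kwargs)
    return wrapper

# e.g. sinkhorn2 = deprecated_alias("sinkhorn2", ot_solve_value)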

rflamary avatar Dec 05 '19 12:12 rflamary

I think it is best to refuse most PRs until we have finished the major changes. It might take several months, but once it is done, it will be easier to scale to other applications. At least, let's wait until we all converge on the new naming convention and apply it to a full module (I have a preference for pot.unbalanced).

Once we have the new naming convention, maybe we can accept PRs regarding pot.torch, as those will probably be new functions that can adopt the new naming convention.

In the meantime, other work can be finished, such as the JMLR software paper.

kilianFatras avatar Dec 05 '19 13:12 kilianFatras

In the meantime, other work can be finished, such as the JMLR software paper.

+100. Is there a draft somewhere already?

agramfort avatar Dec 05 '19 14:12 agramfort

Yes there is, and we will contact everyone as soon as I finish the paperwork (so before or after NeurIPS). I was hesitating between creating a dedicated branch (with PRs from contributors for institutes and contributions) and creating a new repository where everyone has commit rights, to avoid 20 minor PRs.

WDYT ?

rflamary avatar Dec 05 '19 14:12 rflamary

I re-edited my first post with the name shifting for both documentation and code.

kilianFatras avatar Dec 05 '19 16:12 kilianFatras

Very good initiative indeed. If no one is on it yet, I would be glad to work on variable renaming within the domain adaptation module; however, we might need to proceed sequentially so that I use the same variable names as the other modules. In addition, some of the variable names here were designed to somehow mirror scikit-learn's X and y naming convention, which is however not at all clear. Moreover, the way the domain adaptation module is designed is, in my humble opinion, not very scikit-friendly after all (though this is a different discussion).

ngayraud avatar Dec 05 '19 20:12 ngayraud

Hello @ngayraud,

No one has worked on the domain adaptation module yet. If you want, you can work on it :). But I think it is best to wait until we have a clear new naming convention. It will be easier to settle after NeurIPS.

kilianFatras avatar Dec 09 '19 12:12 kilianFatras

Of course. I saw that some of us are here, and there is a meetup in Whova. Looking forward to seeing you all!

ngayraud avatar Dec 09 '19 18:12 ngayraud

Hello all, now that ICML's deadline has passed, shall we re-open this discussion?

hichamjanati avatar Mar 02 '20 12:03 hichamjanati

Hello all, I agree :) I have completed the PR on the stochastic module.

kilianFatras avatar Mar 02 '20 12:03 kilianFatras

Is there a separate branch where all of these changes are being made? I am not currently a contributor, but I would like to help, if possible.

I think the wider community will benefit from the following:

Different backends

I may be jumping the gun here, but have we considered different backends? I know torch will be quite useful and should be of higher priority (as mentioned above), but it might be worth looking into supporting other backends if there are users in the community who prefer a certain backend over another. One example is JAX, which has the added benefit of supporting (most) useful methods from numpy through jax.numpy. It also supports an arbitrary amount of differentiation as long as the function allows for it. NOTE: This is not of high priority.
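
For instance, a tiny illustration (my own snippet, not POT code) of how numpy-style code ports to JAX and stays differentiable:

import jax.numpy as jnp
from jax import jit

# A squared-Euclidean cost matrix written exactly as in numpy; it is
# jit-compiled and differentiable via jax.grad / jax.jacfwd / jax.jacrev.
cost_matrix = jit(lambda x, y: jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1))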

JIT Compilation

It seems like a lot of ops in POT could benefit from just-in-time (JIT) compilation. Since a torch backend is already on the roadmap, it might be worth looking into PyTorch's JIT support as an alternative to numba's JIT or JAX's JIT.
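
As a rough sketch (illustrative only, not an existing POT op) of the kind of kernel that could benefit:

import torch

@torch.jit.script
def sinkhorn_update(K: torch.Tensor, u: torch.Tensor, v: torch.Tensor,
                    a: torch.Tensor, b: torch.Tensor):
    # One Sinkhorn scaling update; torch.jit.script compiles the Python
    # into TorchScript, so repeated calls avoid interpreter overhead.
    u = a / (K @ v)
    v = b / (K.t() @ u)
    return u, v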

Python 2.x

Remove support for 2.7, since support for Python 2.7 officially ended on January 1st, 2020. Does anyone still use 2.7?

Typehints

Regardless of whether we choose to remove Python 2.7 support, I think it makes sense to use type hints in all methods. If Python 2.7 support is kept, type hints can be generated via stubs.
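
For example, combined with the new naming convention, a signature could look like this (purely illustrative):

import numpy as np

def sinkhorn(a: np.ndarray, b: np.ndarray, M: np.ndarray, reg: float,
             num_iter_max: int = 1000, stop_threshold: float = 1e-9) -> np.ndarray:
    ...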

CI/CD

More complete CI/CD pipeline. In particular, checks for:

  • Different OS's (Linux, MacOS, Windows?)
  • CPU/GPU testing (GPU testing might not be possible, as there is no free GPU CI service AFAIK)
  • Code coverage (i.e. Codecov)
  • PEP8 / pylint
  • Python versions (3.5, 3.6, 3.7, 3.8)
  • etc.

Note that this is not an exhaustive list. Feel free to suggest things that should be included/removed on the CI/CD pipeline.

ayushkarnawat avatar Apr 06 '20 16:04 ayushkarnawat

Hi all,

I was reading through this, and I'm surprised none of you mentioned using tensorflow as a backend at all. May I ask why it is not being considered? A few tools on the tensorflow side would definitely benefit greatly from an off-the-shelf differentiable OT toolbox.

To the best of my knowledge, Monte Carlo methods are more mature on the tensorflow side, e.g. pymc will be using a tensorflow 2 backend.

I am not sure, however, how pytorch vs tensorflow compare on libraries interested in things such as GPs (GPFlow vs GPyTorch), point cloud methods (tensorflow graphics vs pytorch geometry), or Bayesian optimisation (GPFlowOpt vs BOTorch).

Am I missing the point?

AdrienCorenflos avatar Apr 23 '20 10:04 AdrienCorenflos

@AdrienCorenflos Generally, from my experience, torch is a lot more intuitive to use than tensorflow. It feels more pythonic. This same view is held by many users, as evidenced here and here.

Regarding MC methods and, more generally, probabilistic programming methods, pyro, which uses torch as the backend, is quite good. Similar frameworks pymc4 ~~and edward2,~~ which use tensorflow, are at an earlier stage and not as mature as pyro in terms of methods/features. If one does want to go with the tensorflow approach, then tensorflow-probability is likely the best current option. Edit: It was pointed out that TFP is (currently) more feature-complete than pyro, which, after a little more research, seems to be the case.

Additionally, the pyro-ppl team is also developing a JAX-based probabilistic programming library in numpyro, which is not as feature-complete (still in beta).

ayushkarnawat avatar Apr 25 '20 23:04 ayushkarnawat

My big personal concern currently is the code duplication between GPU and non-GPU code. The GPU code is not maintained or tested as well as the CPU code. See how tensorly fixed this problem:

https://github.com/tensorly/tensorly/tree/master/tensorly/backend
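
Very roughly, the idea is to infer the backend from the input arrays so a single solver body serves both CPU and GPU tensors. A sketch (not tensorly's actual API):

import numpy as np

def get_backend(x):
    # Dispatch on the array's module rather than a global flag, so numpy
    # arrays and torch tensors share one code path.
    if type(x).__module__.startswith("torch"):
        import torch
        return torch
    return np

def scale_plan(P, u, v):
    # diag(u) @ P @ diag(v), written with ops both backends provide
    nx = get_backend(P)
    return nx.reshape(u, (-1, 1)) * P * nx.reshape(v, (1, -1))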

agramfort avatar Apr 26 '20 09:04 agramfort

@AdrienCorenflos Generally, from my experience, torch is a lot more intuitive to use than tensorflow. It feels more pythonic. This same view is held by many users, as evidenced [here]

I would agree that it used to be the case in the (not so distant) past. I really don't think this is true anymore, mainly thanks to the autograph capabilities. Also note that pytorch doesn't do graph optimisation (such as pruning), which makes it much harder to write complex and efficient code in pytorch compared to tensorflow. For example, how do you write a parallel for loop in pytorch? On the other hand, I must admit that simple operations - like sampling - are sometimes faster in pytorch, so I'm not sure how that evens out.

edward2, which use tensorflow, are more early stage and not as mature as pyro in terms of methods/features. If one does want to go with the tensorflow approach, then tensorflow-probabilty is likely the best current option.

I kinda don't make a distinction between edward2 and tensorflow_probability; to me they just walk hand in hand, sorry about the confusion. I would however disagree (or at least ask for proof) regarding tensorflow_probability not being as mature as pyro: I personally have never seen a functionality in pyro that wasn't in tfp, and the converse is not true.

AdrienCorenflos avatar Apr 26 '20 09:04 AdrienCorenflos

My big personal concern currently is the code duplication between GPU and non-GPU code. The GPU code is not maintained or tested as well as the CPU code. See how tensorly fixed this problem: https://github.com/tensorly/tensorly/tree/master/tensorly/backend

From a surface look, it feels like their stuff is only ever going to work for backends implementing eager execution; am I wrong?

Also, how would you implement custom gradients for different backends? For example, the semi-dual approach by Blondel et al. needs it.

AdrienCorenflos avatar Apr 26 '20 09:04 AdrienCorenflos

I don't think we have the bandwidth to support all possible backends. These days I would use torch or jax. I have more code in torch, but clearly the cupy code we use now is not the future.

agramfort avatar Apr 26 '20 11:04 agramfort

I kinda don't make a distinction between edward2 and tensorflow_probability; to me they just walk hand in hand, sorry about the confusion. I would however disagree (or at least ask for proof) regarding tensorflow_probability not being as mature as pyro: I personally have never seen a functionality in pyro that wasn't in tfp, and the converse is not true.

My mistake, you are right. Edward2 and TFP essentially work hand in hand, since Edward is a high-level interface for TFP and thus makes it easier to define models. Also, I should have been clearer in my above comment that TFP is actually quite mature (sorry for the confusion). I have updated my comment above to reflect this.

I should also mention that I have NOT used pyro or TFP that extensively (only for a Bayesian modeling class), so I am not aware of all the features. Keeping that in mind, I would argue that most of the features we may need are already in pyro, or can be implemented relatively easily. I may be wrong here, so feel free to correct me. It might help to provide an example (if you can) where we might need some functionality from TFP that cannot be done (as easily) in pyro.

Also note that pytorch doesn't do graph optimisation (such as pruning), which makes it much harder to write complex and efficient code in pytorch compared to tensorflow.

I have never really needed to do graph optimization (probably because I mainly do R&D and don't launch stuff into prod that much :( ), so I cannot really speak to whether it improves throughput that much. However, I think torch's JIT compilation can help bridge the gap in efficiency without adding too much complexity.

For example how do you write a parallel for loop in pytorch?

Couldn't this be done via torch.multiprocessing, or am I missing something? For example,

# Obviously one would never do this, as it is more efficient to perform
# vectorized ops, but this is just an example.
import torch
import torch.multiprocessing as mp

def dummy_func(x: torch.Tensor) -> torch.Tensor:
    return x * x

if __name__ == "__main__":
    # Guard the pool so the example also works with the "spawn"
    # start method (macOS/Windows).
    with mp.Pool(mp.cpu_count()) as p:
        out = p.map(dummy_func, torch.tensor([1.0, 2.0, 3.0]))
        print(out)  # [tensor(1.), tensor(4.), tensor(9.)]

I don't think we have the bandwidth to support all possible backends.

Completely agree; it's easier to support one backend with GPU acceleration initially and then expand to other backends IF the community needs it.

ayushkarnawat avatar Apr 26 '20 21:04 ayushkarnawat

The main thing, as @agramfort said, is that we do not have the manpower to handle every deep learning toolbox for GPU.

cupy was not such a good idea in hindsight (and we used an even less common GPU implementation before that). jax is impressive, but it is as early now as cupy was when we switched. Creating a backend system as in tensorly would be awesome, but it is a lot of work (and would require a full rewrite of POT).

Like @agramfort, I have a lot of working code in pytorch and I think we can have a lot of stuff in this format. I personally did not use TF 2, so any proposal in this direction will need a maintainer who is ready to work on it long term. I'm ready to replace ot.gpu by an ot.torch module that does not require the full POT API but mostly provides emd and sinkhorn solvers and losses for use in torch models.

rflamary avatar Apr 27 '20 05:04 rflamary

I personally have the following in tf2, which I can contribute without much effort (apart from interface style maybe):

  • geomloss (a personal translation of Feydy's code)
  • standard sinkhorn
  • sliced wasserstein (includes emd 1d)
  • Blondel's semi-dual

Most of my code is made to take log inputs though (for numerical stability), happy to share.

AdrienCorenflos avatar Apr 27 '20 08:04 AdrienCorenflos

Note: we are only having this discussion because it becomes hard to support multiple backends, as @agramfort said. Otherwise, both TF and Torch would be good backends to support.

I agree with @rflamary and @agramfort that we should choose a backend between torch and JAX. I haven't worked with TF (or TF2) in a while either. But this, obviously, is NOT a good reason not to consider using TF2 as our backend. Below, I will try to motivate why we should consider using torch over TF.

Torch vs TF

Like with any design decision, there are tradeoffs for using one backend vs another. For example, one might consider speed (performance) to be of importance, others put a greater emphasis on simplicity, while others prefer reaching a larger user base. In the optimal scenario, all three should be addressed to allow for ease of implementation and performant code.

Performance

If the argument is that TF2 is more performant due to graph optimization, I cannot find any extensive benchmark studies online comparing the performance of torch vs tf2. However, some results here suggest that torch is more performant (at least for certain classes of models). In fact, they suggest that one should disable eager mode in TF2 for better performance, at which point we lose the dynamic graph generation and ease of debugging. Another forum also suggests that torch is a little bit faster than TF2.

Honestly, I don't think there is much of a difference between the two libraries anymore in terms of performance (if care is taken to optimize code during implementation). I say this because it's hard to quantify easily (without writing many different use cases in both tf and torch).

Simplicity

With the addition of eager execution and other changes brought to TF2, it (now) seems that both frameworks should be similarly easy to use and to implement functions in. However, if not using eager mode, say for performance reasons, then TF is significantly harder to use.

Community Usage

I think the biggest reason for going with a torch backend is that a lot of the computational/data-science research community who need GPU support use it (and currently prefer it over TF, as mentioned previously). In particular, since current uses for OT are still heavily research-focused, the OT user base will likely prefer a torch backend.

One might argue that TF has more widespread usage overall. While this is true, it can be attributed to the fact that (a) it was one of the first major ML libraries with GPU support and (b) if one required a production-grade system, the only real option was to use TF. As such, a lot of legacy TF codebases are still being used, especially in industry. In fact, if you subset the usage by research/industry, data shows that torch is used more in research-focused envs, while TF is used in production-grade envs (see this Reddit discussion). Keep in mind that this may change depending on whether researchers decide to switch back to using TF2.

But I don't think that will happen. Now that pytorch models can be serialized and JIT compiled using TorchScript, they can be used in production-grade systems. In fact, "in enabling Eager mode by default, TensorFlow forces a choice onto their users - use eager execution for ease of use and require a rewrite for deployment, or don't use eager execution at all. While this is the same situation that Torch is in, the opt-in nature of Torch's TorchScript is likely to be more palatable than TensorFlow's 'Eager by default'." [1]

This means that there would be no good reason to switch from Torch to TF2. As such, since researchers currently prefer Torch over TF, I think it will continue being used well into the future.

JAX

I actually like JAX's capabilities, and in fact for our purposes, it seems to be more than capable (since we will be dealing mostly with functional transformations and not creating DNNs). In fact, with its easy JIT compilation and parallelization capabilities (vmap, etc.), it seems easier (i.e., less code) to obtain optimal performance compared to torch/tf. For example, when programming a second-order method which uses the hessian H(x) to scale a matrix [2,3], rather than one based on an RAS method (i.e. sinkhorn), it is a lot easier to do in JAX versus Torch/TF.

# JAX
from jax import jit, jacfwd, jacrev

def hessian(fun):
    # forward-over-reverse mode: an efficient way to build full Hessians
    return jit(jacfwd(jacrev(fun)))

versus the torch implementation, which is a pain to optimize for performance due to the for-loop used in jacobian(), which is hard to parallelize. Similar problems arise in the TF implementation, which would require using the experimental parallel_for function [4].

Perhaps the biggest issues preventing us from using JAX are that (a) it is still in alpha (early stage) and, as such, (b) it is not nearly as widely used currently. I hope this will change in the future, but for now it may not make much sense to use it as a backend for this library, given that torch is (obviously) the more popular framework.

Summary

So to summarize, I think we should use torch over tf (even though I ultimately prefer JAX).

References

  1. https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/
  2. https://arxiv.org/abs/1704.02315
  3. https://arxiv.org/abs/1704.02310
  4. https://news.ycombinator.com/item?id=18636054

ayushkarnawat avatar Apr 30 '20 21:04 ayushkarnawat

Bumping to check whether we have decided on a backend that is suitable for the framework.

Additionally, I think we could improve the API design to make it more easily extendable. For example, I started a basic design which I think allows users of the framework to define their own cost functions, regularizers, or even new solvers/algos. I am still unsure how domain adaptation and dimensionality reduction fit into the core package, so I think they will remain somewhat application-based and require importing from the core module. I'm imagining the tree looking something like the following:

ot/
├── core
│   ├── costs
│   │   ├── euclidean.py
│   │   ├── quadratic.py
│   │   └── square.py
│   ├── optimizers
│   ├── regularizers
│   │   ├── entropy.py
│   │   ├── kl.py
│   │   └── l1l2.py
│   └── solvers
│       ├── bregman.py
│       ├── gromov.py
│       ├── partial.py
│       ├── smooth.py
│       ├── sparse.py
│       └── stochastic.py
├── da
├── dr
└── utils
    ├── deprecation.py
    ├── profiler.py
    └── visualize.py

With this design, the ot package will only import the core and utils modules when import ot is called. For the domain adaptation (da) and dimensionality reduction (dr) modules, users can load them separately using import ot.da and import ot.dr, respectively.

Note: I am continuing this discussion from the gitter channel, where @rflamary had some valid concerns about this approach, namely that it would get too complex to check for cases where some algos can be used for a certain type of problem and others cannot. I'm putting it here to (a) get some feedback, (b) see if there are potentially better ways to address this problem, and (c) collect improvements to the design :).

ayushkarnawat avatar May 16 '20 21:05 ayushkarnawat

I agree that since we are breaking the API for POT 1.0, we should clean up a lot of things and have a better organization of the code. I'm not against a core submodule with all solvers in it.

But as I said on gitter, I have a concern about the hard coupling between some solvers and their regularization (most of the solvers can handle only one or two regularizers and cannot be extended). So I'm not sure one can separate regularization/solver/optimizer as cleanly as that. OT is a very structured optimization problem, and good solvers do take into account the structure of the regularization (as opposed to deep learning, where losses/models/optimization algorithms are interchangeable). We have a generic solver that works for any regularization, but it is quite slow in practice and basically should not be used for any of the regularizers in your regularizer list.

I also think the costs should not be a part of POT more than they already are (one function for easy access), and definitely not a module with different python files. OT depends on the cost, and it is for the user to decide which to use/implement. We have some specific solvers for the squared Euclidean cost (Bures, 1d), but other than that all solvers accept a pre-computed cost matrix (going with pykeops is another question), and I don't think we should handle this part of the computation, because then a lot of options would have to be handled/debugged/tested (is the cost matrix normalized, as often done in sinkhorn? by its mean? its median?). This is not IMHO the job of the toolbox, which focuses on providing generic (to the cost) solvers for different OT problems.

ot.dr is not imported by import ot by default due to its dependency on autograd, but ot.da is, because it only depends on the core package, and I don't think it makes POT slow to import.

rflamary avatar May 18 '20 07:05 rflamary

P.S., also consider differences in reproducibility for both, and for TF's Graph vs. Eager - which I found to be extreme for the latter. The larger the model, the worse the reproducibility - quick code here. Per the link, large-model performance isn't doing great either - and inference can be irregular.

OverLordGoldDragon avatar May 19 '20 04:05 OverLordGoldDragon

@rflamary I have been thinking about this some more, and I think there is a way to address some of the concerns. Let me know what you think, and sorry for taking so much of your time.

OT is a very structured optimization problem and good solvers do take into account the structure of the regularization (as opposed to deep learning where loss/models/optimization algorithm are interchangeable).

While I agree that many solvers in OT are structured in the sense that only a select few regularizers can be used with certain solvers, I think we could still benefit from a more object-oriented approach. Within a given solver, we could simply call the regularizer class that the algo uses. This has the following benefits: (a) it (somewhat) decouples the regularizers from the optimization algos, allowing (b) users to more easily add new solvers that use/do not use regularization (as mentioned previously); finally, (c) it avoids repeating the same regularizer code snippets within different solvers, which is just better practice and reduces errors. This approach would also not require any "if" statement to check whether a regularizer works with a specific solver. Hopefully this makes sense (a minimal sketch follows below). If not, please let me know.
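
Something like this (all names hypothetical; a sketch of the decoupling, not a final API):

import numpy as np

class EntropicRegularizer:
    # Hypothetical regularizer object that solvers call into, instead of
    # inlining the same math in every solver.
    def __init__(self, reg):
        self.reg = reg

    def value(self, G):
        # reg * sum(G * (log(G) - 1)): the negative-entropy term
        return self.reg * np.sum(G * (np.log(G + 1e-16) - 1))

    def grad(self, G):
        return self.reg * np.log(G + 1e-16)

A generic solver would only touch regularizer.value / regularizer.grad, while specialized solvers (e.g. sinkhorn) could still require a specific regularizer class.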

We have a generic solver that works for any regularization but it is quite slow in practice and basically should not be used for any of the regularizers in your regularizer list.

This above approach would naturally allow us to keep the generic solver that works with any regularizer (for completeness), even though as you mentioned, it should never really be used.

I also think the costs should not be a part of of POT more than it already is (one function for easy access) and definitely not a module with different python files. OT depends on the cost and it is for the user to decide which to use/implement.

While I agree that solvers should receive the pre-computed cost matrix as a parameter rather than passing in the cost metric function, I still think it would be useful to have some well-known (and well-used) cost functions implemented. This allows the user to (more) quickly pre-compute the cost matrix (without needing to re-write simple cost functions every time) and pass it into any of the solvers to compare how the loss/OT plan would differ between two different cost metrics. Alternatively, if we want to offload the cost computation to another library, then I don't really see a problem with that approach either. Regardless, it would still be useful, as you mention, to have at least one generic cost function.

Edit: Actually, there is a case for not pre-computing the cost matrix between two distributions. If the number of samples/points in the source and the target is too high (>25K), then the cost matrix (25K x 25K entries) can become too big to fit into memory. This would require the cost matrix to be computed on the fly (likely through some kernel operator). I'm not sure exactly how this will work, but I'm thinking of something like a LazyMatrix/Tensor that doesn't actually store values in memory; a rough sketch of the idea is below.
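
A rough sketch of what I mean (block streaming; a true lazy kernel operator would be more involved):

import numpy as np

def iter_cost_blocks(X_source, X_target, block_size=1024):
    # Yield row-blocks of the squared-Euclidean cost matrix on the fly,
    # so the full n_source x n_target matrix is never held in memory.
    for i in range(0, X_source.shape[0], block_size):
        block = X_source[i:i + block_size]
        yield ((block[:, None, :] - X_target[None, :, :]) ** 2).sum(axis=-1)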

(going with pykeops is another question)

Actually, it seems that for large point meshes (>100K points), pykeops works really well for computing the cost matrix using kernels. It would be worth exploring, I think.

I don't think we should handle this part of the computation, because then a lot of options would have to be handled/debugged/tested (is the cost matrix normalized, as often done in sinkhorn? by its mean? its median?).

The cost function can have an extra parameter that asks whether or not to normalize the cost. Alternatively, if we go with the user-defined cost function approach (which is a valid approach, as mentioned above), then users can define in their computation how they will normalize the cost (note: they might forget or choose not to normalize). Regardless, I think we would STILL have to somehow test whether the cost function is normalized (in some sense), since that affects the final plan/loss, giving a "runtime warning" if something feels "off" (although I understand this will be hard to determine, since we don't know what the "correct" plan is for the given problem).

This is not IMHO the job of the toolbox that focuses on providing generic (to the cost) solvers for different OT problems.

While I agree this is not the main focus, it can affect the results, which we want to protect against. For most problems, the toolbox should function well without the user needing to understand OT theory that well (somewhat ironic, I know), why the cost function needs to be normalized for certain solvers, etc. (at least that's what I think). Maybe this could be a low-priority item.

ot.dr is not imported by import ot by default due to its dependency on autograd but ot.da is because it only depends on core package and I don't think it makes POT slow to import.

This makes sense. I am fine with importing ot.da along with the ot.core and ot.utils packages when first importing the toolbox, if we do decide to break up the API as proposed.

ayushkarnawat avatar May 24 '20 20:05 ayushkarnawat

Hello, I'm closing this issue since we have been working on the new API, which addresses most of the comments. I will open a new issue with an updated (as of 2024) list of what we want for the POT 1.0 release.

rflamary avatar Jan 15 '24 12:01 rflamary