
Using OT-CFM to match Gaussian distributions in different spaces

Open ttsesm opened this issue 7 months ago • 2 comments

Hi everyone,

Thanks for your contribution to this package! I am quite new to the flow matching field, and for my research I believe I could use OT-CFM to match two sets of Gaussian distributions that live in different spaces. In practice I would like to condition the flow matching velocity field, possibly in a supervised way, so that it maps one distribution to the other. However, I wonder how the OT mapping should work for such conditional distributions.

After checking the related literature and the nice description in the following article, it seems quite promising that OT-CFM could help with my task.

I will try to give a brief description of my problem, and then perhaps you could give me some feedback on it. Consider the following test case: I have a set of 3D Gaussian distributions (ellipsoids), each defined by its mean $μ$ (a 1x3 vector) and its covariance matrix $Σ_1$ (a 3x3 matrix). I also have a set of 2D Gaussian distributions (ellipses), each defined by its mean $ν$ (a 1x2 vector) and its covariance matrix $Σ_2$ (a 2x2 matrix). In theory the ellipses are the ellipsoids projected onto a plane through a "linear" map $P: ℝ^3 → ℝ^2$. In practice the mapping is a perspective projection.

Image
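To make the idealized "linear" case concrete, here is a minimal NumPy sketch of pushing one 3D Gaussian through a linear map: the mean becomes $ν = Pμ$ and the covariance becomes $Σ_2 = P Σ_1 P^T$. The function name `project_gaussian` and the orthographic matrix below are only illustrative assumptions; a real perspective projection is nonlinear and would at best be approximated this way locally.

```python
import numpy as np

def project_gaussian(mu, sigma, P):
    """Push the Gaussian N(mu, sigma) through the linear map x -> P @ x."""
    nu = P @ mu                # projected mean, shape (2,)
    sigma_2d = P @ sigma @ P.T  # projected covariance, shape (2, 2)
    return nu, sigma_2d

# Hypothetical example: an orthographic projection that drops the z axis.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
mu = np.array([1.0, 2.0, 3.0])
sigma = np.diag([4.0, 9.0, 16.0])

nu, sigma_2d = project_gaussian(mu, sigma, P)
```

In this toy case the ellipse simply keeps the x and y extents of the ellipsoid, which is exactly the depth information that is lost under projection.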

Given this setup, where my inputs are the 2D and 3D Gaussian distributions, I would like to recover the projection matrix $P$ and/or the optimal correspondences (coupling) between my two sets of distributions.

Initially I experimented with optimal transport and the Gromov-Wasserstein (GW) distance in the Bures-Wasserstein space to find the possible couplings (transport plan), but without success, apparently because the distortion introduced by the perspective projection is not easy to handle. I believe the fundamental challenge lies in the nature of the GW distance: rather than directly comparing individual 3D and 2D points, GW seeks to align the internal distance structures of the two distributions. This implicitly assumes that pairwise distances are roughly preserved between the two spaces, which does not hold in my case because projection inherently distorts internal distances. Specifically, in a 3D-to-2D projection, two points that are far apart in 3D may end up close together or even overlapping in 2D. This distortion disrupts the internal distance relationships and makes GW-based matching unreliable. For example, a ring-like structure formed by 3D Gaussian ellipsoids may appear as a stretched elliptical shape when projected from a specific angle.

Moreover, I guess that without supervision there exist infinitely many probability paths (equivalently, infinitely many velocity fields) that transform one distribution into the other. Thus, to get supervision at all times $t\in[0,1]$ (and not only at $t=1$), one must fully specify a probability path/velocity field. The good news is that I do have the ground-truth correspondences during training, so in my mind this should be good enough to supervise the flow matching interpolation.

However, a big question is how to use the FM model at inference time, since as I understand it the flow depends on the end point. In other words, how can I go from my query/test 2D Gaussians back to the corresponding 3D Gaussians?
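As a sketch of what supervising every time step with known correspondences could look like, the snippet below builds the usual straight-line interpolant $x_t = (1-t)x_0 + t x_1$ and its target velocity $x_1 - x_0$ for paired samples. The name `cfm_training_pair` and the `sigma` noise parameter are my own illustrative choices and not this package's API, and I assume each Gaussian has already been flattened into a fixed-length vector of the same size on both sides.

```python
import numpy as np

def cfm_training_pair(x0, x1, t, sigma=0.0, rng=None):
    """Return (x_t, u_t) for paired samples x0[i] <-> x1[i].

    x0, x1 : arrays of shape (batch, dim), already in correspondence
    t      : array of shape (batch,), times in [0, 1]
    sigma  : optional std of Gaussian noise added around the straight path
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t, dtype=float)[:, None]  # broadcast over the feature axis
    x_t = (1.0 - t) * x0 + t * x1
    if sigma > 0.0:
        x_t = x_t + sigma * rng.standard_normal(x0.shape)
    u_t = x1 - x0  # regression target for the velocity network
    return x_t, u_t

# Toy check with four paired 3-vectors.
x0 = np.zeros((4, 3))
x1 = np.ones((4, 3))
t = np.array([0.0, 0.5, 1.0, 0.25])
x_t, u_t = cfm_training_pair(x0, x1, t)
```

With ground-truth pairs the coupling is given, so no minibatch OT step would be needed to decide which source goes with which target; OT would only matter when the pairing is unknown.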

As I currently picture the solution, I treat the 2D Gaussians from the multiple perspective views as my target distribution and the 3D Gaussians as the initial "noise" to start from.

Image
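Regarding the direction of the flow: an ODE defined by a velocity field can in principle be integrated in either time direction, so going from the target side back to the source side amounts to integrating from $t=1$ to $t=0$. The sketch below uses a plain Euler integrator with a hypothetical constant velocity field standing in for a trained network; `integrate` and `shift` are illustrative names only.

```python
import numpy as np

def integrate(velocity, x, t_start, t_end, n_steps=100):
    """Euler integration of dx/dt = velocity(t, x) from t_start to t_end.

    Passing t_start=1, t_end=0 gives a negative step and runs the flow backwards.
    """
    x = np.asarray(x, dtype=float).copy()
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + dt * velocity(t, x)
        t += dt
    return x

# Hypothetical stand-in for a trained model: a constant translation field.
shift = np.array([1.0, -2.0, 0.5])
velocity = lambda t, x: np.broadcast_to(shift, x.shape)

x_source = np.zeros((2, 3))
x_target = integrate(velocity, x_source, 0.0, 1.0)   # source -> target
x_back = integrate(velocity, x_target, 1.0, 0.0)     # target -> source
```

Whether the backward pass recovers a meaningful 3D Gaussian from a 2D query still depends on how the two spaces are matched and on what the model is conditioned on, since the projection discards depth.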

However, since my 2D Gaussians live in a different space, I have been thinking about two solutions. The first is to pad them with zeros to bring them into the same space as my 3D Gaussians, although I am not sure this is the correct way to handle the difference in dimensionality. The second is to build the corresponding cost matrices from the Bures-Wasserstein (or Fréchet) distance between the Gaussian distributions of each set and then use these matrices as input to OT-CFM.
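The two ideas can be combined in a small sketch: zero-pad each 2D Gaussian to 3D, then fill a cost matrix with the squared Bures-Wasserstein distance $\|m_1 - m_2\|^2 + \mathrm{tr}\big(S_1 + S_2 - 2 (S_1^{1/2} S_2 S_1^{1/2})^{1/2}\big)$. All function names here are hypothetical helpers, and the matrix square root is done through an eigendecomposition because the padded covariances are singular.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a PSD matrix (also works for singular ones)."""
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def pad_gaussian(nu, sigma, dim=3):
    """Embed a lower-dimensional Gaussian into `dim` dimensions with zeros."""
    k = nu.shape[0]
    mean = np.zeros(dim)
    cov = np.zeros((dim, dim))
    mean[:k] = nu
    cov[:k, :k] = sigma
    return mean, cov

def bures_wasserstein_sq(m1, s1, m2, s2):
    """Squared Bures-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    root = psd_sqrt(s1)
    cross = psd_sqrt(root @ s2 @ root)
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross))

def cost_matrix(gaussians_3d, gaussians_2d):
    """cost[i, j] = BW^2 between 3D Gaussian i and zero-padded 2D Gaussian j."""
    cost = np.zeros((len(gaussians_3d), len(gaussians_2d)))
    for i, (m1, s1) in enumerate(gaussians_3d):
        for j, (nu, s2) in enumerate(gaussians_2d):
            m2, s2_pad = pad_gaussian(nu, s2)
            cost[i, j] = bures_wasserstein_sq(m1, s1, m2, s2_pad)
    return cost
```

Such a matrix could then be handed to an OT solver to obtain a coupling. I have not checked whether the OT-CFM classes in this package accept a precomputed cost, and note that the padded distance always charges for the collapsed third axis.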

My guess is that integrating the learned velocity field from noise to target should work fine. The problem I see is that the model doesn't inherently "know" about any specific projection relationship between the two distributions unless it has been conditioned on that information during training, which is what I would like to achieve with OT-CFM. If I train the model without conditioning on the view-specific information or the correspondences, then as I understand it the model learns an average flow that maps the noise distribution (the 3DGS Gaussians) to the aggregate target distribution (the Gaussians from all the projections seen during training). In that case, at inference time the generated samples are drawn from that overall learned target distribution, and there is no mechanism to specify a particular projection relationship between the two distributions.
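One way to write down the conditioning I have in mind, purely as a sketch: let $(x_0, x_1)$ be a ground-truth 3D/2D pair, let $c$ be a hypothetical conditioning variable (for example a camera pose or view embedding, which is my assumption and not something defined in this repo), and let $v_θ$ be the learned velocity field. The supervised objective would then be

$$
x_t = (1-t)\,x_0 + t\,x_1, \qquad
\mathcal{L}(θ) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\,(x_0, x_1, c)} \left\| v_θ(t, x_t, c) - (x_1 - x_0) \right\|^2 .
$$

Dropping $c$ from $v_θ$ gives the unconditioned case described above, where the model can only learn the flow averaged over all views.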

Ideally, I would like to combine the conditional flow matching of OT-CFM from this repo with the following Bures-Wasserstein Flow Matching pipeline, which in practice allows me to work directly with Gaussian distributions.

Thus, I would be interested to hear your opinion on whether I could combine OT-CFM, with or without the Wasserstein (or Bures-Wasserstein) Flow Matching work, to create a model that learns to apply this mapping/projection through a flow matching interpolation.

In any case, I would appreciate any feedback.

Thank you for your time and apologies for the long text.

ttsesm, Mar 11 '25 12:03