Computing Wasserstein barycentres using bridge-matching

This is an accompanying blog post for our NeurIPS 2025 paper Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres, in which we use flow-matching techniques to compute Wasserstein barycentres. Check out the full paper here! Code is available on GitHub.

Overview

The Wasserstein barycentre provides a natural notion of ‘average’ for probability distributions by leveraging optimal transport theory, and has a wide range of applications (see Chapter 9.2 of this book for an overview). However, computing barycentres is notoriously difficult. Optimal transport, and its dynamic entropy-regularised counterpart the Schrödinger bridge, have recently become massively popular in the generative modelling community due to their connections to flow-matching-style algorithms. In this work, we propose an efficient iterative method for computing Wasserstein barycentres using flow-based generative modelling techniques, inspired by connections to the highly successful fixed-point methods for barycentre computation.

What is optimal transport?

Let’s first start by defining the basic optimal transport problem. Given a source distribution $\mu_0$, a target distribution $\mu_1$, and a function $c(x,y)$ representing the ‘cost’ of transporting from $x$ to $y$, the optimal transport problem aims to find the most cost-efficient way to map from the source to the target. The original OT formulation, given by Monge in 1781, formalises this by seeking the map $T^*: \mathbb{R}^d \rightarrow \mathbb{R}^d$ pushing $\mu_0$ onto $\mu_1$, denoted $\mu_1 = T^*_{\#} \mu_0$, that solves the minimisation $$ \min_{T:\, T_{\#} \mu_0 = \mu_1} \int_{\mathbb{R}^d} c(x, T(x)) \, \mathrm{d}\mu_0(x). $$ While intuitive, the Monge formulation of OT is restrictive in that it does not permit splitting of mass during transportation. Instead, it is common to consider the more general Kantorovich formulation, which searches over joint distributions $\pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$ with the correct marginals, $$ \min_{\pi:\, \pi_0 = \mu_0,\, \pi_1 = \mu_1} \iint_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y) \, \mathrm{d}\pi(x, y). $$

When the cost function is quadratic, that is $c(x,y) = \frac{1}{2} \lVert x-y \rVert_2^2$, this minimum gives the squared Wasserstein-2 distance $W_2^2(\mu_0,\mu_1)$ between the two measures. This case is particularly important as $W_2$ gives a natural notion of distance between probability distributions, and can be considered the analogue of the $L_2$ distance in $\mathbb{R}^d$ but for the space of probability measures.
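As a concrete illustration, here is how one might estimate $W_2^2$ between two empirical measures using the POT (Python Optimal Transport) library. This is a minimal sketch for intuition, not part of our method; the sample sizes and distributions are arbitrary, and the factor of $\tfrac{1}{2}$ matches the quadratic cost convention above.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))           # samples from mu_0
y = rng.normal(loc=3.0, size=(200, 2))  # samples from mu_1
a = np.full(200, 1 / 200)               # uniform weights on the samples
b = np.full(200, 1 / 200)

M = 0.5 * ot.dist(x, y)    # cost matrix c(x, y) = 1/2 ||x - y||^2
w2_sq = ot.emd2(a, b, M)   # exact Kantorovich OT cost between the empirical measures
print(f"Estimated squared W2 (with the 1/2 factor): {w2_sq:.3f}")
```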

Defining an ‘average’ of probability distributions

Now let’s consider how one might define a notion of ‘average’ for a set of probability measures $\mu_1, \ldots, \mu_\ell$. In the familiar case when we want to take an average of vectors $x_1, \ldots, x_\ell$ this is of course very easy, as the mean $\bar{x} = \frac{1}{\ell} \sum_i x_i$ can be computed in closed form. Note that this can in fact be formulated as a variational problem, in which we search for the vector that minimises the cumulative squared $L_2$ distance to each of the points, $$ \bar{x} = \arg\min_{x} \sum_i \lVert x - x_i \rVert_2^2. $$ Recall that earlier we said that the Wasserstein distance $W_2$ provides the analogue of the $L_2$ distance on the space of probability measures. This lets us define a similar formulation of the ‘mean’ for probability distributions – we simply substitute the Wasserstein distance $W_2$ in place of the $L_2$ norm! This is precisely the definition of the Wasserstein-2 barycentre [1]: given $\ell$ measures $(\mu_1, \ldots, \mu_\ell)$ and weights $(\lambda_1, \ldots, \lambda_\ell)$ summing to $1$, we aim to find $$ \nu^* = \arg\min_{\nu} \sum_i \lambda_i W_2^2(\mu_i, \nu). $$

Just by looking at this problem formulation, we can immediately see that computing the barycentre is going to be difficult. It involves $\ell$ OT subproblems (solving even a single OT problem can be challenging!), and moreover one of the marginals is free to move and needs to be optimised over.

Fortunately, there is an elegant and highly successful way to solve for the barycentre that leverages the fixed-point property $x = \sum_i \lambda_i T_i(x)$, which holds at the solution, where each $T_i$ is the OT map from $\nu$ to $\mu_i$. Intuitively, this property states that ‘each point in the support of the barycentre is the average of the corresponding points in the marginals’. The resulting fixed-point procedure is known to converge very quickly and has formed the basis of many successful algorithms [2,3,4]. However, it still requires computing $\ell$ OT maps at each iteration, which can be expensive.

Fixed-point barycentre solver: each barycentre point equals the weighted average of its corresponding points in the known marginals, $x=\sum_i \lambda_i T_i(x)$, motivating a fixed-point procedure for barycentre computation.
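To make the fixed-point idea concrete, the iteration has a particularly clean form for centred Gaussian marginals, where the OT maps are linear and the update acts directly on covariance matrices (the setting of Álvarez-Esteban et al. [2]). The sketch below is a minimal NumPy illustration of that special case only, not our general algorithm; the function name is ours.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_barycentre_cov(covs, weights, n_iter=50):
    """Fixed-point iteration for the covariance S of the W2 barycentre of
    centred Gaussians: S <- S^{-1/2} (sum_i w_i (S^{1/2} C_i S^{1/2})^{1/2})^2 S^{-1/2}."""
    S = np.eye(covs[0].shape[0])
    for _ in range(n_iter):
        root = sqrtm(S).real
        inv_root = np.linalg.inv(root)
        mid = sum(w * sqrtm(root @ C @ root).real for w, C in zip(weights, covs))
        S = inv_root @ mid @ mid @ inv_root
        S = (S + S.T) / 2  # symmetrise against numerical drift
    return S

# two anisotropic Gaussians; by symmetry the barycentre covariance is isotropic
covs = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
print(gaussian_w2_barycentre_cov(covs, [0.5, 0.5]))
```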

Schrödinger bridges and Iterative Markovian Fitting

Now let’s discuss the Schrödinger bridge problem, its connections to OT, and how flow-based generative modelling techniques can be used to compute its solution. Given initial and final measures $\mu_0$ and $\mu_T$ of a population, the SB problem aims to identify the most likely intermediate dynamics of the population under the assumption that the movement is driven by a stochastic reference process $\mathbb{Q}$. The resulting dynamic SB problem is an optimisation over path measures $\mathbb{P}$, and is defined as $$ \operatorname*{argmin}_{\mathbb{P}~ \mathrm{s.t.}~ \mathbb{P}_0=\mu_0,\, \mathbb{P}_T=\mu_T} \mathrm{KL}( \mathbb{P} \,\Vert\, \mathbb{Q}). $$ Usually we take the reference process $\mathbb{Q}$ to be that of a Brownian motion $\sigma B_t$, in which case we are asking the question: if a set of Brownian motion particles started distributed as $\mu_0$ at time $0$, and ended distributed as $\mu_T$ at time $T$, what happened in between? For this choice of reference process, the KL divergence can be expanded and the SB problem recovers an entropy-regularised version of the quadratic OT problem that we described earlier. Intuitively, this makes sense: if the Brownian motion particles started at $\mu_0$ and ended at $\mu_T$, then we would expect them to have travelled somewhat optimally between the marginals in the intervening time (particularly as the noise level $\sigma$ gets smaller).
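The static entropic OT problem that the SB recovers can again be computed with off-the-shelf tools. As a minimal illustration (not part of our method), POT's Sinkhorn solver returns the entropy-regularised OT cost; the regularisation strength `reg` (our naming) plays a role analogous to the noise level $\sigma$ above.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))           # samples from mu_0
y = rng.normal(loc=3.0, size=(200, 2))  # samples from mu_T
a = np.full(200, 1 / 200)
b = np.full(200, 1 / 200)

M = 0.5 * ot.dist(x, y)                     # quadratic cost, as before
cost = ot.sinkhorn2(a, b, M, reg=0.05)      # entropy-regularised OT via Sinkhorn
print(f"Entropic OT cost: {cost:.3f}")
```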

The SB solution $\mathbb{P}^{SB}$ can be characterised as the unique path measure $\mathbb{P}$ with the correct marginals that is both Markov (i.e. can be simulated forward in time as an SDE) and a mixture of Brownian bridges. This motivates the Iterative Markovian Fitting (IMF) procedure [5,6], which repeatedly projects onto these two characterising properties. The reciprocal projection is a mixture of Brownian bridges, mixed according to the current endpoint coupling, and the Markovian projection is the Markov process that traces out the same path of marginals, computed using bridge-matching (which is just flow-matching, but with noise). The resulting procedure iteratively ‘rewires’ the paths until it converges to the SB solution, by repeating the procedure in the diagram below.

Iterative Markovian Fitting: IMF alternates between two steps. (Left) Reciprocal projection: a mixture of Brownian bridges between sampled endpoints, obtained by simulating the current Markov process. (Right) Markovian projection: fit an SDE using bridge-matching, obtaining a Markov process tracing out the same marginals.
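For the Markovian projection step, the bridge-matching regression is simple to write down. Below is a minimal sketch of the per-batch loss on $[0,1]$, assuming coupled endpoint samples `x0, x1` and an arbitrary drift network `net(x_t, t)` (both names are ours, not from the paper's codebase); the regression target is the conditional drift $(x_1 - x_t)/(1-t)$ of the Brownian bridge.

```python
import torch

def bridge_matching_loss(net, x0, x1, sigma):
    """Regress a drift network onto the conditional drift of Brownian
    bridges between coupled endpoints (x0 at t=0, x1 at t=1)."""
    # keep t away from 1 so the drift target does not blow up
    t = torch.rand(x0.shape[0], 1) * (1 - 1e-3)
    # sample from the Brownian bridge marginal at time t
    mean = (1 - t) * x0 + t * x1
    std = sigma * torch.sqrt(t * (1 - t))
    xt = mean + std * torch.randn_like(x0)
    # conditional drift steering the bridge towards x1
    target = (x1 - xt) / (1 - t)
    return ((net(xt, t) - target) ** 2).mean()
```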

Using IMF to compute barycentres

Based on what we have described so far, we are presented with two natural questions:

  1. Can IMF be used to solve for Wasserstein barycentres?
  2. Can flow-based OT solvers be used in a fixed-point barycentre algorithm?

In fact, these questions are answered by the same procedure, which performs IMF over a tree-based structure!

We can think of Wasserstein barycentres as a multi-marginal optimal transport problem with a tree-structured cost function $c(x_{0:\ell}) = \sum_{i=1}^\ell \lambda_i \lVert x_0 - x_i \rVert_2^2$. Specifically, the cost function is defined over a star-shaped tree, in which the centre node (which is free to move) corresponds to the barycentre and each of the leaves is a fixed marginal $\mu_i$.

Just as in the standard case, there is an analogous Schrödinger bridge problem, in which we now consider stochastic processes defined over the tree structure. We can think of these as ‘joined up’ versions of the standard problem along each edge, glued together where the edges share vertices. The TreeSB problem [7,8] considers minimising over path measures $\mathbb{P}$ defined over the whole tree structure, $$ \operatorname*{argmin}_{\mathbb{P} ~\mathrm{over~tree,~s.t.}~ \mathbb{P}_i=\mu_i ~~ \forall i \in \mathcal{S}} \mathrm{KL}( \mathbb{P} \,\Vert\, \mathbb{Q}). $$ When the reference process $\mathbb{Q}$ is Brownian, this again recovers an entropy-regularised version of the corresponding OT problem. Therefore, if we can solve for the TreeSB solution, we will have an approximation to the Wasserstein barycentre! Indeed, this construction was used in [8], where the Iterative Proportional Fitting technique was applied to solve for the solution.

So what is the corresponding IMF procedure for this tree-structured case? Previously, IMF simulated the current process to obtain coupling samples $(Y_0, Y_T)$, and then sampled from Brownian bridges $\mathbb{Q}_{\cdot| 0,T}$ to perform the bridge-matching and obtain the next iterate. Now, rather than having points sampled from the coupling over the endpoints, we have samples $\{Y_i : i\in \mathcal{S}\}$ from a coupling over all the known marginals $\mathcal{S}$. However, computing the corresponding bridges is still tractable: the reference process $\mathbb{Q}$ has Gaussian increments along each edge, so the induced coupling over the vertices is jointly Gaussian, and thus the conditional distribution at the unknown central node is again a Gaussian. The resulting IMF procedure on the star-shaped tree structure becomes:

  1. Simulate the current process along each edge (started from a known marginal) to obtain coupling samples $Y_i,\, i \in \mathcal{S}$ at the known marginals.
  2. Obtain a sample from the unknown centre marginal according to the conditional distribution $\mathbb{Q}_{0|\mathcal{S}}$, which can be computed to be $Y_0 \sim \mathcal{N}\big(\sum_i \lambda_i Y_i,\, \sigma^2 I_d\big)$ (see the sketch after this list).
  3. Perform bridge-matching along each edge as normal, using Brownian bridges between the obtained coupling samples $(Y_0, Y_i)$.
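Step 2 is the only new ingredient relative to standard IMF, and it is essentially a one-liner. Below is a minimal sketch with our own naming; `sigma` is written directly as the conditional standard deviation, matching the expression above.

```python
import torch

def sample_centre(Ys, lambdas, sigma):
    """Draw the unknown centre node Y0 ~ N(sum_i lambda_i Y_i, sigma^2 I)
    given coupling samples Ys at the known leaf marginals."""
    mean = sum(l * y for l, y in zip(lambdas, Ys))
    return mean + sigma * torch.randn_like(mean)
```

Steps 1 and 3 reuse standard machinery: simulating the current edge SDEs to obtain the $Y_i$, and a bridge-matching loss like the one sketched earlier for each pair $(Y_0, Y_i)$.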

The resulting algorithm, named TreeDSBM, is illustrated below.

Iterative Markovian Fitting on a star-shaped tree: known marginals are shown in blue, the unknown barycentre in red. (Left) Reciprocal projection: a mixture of Brownian bridges between samples $Y_i$ at the known marginals (obtained by simulating the current Markov process over the tree), followed by sampling $Y_0 \sim \mathcal{N}\big(\sum_i \lambda_i Y_i,\, \sigma^2 I_d\big)$ at the centre. (Right) Markovian projection: fit an SDE using bridge-matching along each edge, obtaining a Markov process over the tree that traces out the same marginals.

Why does this algorithm make sense?

At first glance, the resulting algorithm is a relatively simple extension of the IMF procedure to the tree-structured SB setting. However, this combination actually results in a very natural method for computing Wasserstein barycentres, due to its connection to the fixed-point approaches discussed earlier.

Recall the fixed-point barycentre method that we described previously: at each iteration, a new candidate barycentre is constructed by pushing the current iterate forward through the map $\sum_i \lambda_i T_i$. The resulting iterates are known to converge quickly to the solution; however, this involves solving for the $\ell$ OT maps $T_i$ at each iteration.

In TreeDSBM, the centre sample $Y_0$ is drawn from the coupling as $Y_0 \sim \mathcal{N}\big(\sum_i \lambda_i Y_i,\, \sigma^2 I_d\big)$. The method can therefore be viewed as a natural counterpart of fixed-point methods for barycentre computation, but for the case of flow-based entropic OT solvers. In particular, it shows that the IMF and fixed-point iterations can in fact be combined into a single iterative procedure! Moreover, the expensive OT map computations required for the fixed-point procedure have been swapped out for inexpensive and stable bridge-matching procedures, and empirically we found that TreeDSBM retains the fast convergence of the fixed-point procedure in terms of the number of iterations required (as in the figure below). The bridge-matching also occurs independently along each edge, so each network can be trained in parallel.

TreeDSBM applied to the experimental setting of [8]: the ground-truth barycentre of three 2D datasets is shown on the left. The TreeDSBM algorithm computes an accurate approximation to the barycentre after only a couple of IMF steps.

By requiring only a single bridge-matching procedure along each edge before updating the barycentre, TreeDSBM retains the efficiency of iterative fixed-point approaches without requiring full OT map computations at each step, making it a natural and practical algorithm for Wasserstein barycentre computation. More broadly, it provides a great example of how the flow-matching ideas currently revolutionising generative modelling can also provide fresh perspectives on long-studied classical problems in other domains.

Citation

If you find our paper, code, or this blog useful, please consider citing as

@inproceedings{howard2025schrodinger,
  title={Schr\"odinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres},
  author={Samuel Howard and Peter Potaptchik and George Deligiannidis},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=DliPKnn6e0}
}

References


  1. Agueh and Carlier, 2011, Barycenters in the Wasserstein space

  2. Álvarez-Esteban et al., 2016, A fixed-point approach to barycenters in Wasserstein space

  3. von Lindheim, 2022, Simple Approximative Algorithms for Free-Support Wasserstein Barycenters

  4. Korotin et al., 2022, Wasserstein Iterative Networks for Barycenter Estimation

  5. Shi et al., 2023, Diffusion Schrödinger Bridge Matching

  6. Peluchetti, 2023, Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling

  7. Haasler et al., 2021, Multi-marginal Optimal Transport with a Tree-structured Cost and the Schrödinger Bridge Problem

  8. Noble et al., 2023, Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters

Sam Howard
PhD Student in Machine Learning