Group Signal Alignment

Dave Deriso · 2026

When analyzing collections of time series—whether neural recordings, motion capture data, or physiological signals—we often encounter signals that share a common underlying shape but are misaligned in time. Standard averaging smears out these temporal differences, destroying the very structure we want to study. Group signal alignment solves this by jointly finding a common reference and the time warps that best align each signal to it.

In time-warped group alignment, the goal is to estimate the central tendency $\mu(t)$. We solve this using GDTW with iterative refinement. Below is an illustration.

[Interactive figure: an example dataset of misaligned signals and their group alignment.]

Optimization Formulation

We formulate group alignment as an optimization problem, where the warp functions $\phi_1, \ldots, \phi_M$ and the target signal $\mu$ are the variables to be chosen. Our formulation follows the pattern common in machine learning: we minimize an objective that includes a loss function measuring the alignment error between warped signals and target, plus regularization terms that penalize excessive warping of the time axis.

Loss Functional

Let $L: \mathbb{R}^d \to \mathbb{R}$ be a vector penalty function. We define the loss associated with a time warp function $\phi_i$, on signal $x_i$ and target $\mu$, as

$$\mathcal{L}(\phi_i, \mu) = \int_0^1 L\bigl(x_i(\phi_i(t)) - \mu(t)\bigr)\, dt,$$

the average value of the penalty function of the difference between the time-warped signal and the target. The smaller $\mathcal{L}(\phi_i, \mu)$ is, the better we consider $\tilde{x}_i = x_i \circ \phi_i$ to approximate $\mu$.
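To make the discretization concrete, here is a minimal sketch of our own (not part of GDTW itself), assuming signals sampled on a uniform grid over $[0,1]$: the integral becomes an average of pointwise penalties.

```python
import numpy as np

def alignment_loss(x_warped, mu, penalty=np.square):
    """Discretized loss: mean penalty between a warped signal and the target.

    x_warped : x_i evaluated at phi_i(t) on a uniform grid over [0, 1]
    mu       : target signal on the same grid
    penalty  : pointwise penalty L, e.g. np.square or np.abs
    """
    return penalty(x_warped - mu).mean()
```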

Group Alignment via Regularized Loss Minimization

We propose to choose $\phi_1, \ldots, \phi_M$ and $\mu$ by solving the optimization problem

$$
\begin{array}{ll}
\text{minimize} & f(\phi_1, \ldots, \phi_M, \mu) = \displaystyle\sum_{i=1}^M \left( \mathcal{L}(\phi_i, \mu) + \lambda^{\text{cuml}} \mathcal{R}^{\text{cuml}}(\phi_i) + \lambda^{\text{inst}} \mathcal{R}^{\text{inst}}(\phi_i) \right) \\
\text{subject to} & \phi_i(0)=0, \quad \phi_i(1)=1,
\end{array}
$$

where $\lambda^{\text{cuml}} \in \mathbb{R}_+$ and $\lambda^{\text{inst}} \in \mathbb{R}_+$ are positive hyperparameters used to vary the relative weight of the three terms. Since the regularizers $\mathcal{R}^{\text{inst}}$ and $\mathcal{R}^{\text{cuml}}$ depend only on $\phi$, not on $\mu$, they are identical to those used in pairwise time warping and remain unchanged in the group setting. The variables in this optimization problem are the warp functions $\phi_1, \ldots, \phi_M$ and the target signal $\mu$.

The Iterative Solver

This problem is hard to solve exactly, but a simple iterative procedure works well in practice. We observe that if we fix the target $\mu$, the problem splits into $M$ separate dynamic time warping problems that we can solve (separately, in parallel). Conversely, if we fix the warping functions $\phi_1, \ldots, \phi_M$, we can optimize over $\mu$ by minimizing

$$\sum_{i=1}^M \int_0^1 L\bigl(x_i(\phi_i(t)) - \mu(t)\bigr)\, dt.$$

This is typically easy to do; for example, with square loss, we choose $\mu(t)$ to be the mean of the $x_i(\phi_i(t))$; with absolute value loss, we choose $\mu(t)$ to be the median of the $x_i(\phi_i(t))$.
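Both closed-form updates are one-liners on the stacked warped signals. A sketch (the function name is our own choosing):

```python
import numpy as np

def update_target(X_warped, loss="square"):
    """Closed-form target update given the warped signals.

    X_warped : (M, N) array; row i holds x_i(phi_i(t)) on a shared grid
    """
    # Square loss -> pointwise mean; absolute-value loss -> pointwise median.
    return X_warped.mean(axis=0) if loss == "square" else np.median(X_warped, axis=0)
```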

We solve this problem using block coordinate descent, alternating between optimizing the warping functions and the target signal. This leads to a simple iterative procedure:

1:  $\mu \leftarrow \text{Initialize}$ (choose method below)
2:  repeat
3:    for $i = 1, \ldots, M$ do (in parallel)
4:      $\phi_i \leftarrow \underset{\phi}{\arg\min} \displaystyle\int_0^1 L(x_i(\phi(t)) - \mu(t))\, dt + \lambda^{\text{cuml}} \mathcal{R}^{\text{cuml}}(\phi) + \lambda^{\text{inst}} \mathcal{R}^{\text{inst}}(\phi)$
5:    $\mu(t) \leftarrow \text{median}\{x_1(\phi_1(t)), \ldots, x_M(\phi_M(t))\}$ (pointwise)
6:  until convergence

Group time-warped alignment via block coordinate descent.
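Below is a minimal NumPy sketch of this loop. The pairwise solve in line 4 is abstracted as a callable `align_pairwise(x, mu, lam_cuml, lam_inst)` returning the optimal warp sampled on the grid; this callable is our stand-in for a GDTW solver, not a published API.

```python
import numpy as np

def group_align(X, align_pairwise, lam_cuml=0.1, lam_inst=0.1,
                max_iters=20, tol=1e-6):
    """Block coordinate descent for group alignment (sketch).

    X              : (M, N) array of signals sampled on a uniform grid over [0, 1]
    align_pairwise : callable (x, mu, lam_cuml, lam_inst) -> warp phi sampled on
                     the same grid; stands in for a pairwise GDTW solve (line 4)
    """
    M, N = X.shape
    t = np.linspace(0.0, 1.0, N)
    mu = np.median(X, axis=0)                      # pointwise-median initialization
    for _ in range(max_iters):
        # Line 4: with mu fixed, solve M independent pairwise warping problems.
        Phi = np.stack([align_pairwise(x, mu, lam_cuml, lam_inst) for x in X])
        # Line 5: with the warps fixed, update the target pointwise.
        X_warped = np.stack([np.interp(phi, t, x) for phi, x in zip(Phi, X)])
        mu_new = np.median(X_warped, axis=0)
        if np.max(np.abs(mu_new - mu)) < tol:      # stop when the target settles
            break
        mu = mu_new
    return mu, Phi
```

Since the per-signal solves are independent given $\mu$, the inner loop can be dispatched in parallel without changing the result.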

[Interactive figure: the iterative alignment, with a dataset selector and an iteration slider $k$.]

Iterative alignment process. At $k=0$, the original misaligned signals are shown with the initial reference $\mu^{(0)}$ highlighted. Each subsequent iteration $k=1,2,3$ shows the signals converging toward alignment.

This method of alternating between updating the target $\mu$ and updating the warp functions (in parallel) typically converges quickly. However, it need not converge to the global minimum. One simple initialization is to start with no warping, i.e., $\phi_i(t) = t$. Another is to choose one of the original signals as the initial value for $\mu$.

Shaping the Results

There are two main ways to influence the behavior of this algorithm. First, we can choose a good initialization for $\mu^{(0)}$—selecting a signal that is already “central” in some sense can speed convergence and improve final alignment quality. We explore several initialization strategies below. Second, we can add constraints to the optimization problem itself, restricting the space of admissible warping functions.

The choice of initialization in line 1 is our focus. A poor choice may require more iterations or, in pathological cases, lead to a suboptimal alignment.

When aligning a group of signals to a common reference, the choice of initial reference $\mu^{(0)}$ can significantly impact convergence speed and final alignment quality. While the iterative algorithm will eventually converge regardless of initialization, a good starting point can reduce the number of iterations needed and avoid local minima in non-convex settings.

Initialization Method 1: Pointwise Median

The simplest approach is the pointwise median, which computes a synthetic reference by taking the median amplitude at each time point:

$$\mu^{(0)}(t) = \text{median}\{x_1(t), \ldots, x_M(t)\}$$

This is robust to outliers at each time point and captures the “typical” amplitude. However, unlike the other methods we'll discuss, it does not select an actual signal from the group—it creates a new synthetic signal that may not correspond to any physically realizable waveform.

Limitation: If the signals are significantly misaligned, the pointwise median can have discontinuities or unnatural shapes where signals cross frequently. It may not represent any realistic signal morphology.
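In NumPy this is a single call. A sketch, where the array `X` is a stand-in for $M$ signals sampled on a shared grid:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 200))   # stand-in: M=8 signals on a 200-point grid

mu0 = np.median(X, axis=0)          # pointwise median across signals, shape (200,)
```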

[Interactive figure: alignment from the pointwise-median initialization, with controls for $\lambda^{\text{inst}}$, $\lambda^{\text{cuml}}$, and iteration $k$.]

Initialization Method 2: Exemplar

To address the limitation that the pointwise median is synthetic, the exemplar method first computes the pointwise median, then selects the actual signal that is closest to it:

$$\bar{\mu} = \text{pointwise-median}(x_1, \ldots, x_M)$$
$$\mu^{(0)} = x_{i^\star} \quad \text{where} \quad i^\star = \arg\min_i \| x_i - \bar{\mu} \|_2$$

Improvement over pointwise median: The exemplar is always a real signal from the group, guaranteeing a physically realizable reference. It inherits the robustness of the pointwise median while producing a valid signal shape.

Limitation: The pointwise median that we're approximating may still be a poor target if signals are severely misaligned—the “closest real signal” to a bad reference is still influenced by that bad reference.
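A sketch of the two steps, using the same stand-in array `X` as above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 200))              # stand-in signals

mu_bar = np.median(X, axis=0)                  # synthetic pointwise median
i_star = np.argmin(np.linalg.norm(X - mu_bar, axis=1))
mu0 = X[i_star]                                # exemplar: closest real signal
```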

[Interactive figure: alignment from the exemplar initialization, with controls for $\lambda^{\text{inst}}$, $\lambda^{\text{cuml}}$, and iteration $k$.]

Initialization Method 3: Karcher Median

Rather than approximating a synthetic reference, the Karcher median (also called the geometric median or $L^1$-medoid) directly finds the signal that is most central in the group:

$$\mu^{(0)} = x_{i^\star} \quad \text{where} \quad i^\star = \arg\min_i \sum_{j \neq i} \| x_i - x_j \|_2$$

Improvement over exemplar: Instead of finding the signal closest to an intermediate synthetic reference, the Karcher median directly optimizes for centrality by minimizing the sum of distances to all other signals. This makes it robust to outliers and ensures the selected signal is truly “central” in $L^2$ signal space.

Limitation: Euclidean distance doesn't account for temporal misalignment. A signal may be geometrically central but temporally shifted, requiring significant warping from all other signals to align.
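A sketch using pairwise $L^2$ distances, with the same stand-in array `X`:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 200))              # stand-in signals

# Pairwise L2 distances between all signals, shape (M, M).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
i_star = np.argmin(D.sum(axis=1))              # signal with smallest total distance
mu0 = X[i_star]
```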

[Interactive figure: alignment from the Karcher-median initialization, with controls for $\lambda^{\text{inst}}$, $\lambda^{\text{cuml}}$, and iteration $k$.]

Initialization Method 4: Warping Functional Medoid

The warping functional medoid takes a fundamentally different approach: instead of measuring distances in amplitude space, it measures distances in the time-warped metric. It selects the signal that minimizes the total DTW cost when all other signals are aligned to it:

$$\mu^{(0)} = \arg\min_{x_j} \sum_{i \neq j} \left( \mathcal{L}(\phi_i^\star, x_j) + \lambda^{\text{cuml}} \mathcal{R}^{\text{cuml}}(\phi_i^\star) + \lambda^{\text{inst}} \mathcal{R}^{\text{inst}}(\phi_i^\star) \right)$$

where $\phi_i^\star$ is the optimal warping function aligning $x_i$ to the candidate reference $x_j$.

Improvement over Karcher median: While the Karcher median finds the most central signal in $L^2$ space, the warping functional medoid finds the most central signal under the DTW metric. This accounts for timing variations: a signal that is far from others in raw amplitude may actually be very similar after time alignment, making it a better reference for iterative warping.

Limitation: Computing the warping functional medoid requires $M(M-1)$ pairwise DTW computations, making it more expensive than the other methods. However, this cost is often worthwhile when signals have significant timing variability.
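A sketch, assuming a callable `align_cost(x, ref, lam_cuml, lam_inst)` that returns the regularized pairwise alignment objective (again a stand-in for a GDTW solve, not a published API):

```python
import numpy as np

def warping_medoid(X, align_cost, lam_cuml=0.1, lam_inst=0.1):
    """Select the signal minimizing the total regularized alignment cost.

    align_cost : callable (x, ref, lam_cuml, lam_inst) -> scalar value of the
                 pairwise objective aligning x to ref (stand-in for GDTW)
    """
    M = len(X)
    totals = [
        sum(align_cost(X[i], X[j], lam_cuml, lam_inst) for i in range(M) if i != j)
        for j in range(M)
    ]
    return X[int(np.argmin(totals))]
```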

[Interactive figure: alignment from the warping-functional-medoid initialization, with controls for $\lambda^{\text{inst}}$, $\lambda^{\text{cuml}}$, and iteration $k$.]

Summary

| Method | Best when... | Complexity |
| --- | --- | --- |
| Pointwise median | Robustness to outliers is critical; a synthetic reference is acceptable | $O(MN \log M)$ |
| Exemplar | You want a real signal that approximates the “average” | $O(MN \log M)$ |
| Karcher median | Signals are roughly aligned; you need a fast, robust baseline | $O(M^2 N)$ |
| Warping functional medoid | Signals have significant timing variability; the extra cost is acceptable | $M(M-1)$ pairwise DTW solves |

Time-Centered Group Alignment

Centering Constraint

In addition to regularization, we can impose an optional time-centering constraint that requires the warping functions to be evenly arranged about the identity $\phi(t) = t$, such that

$$\frac{1}{M}\sum_{i=1}^M \tilde{\phi}_i(t) = t.$$

We denote time-centered time-warp functions satisfying this constraint as $\tilde{\phi}_i$. The resulting centered warp functions produce a time-centered estimate of central tendency,

$$\tilde{\mu} = \mu\bigl( (x_1 \circ \tilde{\phi}_1),\ \ldots,\ (x_M \circ \tilde{\phi}_M) \bigr).$$

Time-Centered Group Alignment via Regularized Loss Minimization

To obtain a time-centered alignment, we choose $\tilde{\phi}_1, \ldots, \tilde{\phi}_M$ and $\tilde{\mu}$ by solving the optimization problem

$$
\begin{array}{ll}
\text{minimize} & \tilde{f}(\tilde{\phi}_1, \ldots, \tilde{\phi}_M, \tilde{\mu}) = \displaystyle\sum_{i=1}^M \left( \mathcal{L}(\tilde{\phi}_i, \tilde{\mu}) + \lambda^{\text{cuml}} \mathcal{R}^{\text{cuml}}(\tilde{\phi}_i) + \lambda^{\text{inst}} \mathcal{R}^{\text{inst}}(\tilde{\phi}_i) \right) \\
\text{subject to} & \tilde{\phi}_i(0)=0, \quad \tilde{\phi}_i(1)=1, \quad \dfrac{1}{M}\displaystyle\sum_{i=1}^M \tilde{\phi}_i(t) = t,
\end{array}
$$

where the tilde notation denotes centered quantities: $\tilde{\phi}_i$ are the centered warp functions and $\tilde{\mu}$ is the centered mean.

The following comparison shows standard alignment (top) versus centered alignment (bottom). Notice how in centered alignment, the warping functions are evenly distributed around the identity—some signals are warped forward in time while others are warped backward, maintaining balance around the original time axis:

[Interactive figure, two panels. Top: standard alignment, showing $\phi(t)$ and $\mu(t)$. Bottom: centered alignment, showing $\tilde{\phi}(t)$ and $\tilde{\mu}(t)$. Controls for $\lambda^{\text{inst}}$, $\lambda^{\text{cuml}}$, and iteration $k$.]

Enforcing Centering

The centering constraint can be enforced within the iterative algorithm by adding a simple projection step. After computing the unconstrained warps $\phi_i$ in each iteration, we compute their pointwise mean $\bar{\phi}(t)$ and subtract the deviation from the identity to obtain centered warps $\tilde{\phi}_i$:

1:  $\tilde{\mu} \leftarrow \text{Initialize}$ (choose method)
2:  repeat
3:    for $i = 1, \ldots, M$ do (in parallel)
4:      $\phi_i \leftarrow \underset{\phi}{\arg\min} \displaystyle\int_0^1 L(x_i(\phi(t)) - \tilde{\mu}(t))\, dt + \lambda^{\text{cuml}} \mathcal{R}^{\text{cuml}}(\phi) + \lambda^{\text{inst}} \mathcal{R}^{\text{inst}}(\phi)$
5:    $\bar{\phi}(t) \leftarrow \frac{1}{M}\sum_{i=1}^M \phi_i(t)$ (mean warp)
6:    $\tilde{\phi}_i(t) \leftarrow \phi_i(t) - \bar{\phi}(t) + t, \quad i = 1, \ldots, M$ (center)
7:    $\tilde{\mu}(t) \leftarrow \text{median}\{x_1(\tilde{\phi}_1(t)), \ldots, x_M(\tilde{\phi}_M(t))\}$ (pointwise)
8:  until convergence

Time-centered group alignment. Lines 5–6 project the warps onto the centering constraint.

Lines 5–6 are the centering projection. Subtracting the mean warp $\bar{\phi}(t)$ and adding back the identity ensures that the centered warps satisfy $\frac{1}{M}\sum_i \tilde{\phi}_i(t) = t$ at every time point. One can verify this directly:

$$\frac{1}{M}\sum_{i=1}^M \tilde{\phi}_i(t) = \frac{1}{M}\sum_{i=1}^M \bigl(\phi_i(t) - \bar{\phi}(t) + t\bigr) = \bar{\phi}(t) - \bar{\phi}(t) + t = t.$$

The projection is inexpensive—a single pass over the warps at each iteration—and does not change the structure of the solver. The only difference from the unconstrained algorithm is that the target $\tilde{\mu}$ in line 7 is computed from the centered warps rather than the raw warps, so the reference signal stays anchored to the original time axis as the iterations progress.
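The projection itself is two NumPy lines. A sketch, where `Phi` is an (M, N) array of warps sampled on the grid `t`, generated here as a stand-in:

```python
import numpy as np

M, N = 8, 200
t = np.linspace(0.0, 1.0, N)
rng = np.random.default_rng(0)
Phi = np.sort(rng.uniform(0.0, 1.0, (M, N)), axis=1)   # stand-in monotone warps

phi_bar = Phi.mean(axis=0)        # line 5: mean warp, shape (N,)
Phi_tilde = Phi - phi_bar + t     # line 6: centered warps; their mean is exactly t
assert np.allclose(Phi_tilde.mean(axis=0), t)
```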

Matrix Notation

It is convenient to collect signals and warps into matrices. Let $\mathbf{X} = \{x_1, \ldots, x_M\}$ denote the collection of original signals, $\mathbf{\Phi} = \{\phi_1, \ldots, \phi_M\}$ the warping functions obtained from group alignment, and $\mu$ the estimated central tendency. After centering, we write

$$\tilde{\mathbf{\Phi}} = \{\tilde{\phi}_1, \ldots, \tilde{\phi}_M\}$$

for the centered warps satisfying $\frac{1}{M}\sum_i \tilde{\phi}_i(t) = t$, and

$$\tilde{\mathbf{X}} = \{x_1 \circ \tilde{\phi}_1, \ldots, x_M \circ \tilde{\phi}_M\}$$

for the centered aligned signals—the original signals composed with the centered warps. In both cases, the tilde indicates that the centering constraint has been applied.

These two representations provide complementary views of the data: $\tilde{\mathbf{X}}$ captures what the signals look like after alignment (amplitude structure), while $\tilde{\mathbf{\Phi}}$ captures how they were warped to get there (timing structure). Together they form a complete decomposition of the original signals into shape and timing components, which can be analyzed independently or jointly via SVD.
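On a discrete grid, both objects are plain (M, N) matrices, ready for the SVD analysis previewed below. A sketch with stand-in data; all names are our own:

```python
import numpy as np

M, N = 8, 200
t = np.linspace(0.0, 1.0, N)
rng = np.random.default_rng(0)
X = rng.standard_normal((M, N))        # stand-in signals
Phi_tilde = np.tile(t, (M, 1))         # stand-in centered warps (identity here)

# Centered aligned signals: row i is x_i evaluated at its centered warp.
X_tilde = np.stack([np.interp(phi, t, x) for phi, x in zip(Phi_tilde, X)])

# Principal modes of amplitude and timing variation (taken up in the next post).
_, s_amp, V_amp = np.linalg.svd(X_tilde - X_tilde.mean(axis=0), full_matrices=False)
_, s_tim, V_tim = np.linalg.svd(Phi_tilde - t, full_matrices=False)
```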

Looking ahead

Group alignment gives us a principled way to go from a noisy collection of misaligned signals to a clean pair of structured representations—aligned shapes and the warps that produced them. The centering constraint anchors these representations to the original time axis, making the decomposition unique and directly interpretable. With $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{\Phi}}$ in hand, we are ready to ask a deeper question: what are the principal modes of variation in amplitude and timing? We take this up in the next post, where we apply SVD to extract a low-dimensional basis for both.

See also: A General Optimization Framework for Dynamic Time Warping