SpeeDe3DGS: Speedy Deformable
3D Gaussian Splatting with
Temporal Pruning and Motion Grouping

University of Maryland, College Park

* denotes equal contribution

Abstract

Dynamic extensions of 3D Gaussian Splatting (3DGS) achieve high-quality reconstructions through neural motion fields, but per-Gaussian neural inference makes these models computationally expensive.

Building on DeformableGS, we introduce Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), which bridges this efficiency–fidelity gap through three complementary modules: Temporal Sensitivity Pruning (TSP) removes low-impact Gaussians via temporally aggregated sensitivity analysis, Temporal Sensitivity Sampling (TSS) perturbs timestamps to suppress floaters and improve temporal coherence, and GroupFlow distills the learned deformation field into shared SE(3) transformations for efficient groupwise motion.

On the 50 dynamic scenes in MonoDyGauBench, integrating TSP and TSS into DeformableGS accelerates rendering by 6.78× on average while maintaining neural-field fidelity and using 10× fewer primitives. Adding GroupFlow yields 13.71× faster rendering and 2.53× faster training, surpassing all baselines in speed while delivering superior image quality.

Method

Neural motion fields in dynamic 3D Gaussian Splatting (3DGS) enable highly detailed reconstructions, but per-Gaussian neural inference at every frame limits real-time performance. Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS) bridges this efficiency–fidelity gap through three complementary modules that jointly reduce redundant inference, stabilize temporal consistency, and distill neural motion into compact rigid representations:

  1. Temporal Sensitivity Pruning (TSP) removes redundant Gaussians by aggregating their gradient sensitivity across both space and time.
  2. Temporal Sensitivity Sampling (TSS) enhances pruning stability by perturbing timestamps during sensitivity estimation, suppressing floaters and improving temporal coherence.
  3. GroupFlow clusters Gaussians with similar motion trajectories, replacing per-Gaussian deformations with shared groupwise SE(3) transformations.

Implemented on top of Deformable 3D Gaussians, SpeeDe3DGS reduces redundant primitives, stabilizes temporal pruning, and lowers deformation cost — achieving neural-field fidelity at rendering speeds comparable to non-neural motion models.

Temporal Sensitivity Pruning

For each Gaussian \(\mathcal{G}_i\), we calculate a Temporal Sensitivity Pruning (TSP) score \(\tilde{U}_{\mathcal{G}_i}\), which approximates its second-order sensitivity to the \(L_2\) loss across all training views: \[ \tilde{U}_{\mathcal{G}_i} \approx \nabla_{g_i}^2 L_2 \approx \sum_{\phi, t \in \mathcal{P}_{gt}} \left( \nabla_{g_i} I_{\mathcal{G}_t}(\phi) \right)^2, \] where \(g_i\) is the value of \(\mathcal{G}_i\) projected onto image space, \(\mathcal{P}_{gt}\) denotes the set of training poses and timesteps, and \(I_{\mathcal{G}_t}(\phi)\) is the rendered image at pose \(\phi\) and timestep \(t\).

Since deformation updates vary across timesteps, these gradients are inherently time-dependent and capture the second-order effects of each Gaussian and its deformations on the dynamic scene reconstruction. As such, \(\tilde{U}_{\mathcal{G}_i}\) reflects each Gaussian's cumulative contribution over time. We periodically prune low-contributing Gaussians during training, thereby reducing the neural inference load and boosting rendering speed.
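As an illustration, the accumulation loop can be sketched in PyTorch as follows. Here render, deform_net, and the gaussians container are hypothetical stand-ins for the DeformableGS rasterizer, deformation MLP, and scene model, and the score is accumulated against a dense parameter tensor for clarity:

import torch

def tsp_scores(gaussians, deform_net, render, train_cams):
    # Accumulate squared image-space gradients over every training
    # pose and timestep to approximate each Gaussian's TSP score.
    scores = torch.zeros(gaussians.num_points, device="cuda")
    for cam in train_cams:                 # each cam carries a pose phi and time t
        g = gaussians.params               # per-Gaussian parameters, shape (N, D)
        g.requires_grad_(True)
        d_xyz, d_rot, d_scale = deform_net(g, cam.time)
        image = render(g, d_xyz, d_rot, d_scale, cam)
        image.sum().backward()             # one backward pass per (phi, t)
        # Squaring the aggregated gradient approximates the diagonal
        # second-order sensitivity of the L2 loss for this view.
        scores += g.grad.norm(dim=-1) ** 2
        g.grad = None
    return scores

def prune_lowest(gaussians, scores, keep_ratio=0.5):
    # Keep only the highest-sensitivity Gaussians; called periodically.
    k = int(keep_ratio * scores.numel())
    gaussians.select(torch.topk(scores, k).indices)

In the actual codebase the gradients would come from the rasterizer's per-Gaussian backward pass rather than a dense parameter tensor, and pruning would run at scheduled iterations.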

Temporal Sensitivity Sampling

While TSP identifies redundant Gaussians, it relies on gradients from discrete training frames. This may overlook floaters — unstable Gaussians that appear static in observed views but drift at unseen timestamps. To address this, Temporal Sensitivity Sampling (TSS) introduces temporal perturbation during sensitivity estimation by jittering the timestamp input to the deformation field:

\[ (\mu + \Delta\mu, r + \Delta r, s + \Delta s) = \mathcal{D}\big(\mu, r, s, t + \mathcal{N}(0,1)\,\beta\,\Delta t\,(1 - i/\tau)\big), \]

where \(\mathcal{D}\) is the deformation field, \(\beta\) scales the noise, \(\Delta t\) is the interval between training timestamps, \(i\) is the current training iteration, and \(\tau\) is the total number of iterations, so the perturbation anneals linearly to zero over training.

This temporally jittered sampling reveals unstable primitives early in training, encouraging robust pruning and suppressing floaters, while the annealed noise schedule ensures precise reconstruction in later stages. Together, TSP and TSS improve temporal smoothness and enable reconstructions with up to 11× fewer Gaussians at image quality comparable to the unpruned baseline.
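For concreteness, here is a minimal sketch of the jitter schedule in PyTorch, where beta and dt stand in for \(\beta\) and \(\Delta t\) and the default values are illustrative rather than the paper's settings:

import torch

def jittered_time(t, iteration, total_iters, beta=0.1, dt=1.0):
    # Linear annealing factor (1 - i / tau): full jitter early in
    # training, none by the end.
    anneal = max(0.0, 1.0 - iteration / total_iters)
    noise = torch.randn(()).item() * beta * dt * anneal   # N(0,1) * beta * dt * (1 - i/tau)
    return t + noise

During sensitivity estimation, the deformation field would be queried at jittered_time(t, i, tau) instead of t, so floaters that drift between observed frames accumulate gradients that expose them to pruning.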

GroupFlow

Even after pruning, dynamic 3D Gaussian Splatting requires predicting a deformation for every Gaussian, which remains computationally expensive. GroupFlow addresses this by distilling the learned neural motion field into grouped SE(3) transformations that capture locally coherent motion.

Given a dense deformable 3DGS model, each Gaussian \(\mathcal{G}_i\) is represented as a sequence of mean positions \(\mathcal{M}_i = \{\mu_i^t\}_{t=0}^{F-1}\) across \(F\) timesteps. We initialize \(J\) control trajectories \(\{h_j^t\}\) via farthest point sampling at \(t=0\), and assign each \(\mu_i\) to the most similar \(h_j\) using the trajectory similarity score:

$$ S_{i,j} = \lambda_r \, \mathrm{std}_t(\| \mu_i^t - h_j^t \|) + (1-\lambda_r) \, \mathrm{mean}_t(\| \mu_i^t - h_j^t \|). $$
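A minimal PyTorch sketch of this assignment, with traj and ctrl as hypothetical dense tensors of per-Gaussian and control trajectories and lam_r playing the role of \(\lambda_r\):

import torch

def assign_groups(traj, ctrl, lam_r=0.5):
    # traj: (N, F, 3) Gaussian means over F timesteps.
    # ctrl: (J, F, 3) control trajectories from farthest point sampling.
    # Pairwise point-to-control distances at every timestep: (N, J, F).
    d = torch.linalg.norm(traj[:, None] - ctrl[None], dim=-1)
    # S_ij = lam_r * std_t + (1 - lam_r) * mean_t, then pick the best group.
    score = lam_r * d.std(dim=-1) + (1.0 - lam_r) * d.mean(dim=-1)
    return score.argmin(dim=-1)            # group index j for each Gaussian i

For large scenes, the (N, J, F) distance tensor can be computed in chunks over N to bound memory.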

We then fit a groupwise SE(3) trajectory to each group \(\mathcal{M}^j\) by aligning sampled means over time. To predict the deformation of \(\mu_i \in \mathcal{M}^j\) at timestep \(t\), we apply a learned SE(3) transformation relative to its control point:

$$ \mu_i^t = R_j^t (\mu_i^0 - h_j^0) + h_j^0 + T_j^t. $$

The shared flow parameters \(\{h_j^0, R_j^t, T_j^t\}\) are optimized jointly with the scene. This formulation distills dense, per-Gaussian neural deformations into a smaller set of groupwise SE(3) motions, reducing the number of transformations per frame from \(N\) (per Gaussian) to \(J\) (per group).
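The per-frame lookup and transform can be sketched as below, where group_idx is the assignment from the similarity score and R_t, T_t hold each group's rotation and translation at the queried timestep (all names hypothetical):

import torch

def deform_means(mu0, group_idx, h0, R_t, T_t):
    # mu0: (N, 3) canonical means at t = 0; h0: (J, 3) control points.
    # R_t: (J, 3, 3) and T_t: (J, 3) per-group SE(3) at this timestep.
    R = R_t[group_idx]                     # (N, 3, 3) each Gaussian's group rotation
    h = h0[group_idx]                      # (N, 3)
    T = T_t[group_idx]                     # (N, 3)
    local = mu0 - h                        # offset from the control point
    return torch.einsum("nij,nj->ni", R, local) + h + T

Only \(J\) transforms are evaluated per frame; the gather over group_idx replaces \(N\) network queries.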

As a result, GroupFlow achieves substantial acceleration during both training and rendering while maintaining temporally coherent, high-quality motion reconstruction. Beyond efficiency, this distillation regularizes the learned motion field, smoothing noisy trajectories and improving robustness in real-world dynamic scenes.

Results

Videos produced using our SpeeDe3DGS codebase.

Table 2. Results on Monocular Dynamic Gaussian Splatting Benchmark (MonoDyGauBench) [25]. Quantitative results are averaged across five datasets and 50 scenes for all methods in Section 3.1. We cumulatively apply our methods to the DeformableGS [51] and 4DGS [47] baselines, keeping the original neural variants with low FPS for reference, but excluding them from comparisons to focus on real-time methods. Pruning is performed using TSP and TSS. Each experiment is repeated three times and averaged. The best and second-best results are highlighted; improvements over corresponding baselines are bolded. FPS and baseline Train Time are measured on an RTX 3090 GPU, while our Train Time* is measured on an RTX A5000 GPU (both 24 GB).

BibTeX

@article{TuYing2025SpeeDe3DGS,
    author    = {Tu, Allen and Ying, Haiyang and Hanson, Alex and Lee, Yonghan and Goldstein, Tom and Zwicker, Matthias},
    title     = {SpeeDe3DGS: Speedy Deformable 3D Gaussian Splatting with Temporal Pruning and Motion Grouping},
    journal   = {arXiv preprint arXiv:2506.07917},
    year      = {2025},
    url       = {https://speede3dgs.github.io/}
}

Related Work

For additional papers on efficient 3D Gaussian Splatting, see our group’s related work below. If your research builds on ours, we encourage you to cite these papers.

  1. Speedy-Splat (CVPR 2025) [BibTeX] — Accelerate 3D Gaussian Splatting rendering speed by over 6× and reduce model size by over 90% through accurately localizing primitives during rasterization and pruning the scene during training, providing a significantly higher speedup than existing techniques while maintaining competitive image quality.
  2. PUP-3DGS (CVPR 2025) [BibTeX] — Prune 90% of primitives from any pretrained 3D Gaussian Splatting model using a mathematically principled sensitivity score, more than tripling rendering speed while retaining more salient foreground information and higher visual fidelity than previous techniques at a substantially higher compression ratio.