* denotes equal contribution
Recent extensions of 3D Gaussian Splatting (3DGS) to dynamic scenes achieve high-quality novel view synthesis by using neural networks to predict the time-varying deformation of each Gaussian. However, performing per-Gaussian neural inference at every frame poses a significant bottleneck, limiting rendering speed and increasing memory and compute requirements.
In this paper, we present Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), a general pipeline for accelerating the rendering speed of dynamic 3DGS and 4DGS representations by reducing neural inference through two complementary techniques. First, we propose a temporal sensitivity pruning score that identifies and removes Gaussians with low contribution to the dynamic scene reconstruction. We also introduce an annealing smooth pruning mechanism that improves pruning robustness in real-world scenes with imprecise camera poses. Second, we propose GroupFlow, a motion analysis technique that clusters Gaussians by trajectory similarity and predicts a single rigid transformation per group instead of separate deformations for each Gaussian.
Together, our techniques accelerate rendering by 10.37×, reduce model size by 7.71×, and shorten training time by 2.71× on the NeRF-DS dataset. SpeeDe3DGS also improves rendering speed by 4.20× and 58.23× on the D-NeRF and HyperNeRF vrig datasets, respectively. Our methods are modular and can be integrated into any deformable 3DGS or 4DGS framework.
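Neural inference is a major bottleneck in rendering speed for dynamic 3DGS representations. Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS) accelerates rendering by leveraging two key insights for reducing inference cost: temporal sensitivity pruning, which removes Gaussians that contribute little to the dynamic scene reconstruction, and GroupFlow, which predicts a single rigid transformation per group of Gaussians instead of a separate deformation for each Gaussian.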
While we integrate our approach into Deformable 3D Gaussians, our methods are modular and can be applied to any similar deformable 3DGS or 4DGS framework.
For each Gaussian \(\mathcal{G}_i\), we calculate a temporal sensitivity pruning score \(\tilde{U}_{\mathcal{G}_i}\), which approximates its second-order sensitivity to the \(L_2\) loss across all training views: \[ \tilde{U}_{\mathcal{G}_i} \approx \nabla_{g_i}^2 L_2 \approx \sum_{\phi, t \in \mathcal{P}_{gt}} \left( \nabla_{g_i} I_{\mathcal{G}_t}(\phi) \right)^2, \] where \(g_i\) is the value of \(\mathcal{G}_i\) projected onto image space, \(\mathcal{P}_{gt}\) denotes the set of training poses and timesteps, and \(I_{\mathcal{G}_t}(\phi)\) is the rendered image at pose \(\phi\) and timestep \(t\).
Since deformation updates vary across timesteps, these gradients are inherently time-dependent and capture the second-order effects of each Gaussian and its deformations on the dynamic scene reconstruction. As such, \(\tilde{U}_{\mathcal{G}_i}\) reflects each Gaussian's cumulative contribution to the scene reconstruction over time. We periodically prune low-contributing Gaussians during training, thereby reducing the neural inference load and boosting rendering speed.
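The score accumulation and pruning step can be illustrated with a minimal sketch. It assumes the gradient of each rendered image with respect to \(g_i\) has already been computed and reduced to one scalar per Gaussian per training sample. The function names, the `prune_fraction` parameter, and the toy values are hypothetical and not taken from the SpeeDe3DGS code.

```python
def temporal_sensitivity_scores(per_sample_grads):
    """Sum squared gradients over all (pose, timestep) training samples.

    per_sample_grads: one list per training sample; entry i is the
    (assumed precomputed) scalar gradient of the render w.r.t. g_i.
    """
    num_gaussians = len(per_sample_grads[0])
    scores = [0.0] * num_gaussians
    for grads in per_sample_grads:
        for i, g in enumerate(grads):
            scores[i] += g * g
    return scores


def keep_mask(scores, prune_fraction):
    """Return True for Gaussians to keep; the lowest-scoring fraction is dropped."""
    num_prune = int(len(scores) * prune_fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:num_prune])
    return [i not in pruned for i in range(len(scores))]


# Toy example: four Gaussians observed in two (pose, timestep) samples.
grads = [[0.5, 0.0, 2.0, 0.1],
         [1.5, 0.1, 1.0, 0.2]]
scores = temporal_sensitivity_scores(grads)
mask = keep_mask(scores, prune_fraction=0.5)
```

Because the sum runs over timesteps as well as poses, a Gaussian that matters in only a few frames still accumulates a nonzero score, which is the time-dependent behavior described above.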
Additionally, we introduce an Annealing Smooth Pruning (ASP) mechanism that enhances the robustness of pruning to imprecise camera poses in real-world scenes. The comparison above illustrates an example where ASP produces cleaner results than both standard pruning and the unpruned baseline.
Given a dense deformable 3DGS model, each Gaussian \(\mathcal{G}_i\) is represented as a sequence of mean positions \(\mathcal{M}_i = \{\mu_i^t\}_{t=0}^{F-1}\) across \(F\) timesteps.
We initialize \(J\) control trajectories \(\{h_j^t\}\) via farthest point sampling at \(t=0\), and assign each \(\mu_i\) to the most similar \(h_j\) using the following trajectory similarity score:
$$ S_{i,j} = \lambda_r \, \mathrm{std}_t(\| \mu_i^t - h_j^t \|) + (1-\lambda_r) \, \mathrm{mean}_t(\| \mu_i^t - h_j^t \|). $$
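The grouping step can be sketched as follows, under several assumptions: control trajectories are taken to be the trajectories of the Gaussians chosen by farthest point sampling at \(t=0\), \(\mathrm{std}_t\) is the population standard deviation, and, since \(S_{i,j}\) is built from distances, the smallest score is treated as the most similar. The function names and the value \(\lambda_r = 0.5\) are hypothetical.

```python
import math
from statistics import fmean, pstdev


def farthest_point_sampling(points, num_samples):
    """Greedily pick indices of mutually distant points, starting at index 0."""
    chosen = [0]
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_samples:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen


def similarity(traj_i, traj_j, lambda_r):
    """S_ij = lambda_r * std_t(d) + (1 - lambda_r) * mean_t(d)."""
    dists = [math.dist(a, b) for a, b in zip(traj_i, traj_j)]
    return lambda_r * pstdev(dists) + (1 - lambda_r) * fmean(dists)


def assign_groups(trajectories, num_groups, lambda_r=0.5):
    """Assign each trajectory to the control trajectory with the lowest score."""
    start_positions = [traj[0] for traj in trajectories]
    control_ids = farthest_point_sampling(start_positions, num_groups)
    groups = []
    for traj in trajectories:
        scores = [similarity(traj, trajectories[c], lambda_r)
                  for c in control_ids]
        groups.append(scores.index(min(scores)))
    return control_ids, groups


# Toy example: two Gaussians moving along +x, two static ones far away (F = 3).
trajectories = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (2.1, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(5.0, 0.1, 0.0), (5.0, 0.1, 0.0), (5.0, 0.1, 0.0)],
]
control_ids, groups = assign_groups(trajectories, num_groups=2)
```

The standard-deviation term is small when a Gaussian keeps a constant offset from its control trajectory, while the mean term favors spatially close trajectories; \(\lambda_r\) trades off the two.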
We then fit a group-wise SE(3) trajectory to each group \(\mathcal{M}^j\) by aligning sampled means over time. To predict the deformation of \(\mu_i \in \mathcal{M}^j\) at timestep \(t\), we apply a learned SE(3) transformation relative to its control point:
$$ \mu_i^t = R_j^t (\mu_i^0 - h_j^0) + h_j^0 + T_j^t. $$
The shared flow parameters \(\{h_j^0, R_j^t, T_j^t\}\) are optimized during training. This approach — GroupFlow — reduces the number of predicted transformations per frame from \(N\) (one per Gaussian) to \(J\) (one per group), significantly accelerating both training and rendering.
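As a small illustration of the per-group update \(\mu_i^t = R_j^t (\mu_i^0 - h_j^0) + h_j^0 + T_j^t\), the sketch below applies one shared rotation and translation to every mean in a group. The function name and the toy values are hypothetical; in the method itself, \(R_j^t\) and \(T_j^t\) are learned, not hand-specified.

```python
def apply_group_flow(mu0, h0, rotation, translation):
    """Return R (mu0 - h0) + h0 + T for one Gaussian mean."""
    # Express the mean relative to the group's control point.
    local = [m - h for m, h in zip(mu0, h0)]
    # Rotate about the control point (3x3 matrix times 3-vector).
    rotated = [sum(r * x for r, x in zip(row, local)) for row in rotation]
    # Move back to world coordinates and apply the group translation.
    return [rot + h + t for rot, h, t in zip(rotated, h0, translation)]


# Toy group: 90-degree rotation about z plus a unit shift along z.
R_z90 = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 1.0]
h0 = [1.0, 0.0, 0.0]
group_means = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

# One (R, T) pair is reused for every Gaussian in the group.
deformed = [apply_group_flow(mu0, h0, R_z90, T) for mu0 in group_means]
```

Only the pair `(R_z90, T)` would need to be predicted for this timestep; applying it to each member is a cheap matrix-vector product, which is where the reduction from \(N\) to \(J\) predicted transformations comes from.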
@article{TuYing2025speede3dgs,
    author    = {Tu, Allen and Ying, Haiyang and Hanson, Alex and Lee, Yonghan and Goldstein, Tom and Zwicker, Matthias},
    title     = {Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes},
    journal   = {arXiv preprint arXiv:2506.07917},
    year      = {2025},
    url       = {https://speede3dgs.github.io/}
}