StableMotion: One Step Motion Estimation with Diffusion Prior

Ziyi Wang, Haipeng Li, Lin Sui, Tianhao Zhou, Hai Jiang, Lang Nie, Bing Zeng, Shuaicheng Liu

University of Electronic Science and Technology of China


StableMotion leverages geometry and content priors from large-scale diffusion models to perform one-step motion estimation, enabling single-image rectification tasks such as Stitched Image Rectangling (SIR) and Rolling Shutter Correction (RSC).

Abstract

We present StableMotion, a novel framework that leverages knowledge (geometry and content priors) from pretrained large-scale image diffusion models to perform motion estimation, solving single-image rectification tasks such as Stitched Image Rectangling (SIR) and Rolling Shutter Correction (RSC). Specifically, StableMotion takes a text-to-image Stable Diffusion (SD) model as its backbone and repurposes it into an image-to-motion estimator. To mitigate the inconsistent outputs produced by diffusion models, we propose the Adaptive Ensemble Strategy (AES), which consolidates multiple outputs into a cohesive, high-fidelity result. Additionally, we present the concept of Sampling Steps Disaster (SSD), the counterintuitive scenario where increasing the number of sampling steps can lead to poorer outcomes; this observation enables our framework to achieve one-step inference. StableMotion is evaluated on two image rectification tasks, achieves state-of-the-art performance on both, and shows strong generalizability. Supported by SSD, StableMotion runs 200 times faster than previous diffusion model-based methods.

Repurposing Strategy

StableMotion takes a text-to-image Stable Diffusion (SD) model as its backbone and repurposes it into an image-to-motion estimator. The repurposing is trained under the joint supervision of a condition loss and a reconstruction loss. This combination of explicit motion modeling and implicit distributional regularization allows our method to achieve robust rectification results while maintaining architectural simplicity and inference efficiency.
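The page does not define either loss, so the following is only a minimal sketch of how two such objectives are commonly combined into one training signal. It assumes, hypothetically, that the reconstruction loss is a mean squared error between predicted and target latents, that the condition loss is a mean L1 error on the motion field, and that the two are balanced by a weight `lam`. All function names and the weighting scheme are illustrative, not the authors' code.

```python
import numpy as np


def reconstruction_loss(pred_latent, target_latent):
    # Hypothetical choice: mean squared error in latent space.
    return float(np.mean((pred_latent - target_latent) ** 2))


def condition_loss(pred_motion, gt_motion):
    # Hypothetical choice: mean absolute (L1) error on the motion field.
    return float(np.mean(np.abs(pred_motion - gt_motion)))


def joint_loss(pred_latent, target_latent, pred_motion, gt_motion, lam=1.0):
    # Weighted sum of the two objectives; `lam` is an assumed balancing weight.
    rec = reconstruction_loss(pred_latent, target_latent)
    cond = condition_loss(pred_motion, gt_motion)
    return rec + lam * cond
```

A weighted sum keeps both objectives differentiable in a single backward pass; in practice the weight would be tuned so that neither term dominates.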

One-Step Inference

We run multi-batch inference in a single step and apply the Adaptive Ensemble Strategy (AES) to the outputs of each batch. The next section introduces Sampling Steps Disaster (SSD), the observation that makes one-step inference possible.
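One way to read "multi-batch inference in one step" is sketched below, under stated assumptions: the conditioning image is replicated N times, each copy is paired with its own Gaussian noise latent, and a single network call returns N candidate predictions that an ensemble step can later merge. `predict_x0` is a hypothetical stand-in for the repurposed SD model (UNet plus decoding), and the toy model in the usage note is purely illustrative.

```python
import numpy as np


def one_step_batch_inference(predict_x0, cond, num_samples, rng):
    # Replicate the conditioning input so every batch entry sees the same image.
    cond_batch = np.repeat(cond[None], num_samples, axis=0)
    # Each entry gets its own Gaussian noise latent, so outputs can differ.
    noise = rng.standard_normal(cond_batch.shape)
    # A single call (one sampling step) produces all N predictions at once.
    return predict_x0(noise, cond_batch)
```

For example, with a toy model `lambda z, c: 2.0 * c + 0.1 * z` and `num_samples=4`, the result has shape `(4, *cond.shape)` and the four entries differ only through their noise, which is the spread an ensemble step is meant to absorb.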

Sampling Steps Disaster

We present and prove the concept of Sampling Steps Disaster (SSD): the counterintuitive scenario where increasing the number of sampling steps can lead to poorer outcomes, which enables our framework to achieve one-step inference. This phenomenon is commonly observed in models trained with multiple objectives, such as those that incorporate conditional losses.
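To make "number of sampling steps" concrete, here is a generic deterministic DDIM sampler in which the step count is a parameter. With `num_steps=1` it returns the model's single clean-sample prediction from the noisiest timestep, which is the one-step regime described above. This is standard DDIM with an assumed linear beta schedule, not the authors' sampler, and `eps_model` is a hypothetical noise predictor.

```python
import numpy as np


def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=2e-2):
    # Linear beta schedule; alpha_bar[t] is the cumulative product of (1 - beta).
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def ddim_sample(eps_model, x_T, alpha_bar, num_steps):
    T = len(alpha_bar)
    # Evenly spaced integer timesteps from T-1 down to 0 (just [T-1] for one step).
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        eps = eps_model(x, t)
        # Estimate the clean sample implied by the predicted noise.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # After the last listed timestep, jump straight to the clean sample.
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x
```

With a perfect noise predictor, every step count returns the same clean sample. The sketch only shows where the step count enters the sampler; it does not reproduce SSD, which concerns the behavior of trained models.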

Adaptive Ensemble Strategy

We propose computing a content-aware mask for each image, which enables a content-aware ensemble. AES addresses two issues: (a) in tasks such as image rectangling, images are warped according to a motion field, which can introduce boundary artifacts, typically blank white margins around the warped image; and (b) because of their generative nature, diffusion models can produce inconsistent results, leading to variations across the output images.
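The page does not specify how the masks are computed or how samples are weighted, so the following is only a minimal sketch of a content-aware ensemble under explicit assumptions. Pixels whose warped color is near-white in every channel are treated as blank margins (a hypothetical threshold rule). Motion fields are assumed to follow a backward-warping convention, defined on the output grid. The N motion fields are then averaged per pixel using only the samples whose warped image has content there. `content_mask`, `adaptive_ensemble` and `white_thresh` are illustrative names, not the authors' API.

```python
import numpy as np


def content_mask(warped, white_thresh=0.98):
    # True where a pixel carries content; near-white pixels are assumed to be
    # blank margins introduced by warping (hypothetical rule).
    return ~np.all(warped >= white_thresh, axis=-1)


def adaptive_ensemble(motions, warped_images, white_thresh=0.98):
    """motions: (N, H, W, 2) motion fields; warped_images: (N, H, W, 3) in [0, 1]."""
    masks = np.stack([content_mask(w, white_thresh) for w in warped_images])  # (N, H, W)
    weights = masks.astype(np.float64)[..., None]  # (N, H, W, 1)
    total = weights.sum(axis=0)  # number of valid samples per pixel, (H, W, 1)
    masked_mean = (weights * motions).sum(axis=0) / np.maximum(total, 1e-8)
    plain_mean = motions.mean(axis=0)
    # Where every sample is blank, fall back to the unweighted mean.
    return np.where(total > 0, masked_mean, plain_mean)
```

A masked mean is the simplest choice that excludes margin pixels. A per-pixel median over the valid samples would be more robust to an outlier sample, at the cost of slightly more bookkeeping.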

BibTeX

@misc{wang2025stablemotionrepurposingdiffusionbasedimage,
                title={StableMotion: Repurposing Diffusion-Based Image Priors for Motion Estimation},
                author={Ziyi Wang and Haipeng Li and Lin Sui and Tianhao Zhou and Hai Jiang and Lang Nie and Shuaicheng Liu},
                year={2025},
                eprint={2505.06668},
                archivePrefix={arXiv},
                primaryClass={cs.CV},
                url={https://arxiv.org/abs/2505.06668},
                }