Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

Figure: Workflow

Abstract. Animating hand-drawn sketches with traditional tools is challenging and complex. Sketches provide a visual basis for explanations, and animating them offers a real-time experience of the depicted scenarios. We propose an approach that animates a given input sketch according to a descriptive text prompt. Our method uses a parametric representation of the sketch's strokes. Previous methods struggle to estimate smooth, accurate motion and often fail to preserve the sketch's topology. In contrast, we leverage a pre-trained text-to-video diffusion model with a score distillation sampling (SDS) loss to guide the motion of the strokes. We introduce length-area (LA) regularization to ensure temporal consistency by accurately estimating smooth displacements of the control points across the frame sequence. Additionally, to preserve shape and avoid topology changes, we apply a shape-preserving As-Rigid-As-Possible (ARAP) loss that maintains the sketch's rigidity. Our method outperforms state-of-the-art methods in both quantitative and qualitative evaluations.
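To make the rigidity idea concrete, the following is a minimal sketch of the classic 2D As-Rigid-As-Possible energy, not the paper's implementation. It assumes the sketch's control points are connected by a hypothetical edge list (for example, from a triangulation of the points). The energy compares a deformed frame against a rest pose, such as the input sketch. For each point, it finds the best-fitting rotation of its neighbouring edges in closed form and sums the squared residuals. Rigid motions therefore cost nothing, while stretching or shearing is penalized. The function name `arap_energy` and the edge construction are illustrative assumptions. An actual training loss would be written in an autodiff framework rather than NumPy.

```python
import numpy as np


def arap_energy(rest, deformed, edges):
    """2D as-rigid-as-possible energy (illustrative sketch, not the paper's code).

    rest, deformed: (N, 2) arrays of control-point positions.
    edges: iterable of (i, j) index pairs, treated as undirected.
           Assumed here to come from e.g. a triangulation of the points.
    """
    rest = np.asarray(rest, dtype=float)
    deformed = np.asarray(deformed, dtype=float)
    n = len(rest)

    # Neighbour lists so every point gets its own best-fit rotation.
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    energy = 0.0
    for i in range(n):
        nbrs = neighbours[i]
        if not nbrs:
            continue
        e = rest[i] - rest[nbrs]              # rest-pose edge vectors, (k, 2)
        e_def = deformed[i] - deformed[nbrs]  # deformed edge vectors, (k, 2)

        # Closed-form optimal 2D rotation: maximise sum(e_def . (R e)).
        a = np.sum(e * e_def)
        b = np.sum(e[:, 0] * e_def[:, 1] - e[:, 1] * e_def[:, 0])
        theta = np.arctan2(b, a)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])

        # Rows of e @ rot.T are R e; penalise deviation from a pure rotation.
        residual = e_def - e @ rot.T
        energy += float(np.sum(residual ** 2))
    return energy


# Usage: a unit square with one diagonal edge.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# A rotated and translated copy is rigid, so its energy is (numerically) zero.
t = np.pi / 6
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
moved = square @ R.T + np.array([2.0, -1.0])

# Stretching along x is not rigid, so its energy is positive.
stretched = square * np.array([2.0, 1.0])

print(arap_energy(square, moved, edges))      # close to 0
print(arap_energy(square, stretched, edges))  # clearly > 0
```

Because each point's rotation is solved locally, the energy tolerates large global motion such as turning or translating a stroke, yet it resists the local distortions that would alter the sketch's shape.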

Comparison

BibTeX

@conference{grapp25,
author={Gaurav Rai and Ojaswa Sharma},
title={Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - GRAPP},
year={2025},
pages={151-160},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013304800003912},
isbn={978-989-758-728-3},
issn={2184-4321},
}