Video face restoration aims to recover high-quality face videos from severely degraded inputs while
preserving realistic facial details, stable identity, and temporal coherence. Recent diffusion-based
methods have brought strong generative priors to restoration and enabled more realistic detail synthesis.
However, existing approaches for face videos still rely heavily on generic diffusion priors and multi-step
sampling, which limits both their adaptation to facial content and their inference efficiency. These
limitations motivate one-step diffusion for video face restoration, yet achieving faithful facial recovery together with
temporally stable outputs remains challenging. In this paper, we propose \textbf{DVFace}, a one-step
diffusion framework for real-world video face restoration. Specifically, we introduce a spatio-temporal
dual-codebook design to extract complementary spatial and temporal facial priors from degraded videos. We
further propose an asymmetric spatio-temporal fusion module that injects these priors into the diffusion
backbone in a manner matched to their distinct roles; a minimal sketch of both components is given below.
Extensive experiments on synthetic and real-world benchmarks
demonstrate that DVFace achieves superior restoration quality, temporal consistency, and identity
preservation compared with recent methods.
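As a rough illustration of the two components named above, the following PyTorch-style sketch pairs a per-frame spatial codebook with a time-pooled temporal codebook, then fuses the resulting priors asymmetrically. All module names, dimensions, the time-pooling step, and the specific fusion operations (residual detail injection versus FiLM-style modulation) are our own assumptions for illustration, not the paper's actual design.

\begin{verbatim}
# Illustrative sketch only: the real DVFace architecture may differ.
import torch
import torch.nn as nn


def vq_nearest(z, codebook):
    """Quantize each feature vector in z (..., d) to its nearest codebook entry."""
    flat = z.reshape(-1, z.shape[-1])                    # (N, d)
    dist = torch.cdist(flat, codebook.weight)            # (N, K) pairwise distances
    idx = dist.argmin(dim=-1)                            # nearest code index per vector
    quant = codebook(idx).reshape(z.shape)
    return z + (quant - z).detach()                      # straight-through estimator


class DualCodebookPrior(nn.Module):
    """Assumed split: spatial codebook acts per frame, temporal codebook
    acts on time-pooled features shared across frames."""
    def __init__(self, dim=256, k_spatial=1024, k_temporal=256):
        super().__init__()
        self.spatial_cb = nn.Embedding(k_spatial, dim)
        self.temporal_cb = nn.Embedding(k_temporal, dim)

    def forward(self, feats):                            # feats: (B, T, H, W, C)
        spatial = vq_nearest(feats, self.spatial_cb)     # per-location detail prior
        pooled = feats.mean(dim=1)                       # (B, H, W, C), time-averaged
        temporal = vq_nearest(pooled, self.temporal_cb)  # cross-frame consistency prior
        return spatial, temporal


class AsymmetricFusion(nn.Module):
    """Assumed asymmetry: the spatial prior is added as residual detail,
    the temporal prior modulates backbone features FiLM-style."""
    def __init__(self, dim=256):
        super().__init__()
        self.detail_proj = nn.Linear(dim, dim)
        self.film = nn.Linear(dim, 2 * dim)              # predicts scale and shift

    def forward(self, backbone_feats, spatial, temporal):
        scale, shift = self.film(temporal).chunk(2, dim=-1)
        modulated = backbone_feats * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return modulated + self.detail_proj(spatial)     # residual spatial detail


if __name__ == "__main__":
    B, T, H, W, C = 2, 5, 8, 8, 256
    feats = torch.randn(B, T, H, W, C)
    s, t = DualCodebookPrior(dim=C)(feats)
    out = AsymmetricFusion(dim=C)(feats, s, t)
    print(out.shape)                                     # torch.Size([2, 5, 8, 8, 256])
\end{verbatim}

The asymmetry here is one plausible reading of "distinct roles": the spatial prior carries high-frequency facial detail and so is injected as an additive residual, while the temporal prior encodes cross-frame statistics and so modulates features globally.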