New State-of-the-Art

DVD: Deterministic Video Depth Estimation
with Generative Priors

Hongfei Zhang¹* Harold H. Chen¹,²* Chenfei Liao¹* Jing He¹* Zixin Zhang¹ Haodong Li³ Yihao Liang⁴

Kanghao Chen¹ Bin Ren⁵ Xu Zheng¹ Shuai Yang¹ Kun Zhou⁶ Yinchuan Li⁷ Nicu Sebe⁸ Ying-Cong Chen¹,²†

¹HKUST(GZ) ²HKUST ³UCSD ⁴Princeton University ⁵MBZUAI ⁶SZU ⁷Knowin ⁸UniTrento

*Equal Contribution †Corresponding Author

TL;DR: DVD resolves the geometric hallucinations of generative models and the semantic ambiguities of discriminative baselines, delivering consistent, high-fidelity geometry.

Long Video Results

[Videos: Sets 1 and 2, each showing the RGB input with a slider comparison of depth from DVD (Ours) and Video Depth Anything.]

Short Video Results

[Videos: Sets 1 and 2, each showing the RGB input, the Video Depth Anything result, and the DVD (Ours) result.]

Citation

If you find our work useful in your research, please consider citing:

@article{zhang2026dvd,
  title={DVD: Deterministic Video Depth Estimation with Generative Priors},
  author={Zhang, Hongfei and Chen, Harold Haodong and Liao, Chenfei and He, Jing and Zhang, Zixin and Li, Haodong and Liang, Yihao and Chen, Kanghao and Ren, Bin and Zheng, Xu and Yang, Shuai and Zhou, Kun and Li, Yinchuan and Sebe, Nicu and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2603.12250},
  year={2026}
}