Learning Event Guided High Dynamic Range Video Reconstruction

Yixin Yang1,2 Jin Han3,4 Jinxiu Liang1,2 Imari Sato3,4 Boxin Shi*,1,2

1 National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University

2 National Engineering Research Center of Visual Technology, School of Computer Science, Peking University

3 Graduate School of Information Science and Technology, The University of Tokyo

4 National Institute of Informatics

{yangyixin93, cssherryliang, shiboxin}@pku.edu.cn  {jinhan, imarik}@nii.ac.jp


Conventional cameras capturing moving scenes face a trade-off between frame rate and exposure time, so frame-based HDR video reconstruction suffers from scene-dependent exposure ratio balancing and ghosting artifacts. Event cameras provide an alternative visual representation with much higher dynamic range and temporal resolution that is free from these issues, and can therefore serve as effective guidance for HDR imaging from LDR videos. In this paper, we propose a multimodal learning framework for event guided HDR video reconstruction. To better leverage knowledge of the same scene from the two modalities of visual signals, we propose a multimodal representation alignment strategy that learns a shared latent space, and a fusion module tailored to complementing the two types of signals for different dynamic ranges in different regions. Temporal correlations are exploited recurrently to suppress flickering in the reconstructed HDR video. The proposed HDRev-Net demonstrates state-of-the-art performance, both quantitatively and qualitatively, on synthetic and real-world data.

 author = {Yixin Yang and Jin Han and Jinxiu Liang and Imari Sato and Boxin Shi},
 booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
 title = {Learning Event Guided High Dynamic Range Video Reconstruction},
 year = {2023}