Native Spatio-Temporal 4D Variational Autoencoder

ICML 2026

1CUHK MMLab, 2Kuaishou Technology, 3HKUST, 4CPII under InnoHK

Method


Overall architecture: we introduce a novel 4D VAE that operates directly in native 4D space, that is, a dynamic colored voxel space, without 2D projection. This preserves explicit spatio-temporal coordinates throughout the learned encoder and decoder and enables encoding of both partial and complete 4D content. To support a flexible temporal compression ratio, we design a spatio-temporal window attention module that performs attention within local 4D windows. We also propose a differentiable voxel rendering loss, based on sparse voxel rasterization, to improve geometry and color reconstruction quality.
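To make the idea of attention within local 4D windows concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's module: the function names `window_ids` and `window_attention`, the `(t, x, y, z)` coordinate layout, the window size, and the use of single-head attention with no learned projections (queries, keys, and values are all the raw features) are all hypothetical choices made for this example.

```python
import numpy as np

def window_ids(coords, window):
    """Map each (t, x, y, z) voxel coordinate to the id of its local 4D window.

    coords: (N, 4) integer array of occupied voxels (assumed layout: t, x, y, z).
    window: window extent per axis, e.g. (wt, wx, wy, wz).
    """
    cells = coords // np.asarray(window)            # window cell of every voxel
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    return inverse.reshape(-1)                      # (N,) group id per voxel

def window_attention(feats, coords, window):
    """Self-attention restricted to voxels that share a 4D window.

    Illustrative only: single head, Q = K = V = feats, no learned weights.
    """
    ids = window_ids(coords, window)
    out = np.empty(feats.shape, dtype=np.float64)
    scale = 1.0 / np.sqrt(feats.shape[1])
    for g in np.unique(ids):
        idx = np.nonzero(ids == g)[0]               # voxels in this window
        x = feats[idx].astype(np.float64)
        logits = (x @ x.T) * scale
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        out[idx] = weights @ x
    return out

# Four occupied voxels; with a (2, 4, 4, 4) window only the first two share a window.
coords = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 0, 0, 0], [0, 4, 0, 0]])
feats = np.random.default_rng(0).normal(size=(4, 8))
out = window_attention(feats, coords, window=(2, 4, 4, 4))
```

In this sketch only occupied voxels are stored, and each voxel attends solely to others in the same window, so cost grows with window occupancy rather than with the full 4D grid. Changing the temporal extent of the window changes how many frames are grouped together; how the actual module relates this to the temporal compression ratio is not specified on this page.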

BibTeX

@inproceedings{ding2026native,
  title={Native Spatio-Temporal 4D Variational Autoencoder},
  author={Ding, Lihe and Ye, Weicai and Dong, Shaocong and Wang, Xintao and Wan, Pengfei and Gai, Kun and Xue, Tianfan},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026}
}