Despite increasingly realistic image quality, recent 3D image generative models often operate on bounded domains with limited camera motion. We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. Our scene representation consists of an extendable, planar scene layout grid, which can be rendered from arbitrary camera poses via a 3D decoder and volume rendering, together with a panoramic skydome. Based on this representation, we learn a generative world model solely from single-view internet photos. Our method enables simulating long flights through 3D landscapes while maintaining global scene consistency: for instance, returning to the starting point yields the same view of the scene. Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also maintaining a persistent, camera-independent world representation, in contrast to auto-regressive 3D prediction models.
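
To make the representation concrete, below is a minimal PyTorch sketch of how rays could be rendered from a planar layout feature grid and composited with a skydome color. This is an illustrative sketch only: the class and parameter names (PlanarSceneRenderer, feat_dim, grid_res, sky_rgb) and all layer sizes are assumptions, not the paper's actual implementation.

```python
# Minimal sketch: volume rendering from a planar (bird's-eye) feature grid,
# decoded by a small MLP and composited with a skydome color.
import torch
import torch.nn.functional as F

class PlanarSceneRenderer(torch.nn.Module):
    def __init__(self, feat_dim=32, grid_res=256, n_samples=64):
        super().__init__()
        # 2D layout grid of latent features over the ground plane (assumed fixed here).
        self.layout = torch.nn.Parameter(torch.randn(1, feat_dim, grid_res, grid_res))
        # Small MLP decoding a grid feature plus height into density and color.
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(feat_dim + 1, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, 4),  # outputs (density, r, g, b)
        )
        self.n_samples = n_samples

    def forward(self, rays_o, rays_d, sky_rgb, near=0.1, far=20.0):
        # rays_o, rays_d: (N, 3) ray origins and directions; sky_rgb: (N, 3).
        N = rays_o.shape[0]
        t = torch.linspace(near, far, self.n_samples, device=rays_o.device)
        pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]  # (N, S, 3)

        # Query the planar grid at the horizontal (x, z) location of each sample.
        xz = pts[..., [0, 2]] / far  # crude normalization to roughly [-1, 1]
        feats = F.grid_sample(
            self.layout, xz.view(1, N, self.n_samples, 2),
            align_corners=True).squeeze(0).permute(1, 2, 0)  # (N, S, C)

        h = pts[..., 1:2]  # height above the ground plane
        sigma_rgb = self.decoder(torch.cat([feats, h], dim=-1))
        sigma = F.softplus(sigma_rgb[..., 0])
        rgb = torch.sigmoid(sigma_rgb[..., 1:])

        # Standard volume-rendering compositing weights.
        delta = (far - near) / self.n_samples
        alpha = 1.0 - torch.exp(-sigma * delta)
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
            dim=1)[:, :-1]
        weights = alpha * trans  # (N, S)

        color = (weights[..., None] * rgb).sum(dim=1)
        # Fill in the skydome color along the remaining transmittance.
        color = color + (1.0 - weights.sum(dim=1, keepdim=True)) * sky_rgb
        return color
```

In the actual method, the layout grid would be produced by a generator and extended on the fly as the camera moves, rather than stored as a single fixed learnable tensor as in this sketch.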
Thanks to Andrew Liu and Richard Bowen for the fruitful discussions and helpful comments.
@inproceedings{chai2023persistentnature,
  title     = {Persistent Nature: A Generative Model of Unbounded 3D Worlds},
  author    = {Chai, Lucy and Tucker, Richard and Li, Zhengqi and Isola, Phillip and Snavely, Noah},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2023}
}