We present the first neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes.
Despite modern SLAM methods achieving impressive results on synthetic datasets, they still struggle with real-world datasets. Our approach uses 3D Gaussians as the primary unit of our scene representation to overcome the limitations of previous methods. We observe that classical 3D Gaussians are hard to use in a monocular setup: they cannot encode accurate geometry and are hard to optimize with single-view sequential supervision. By extending classical 3D Gaussians to encode geometry, and by designing a novel scene representation together with the means to grow and optimize it, we propose a SLAM system capable of reconstructing and rendering real-world datasets without compromising on speed and efficiency.
We show that Gaussian-SLAM can reconstruct and photorealistically render real-world scenes. We evaluate our method on common synthetic and real-world datasets and compare it against other state-of-the-art SLAM methods. Finally, we demonstrate that the resulting 3D scene representation can be rendered in real time thanks to the efficient Gaussian Splatting rendering.
Upon receiving a posed RGBD keyframe, we subsample it, taking the color gradient into account. The sampled points are back-projected into 3D space, where new Gaussians are initialized with their means at these locations. These new 3D Gaussians are added to the sparse regions of the currently active sub-map of the global map. The input RGBD keyframe is stored temporarily alongside the other keyframes that have contributed to the active sub-map. Once the new Gaussians have been integrated, the active sub-map is rendered from the viewpoint of every contributing keyframe, and depth and color losses are computed w.r.t. the corresponding input keyframes. We then update the parameters of the 3D Gaussians in the active sub-map. This process is repeated for a fixed number of iterations.
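To make this mapping loop concrete, below is a minimal sketch in PyTorch. All helper names (seed_points, map_keyframe, and the sub-map interface with sparse_mask, add_gaussians, and render) are hypothetical placeholders rather than our actual API, and the differentiable Gaussian Splatting rasterizer is abstracted behind submap.render.

# Hypothetical sketch of the mapping step; names are illustrative only.
import torch
import torch.nn.functional as F

def seed_points(rgb, depth, intrinsics, pose, grad_thresh=0.1, stride=8):
    """Subsample a keyframe, favoring pixels with high color gradient,
    and back-project the samples to 3D with the depth and camera pose."""
    gray = rgb.mean(dim=0, keepdim=True)                       # (1, H, W)
    gx = gray[..., :, 1:] - gray[..., :, :-1]                  # horizontal gradient
    gy = gray[..., 1:, :] - gray[..., :-1, :]                  # vertical gradient
    grad = F.pad(gx.abs(), (0, 1)) + F.pad(gy.abs(), (0, 0, 0, 1))
    # Keep high-gradient pixels plus a sparse uniform sample elsewhere.
    mask = (grad[0] > grad_thresh) | (torch.rand_like(grad[0]) < 1.0 / stride**2)
    v, u = torch.nonzero(mask, as_tuple=True)                  # pixel coordinates
    z = depth[v, u]
    u, v, z = u[z > 0], v[z > 0], z[z > 0]                     # drop invalid depth
    fx, fy, cx, cy = intrinsics
    pts_cam = torch.stack(((u - cx) * z / fx, (v - cy) * z / fy, z), dim=-1)
    pts_world = (pose[:3, :3] @ pts_cam.T).T + pose[:3, 3]     # camera -> world
    return pts_world, rgb[:, v, u].T                           # (N, 3), (N, 3)

def map_keyframe(keyframe, submap, num_iters=100, lam=0.5):
    """Seed new Gaussians in sparse regions, then optimize the active
    sub-map against every keyframe that has contributed to it."""
    pts, colors = seed_points(keyframe.rgb, keyframe.depth,
                              keyframe.intrinsics, keyframe.pose)
    sparse = submap.sparse_mask(pts)      # keep points in under-covered regions
    submap.add_gaussians(pts[sparse], colors[sparse])
    submap.keyframes.append(keyframe)

    opt = torch.optim.Adam(submap.parameters(), lr=1e-3)
    for _ in range(num_iters):
        loss = 0.0
        for kf in submap.keyframes:
            # Differentiable splatting renders color and depth per keyframe view.
            rgb_hat, depth_hat = submap.render(kf.pose, kf.intrinsics)
            loss = loss + F.l1_loss(rgb_hat, kf.rgb) \
                        + lam * F.l1_loss(depth_hat, kf.depth)
        opt.zero_grad()
        loss.backward()
        opt.step()

Because the rendering is differentiable, the color and depth losses propagate gradients directly to the parameters of the Gaussians in the active sub-map.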
We compare our method with other recent pipelines through side-by-side renders of the desk_1 scene from the TUM-RGBD dataset.
We also compare Gaussian-SLAM on more scenes from TUM-RGBD and ScanNet.
Note. We couldn't obtain scene renders from GO-SLAM.
[Figure: side-by-side renders by NICE-SLAM, ESLAM, Point-SLAM, Gaussian-SLAM (ours), and the ground truth.]
Here we show renders of the reconstructed meshes on TUM-RGBD and ScanNet. While Gaussian-SLAM outperforms all other recent methods in rendering, it achieves reconstruction results on par with state-of-the-art approaches.
Note. We couldn't obtain a mesh on a real-world dataset from GO-SLAM.
[Figure: mesh renders by NICE-SLAM, ESLAM, Point-SLAM, Gaussian-SLAM (ours), and the ground truth.]
Here we show renders of the meshes on the synthetic Replica dataset. Gaussian-SLAM also performs on par with state-of-the-art methods in reconstruction on synthetic scenes.
[Figure: mesh renders by ESLAM, GO-SLAM, Point-SLAM, Gaussian-SLAM (ours), and the ground truth.]
Scenes reconstructed with Gaussian-SLAM can be rendered in real time, thanks to the efficient Gaussian Splatting rendering.
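As a rough illustration of what real-time replay looks like, here is a minimal timing sketch; render, gaussians, and trajectory are hypothetical placeholders for a Gaussian Splatting rasterization call, the optimized map, and a camera path, and a CUDA device is assumed.

# Hypothetical benchmark loop; the renderer and map interfaces are placeholders.
import time
import torch

@torch.no_grad()
def benchmark(gaussians, trajectory, render):
    torch.cuda.synchronize()               # assumes a CUDA device is present
    start = time.perf_counter()
    for camera in trajectory:
        render(gaussians, camera)          # one rasterization pass per view
    torch.cuda.synchronize()
    fps = len(trajectory) / (time.perf_counter() - start)
    print(f"{fps:.1f} FPS")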
@misc{yugay2023gaussianslam,
  title={Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting},
  author={Vladimir Yugay and Yue Li and Theo Gevers and Martin R. Oswald},
  year={2023},
  eprint={2312.10070},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}