Collaborative photorealistic 3D reconstruction from multiple agents enables rapid large-scale scene capture for virtual production and cooperative multi-robot exploration. While recent 3D Gaussian Splatting (3DGS) SLAM algorithms can produce high-fidelity maps in real time, most existing multi-agent Gaussian SLAM methods still rely on RGB-D sensors to obtain metric depth and simplify cross-agent alignment, which limits deployment on lightweight, low-cost, or power-constrained robotic platforms.
To address this challenge, we propose MAGS-SLAM, the first RGB-only multi-agent 3DGS SLAM framework for collaborative scene reconstruction. Each agent independently builds local monocular Gaussian submaps and transmits compact submap summaries rather than raw observations or dense maps. To facilitate robust collaboration in the presence of monocular scale ambiguity, our framework integrates compact submap communication, geometry- and appearance-aware loop verification, and occupancy-aware Gaussian fusion, enabling coherent global reconstruction without active depth sensors.
We further introduce the ReplicaMultiAgent Plus benchmark for evaluating collaborative Gaussian SLAM. Extensive experiments on synthetic and real-world datasets show that MAGS-SLAM achieves competitive tracking accuracy and comparable or superior rendering quality to state-of-the-art RGB-D collaborative Gaussian SLAM methods while relying only on RGB images.
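To make the monocular scale ambiguity mentioned above concrete: each RGB-only agent reconstructs its submap up to an unknown scale, so two submaps of the same region generally differ by a similarity transform rather than a rigid one. The sketch below is an illustration only and is not taken from MAGS-SLAM. It assumes matched 3D point pairs between two agents' submaps are already available (for example from a verified loop closure) and recovers only the relative scale, using the classical ratio of point spreads about the centroids, which does not depend on the unknown rotation or translation. The function names `centroid`, `spread`, and `relative_scale` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of 3D points given as (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def spread(points):
    """Sum of squared distances of the points from their centroid."""
    c = centroid(points)
    return sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)


def relative_scale(src, dst):
    """Estimate s such that dst ~ s * R(src) + t for some rotation R and translation t.

    Rotation and translation preserve distances to the centroid, so the
    ratio of spreads isolates the scale factor.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    src_spread = spread(src)
    if src_spread == 0.0:
        raise ValueError("source points are degenerate (all identical)")
    return math.sqrt(spread(dst) / src_spread)


# Hypothetical matched points: agent A's submap (src) and agent B's view of the
# same points (dst), rotated 90 degrees about z, scaled by 2.5, and shifted.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
dst = [(-2.5 * y + 1.0, 2.5 * x - 2.0, 2.5 * z + 0.5) for x, y, z in src]

s = relative_scale(src, dst)
print(f"estimated relative scale: {s:.3f}")
```

In a full pipeline the rotation and translation would also have to be estimated, and outlier matches rejected before the scale is trusted; this sketch isolates the scale term only.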
Photorealistic multi-agent reconstructions on the ReplicaMultiAgent and ReplicaMultiAgent Plus datasets.
Globally fused TSDF mesh together with the per-agent camera trajectories (colored polylines) on synthetic (ReplicaMultiAgent, ReplicaMultiAgent Plus) and real-world (ETH3D, Tanks and Temples) datasets.
A single agent can only reconstruct the portion of the scene it has actually visited. On the real-world 7-Scenes Fire sequence, each individual agent (1-4) covers only part of the room, while collaborative fusion across agents produces a complete and globally consistent reconstruction. Per-agent coverage can be compared directly against the multi-agent fusion under the same viewpoint.
We introduce ReplicaMultiAgent Plus, a multi-agent benchmark for collaborative Gaussian SLAM with diverse indoor scenes captured by multiple agents traversing overlapping regions. In addition to monocular (RGB-only) and RGB-D streams and ground-truth poses for each agent, we release per-frame instance and semantic labels, which we hope will benefit research on multi-agent semantic SLAM, open-vocabulary 3D scene understanding, and collaborative multi-agent perception.