We introduce LoopGNN, a GNN that estimates loop closure consensus by leveraging sets of similar keyframes retrieved through visual place recognition (VPR). This increases the robustness of predicted loop closures and yields a considerable efficiency boost over classical VPR + RANSAC geometric verification pipelines.

Visual loop closure detection traditionally relies on place recognition methods to retrieve candidate loops that are validated using computationally expensive RANSAC-based geometric verification. As false positive loop closures significantly degrade downstream pose graph estimates, the number of candidates that can be verified in online simultaneous localization and mapping (SLAM) is constrained by limited time and compute resources. While most deep loop closure detection approaches operate only on pairs of keyframes, we relax this constraint by considering neighborhoods of multiple keyframes when detecting loops. In this work, we introduce LoopGNN, a graph neural network architecture that estimates loop closure consensus by leveraging cliques of visually similar keyframes retrieved through place recognition. By propagating deep feature encodings among nodes of the clique, our method yields high-precision estimates while maintaining high recall. Extensive experimental evaluations on the TartanDrive 2.0 and NCLT datasets demonstrate that LoopGNN outperforms traditional baselines. Additionally, an ablation study across various keypoint extractors demonstrates that our method is robust regardless of the type of deep feature encodings used, and exhibits higher computational efficiency compared to classical geometric verification baselines.
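The RANSAC-based geometric verification mentioned above repeatedly samples minimal sets of keypoint correspondences, fits a model, and counts inliers; its cost scales with the number of candidate pairs, which is precisely what LoopGNN reduces. As a rough, self-contained illustration of why this is expensive, here is a toy RANSAC loop fitting a 2D translation between matched keypoints (a real verifier would fit an essential or fundamental matrix instead; all names here are illustrative):

```python
import random

def ransac_translation(matches, iters=200, inlier_thresh=2.0, seed=0):
    """Toy RANSAC: fit a 2D translation between matched keypoints.

    matches: list of ((x1, y1), (x2, y2)) correspondence pairs.
    Returns (best_translation, inlier_count).
    """
    rng = random.Random(seed)
    best_t, best_inliers = None, 0
    for _ in range(iters):
        # Minimal sample for a translation model: a single correspondence.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Count how many correspondences agree with the hypothesis.
        inliers = sum(
            1 for (a, b), (c, d) in matches
            if abs(c - (a + tx)) < inlier_thresh and abs(d - (b + ty)) < inlier_thresh
        )
        if inliers > best_inliers:
            best_t, best_inliers = (tx, ty), inliers
    return best_t, best_inliers

# 8 true matches shifted by (5, 3), plus 2 gross outliers.
good = [((x, y), (x + 5, y + 3))
        for x, y in [(0, 0), (1, 2), (3, 1), (4, 4), (2, 5), (6, 0), (7, 3), (5, 5)]]
bad = [((0, 0), (40, 40)), ((1, 1), (-30, 7))]
t, n = ransac_translation(good + bad)
```

Even this toy loop performs hundreds of model fits and inlier counts per candidate pair; with thousands of keypoints per frame and a full epipolar model, verifying every VPR candidate quickly becomes the bottleneck.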

Approach

Overview of our approach
LoopGNN approach: We create keyframes from robot trajectories and utilize a deep keypoint extractor such as XFeat [25] to obtain keypoints for each image. Next, we fit a VLAD-based place recognition model, enabling fast and robust retrieval of similar frames given a query frame (left). Then, given a query frame, we independently encode the keypoint descriptors of all frames (the query and the retrieved ones) using a NetVLAD layer and construct a neighborhood graph. We feed this attributed graph into a graph attention network to produce a deep consensus regarding loop closures among keyframes of the neighborhood (middle). Finally, we extract the set of highest-scoring edge-wise predictions of the network and validate these pairs of frames using RANSAC-based geometric verification (right).
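As a rough sketch of the middle stage, the snippet below builds a neighborhood graph over a query and its VPR retrievals, runs one hand-rolled attention-style message-passing step over toy descriptor vectors, and scores each edge by descriptor similarity. This is only a schematic under simplified assumptions: the actual model operates on NetVLAD-encoded keypoint descriptors with a learned graph attention network, and every name here is illustrative.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def build_neighborhood_graph(query_id, retrieved_ids):
    """Fully connect the query and its VPR retrievals (a clique)."""
    nodes = [query_id] + list(retrieved_ids)
    edges = list(combinations(nodes, 2))
    return nodes, edges

def attention_step(desc, nodes, edges):
    """One softmax-attention aggregation per node (untrained, illustrative)."""
    nbrs = {n: [] for n in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    new_desc = {}
    for n in nodes:
        # Attention weights: softmax over similarity to each neighbor.
        scores = [cosine(desc[n], desc[m]) for m in nbrs[n]]
        zmax = max(scores)
        w = [math.exp(s - zmax) for s in scores]
        tot = sum(w)
        agg = [0.0] * len(desc[n])
        for wi, m in zip(w, nbrs[n]):
            for i, x in enumerate(desc[m]):
                agg[i] += (wi / tot) * x
        # Residual update: mix own descriptor with attended neighbors.
        new_desc[n] = [0.5 * a + 0.5 * b for a, b in zip(desc[n], agg)]
    return new_desc

def edge_consensus(desc, edges):
    """Score each edge by descriptor similarity after message passing."""
    return {e: cosine(desc[e[0]], desc[e[1]]) for e in edges}

# Toy descriptors: "a" resembles the query "q", "b" does not.
desc = {"q": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0]}
nodes, edges = build_neighborhood_graph("q", ["a", "b"])
scores = edge_consensus(attention_step(desc, nodes, edges), edges)
```

After message passing, the edge between the query and the visually similar frame receives a higher consensus score than the edge to the dissimilar one; in LoopGNN, only the highest-scoring edges are passed on to geometric verification.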

Code

This work is released under the CC BY-NC-SA license. A software implementation of this project can be found on GitHub (Coming soon).

Authors

Martin Büchner, University of Freiburg

Liza Dahiya, Honda R&D

Simon Dorer, University of Freiburg

Vipul Ramtekkar, Honda R&D

Kenji Nishimiya, Honda R&D

Daniele Cattaneo, University of Freiburg

Abhinav Valada, University of Freiburg

Acknowledgment

This work was funded by Honda Research and Development and an academic grant from NVIDIA.