# Probabilistic Data Association for Semantic SLAM

## Introduction

• Motivation: SLAM allows vehicles to navigate autonomously in an environment without a priori knowledge of the map and without access to independent position information
• Problem to solve:
• (What the environment looks like and where the robot is; given a path, how the robot should move)
• Sensing: How the robot measures properties of itself and its environment
• What is SLAM?
• simultaneous localization and mapping
• Localization: inferring location given a map
• Mapping: inferring a map given locations
• SLAM is a chicken-or-egg problem:
• a map is needed for localization
• a pose estimate is needed for mapping
• Classical solutions:
• Landmark extraction
• data association
• State estimation
• state update
• landmark update
• Data Association:
• Ascertaining which parts of one image correspond to which parts of another image, where differences are due to camera motion, elapsed time, and/or movement of objects in the scene.
• Problems:
• You might not re-observe landmarks every time.
• You might observe something as being a landmark but fail to ever see it again.
• You might wrongly associate a landmark to a previously seen landmark.
• Loop Closure
• Loop closure is the problem of recognizing a previously visited location and updating the states accordingly.
• Existing methods rely on low-level geometric features, and loop closure recognition based on such features is often viewpoint-dependent and prone to failure in ambiguous or repetitive environments
• object recognition methods can infer landmark classes and scales, resulting in a small set of easily recognizable landmarks, ideal for view-independent unambiguous loop closure
• Goal: address the metric and semantic SLAM problems jointly,
• providing a meaningful interpretation of the scene,
• semantically-labeled landmarks address two critical issues of geometric SLAM: data association (matching sensor observations to map landmarks) and loop closure (recognizing previously-visited locations).
• Other approaches:
• filtering methods
• batch methods: pose graph optimization, iterative optimization methods
• use both spatial and semantic representation
• For localization: incorporate semantic observations in the metric optimization
• realtime implementation
• global optimization for 3D reconstruction and semantic parsing, with 3D space voxelized
• structure from motion
• semantic mapping
• Contributions:
1. the first to tightly couple inertial, geometric, and semantic observations into a single optimization framework
2. provide a formal decomposition of the joint metric-semantic SLAM problem into continuous (pose) and discrete (data association and semantic label) optimization sub-problems
3. carry out experiments on several long-trajectory real indoor and outdoor datasets, which include odometry and visual measurements in cluttered scenes and varying lighting conditions

## Methods

### EM Algorithms

• In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step
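
To make the E/M alternation concrete, here is a minimal sketch of EM on a toy problem: a two-component 1D Gaussian mixture, where the latent variable is which component generated each sample. This is a generic illustration, not the paper's SLAM formulation; the function name `em_gmm_1d`, the initialization, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1D Gaussian mixture (illustrative only)."""
    # Crude initialization (an assumption): means at the data extremes.
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()], dtype=float)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: posterior probability (responsibility) of each component
        # for each sample, given the current parameter estimates.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: parameters that maximize the expected log-likelihood.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-9)  # guard against a collapsing component
        pi = nk / len(x)
    return pi, mu, var

# Synthetic data: 30% of samples around -2, 70% around 3.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, means, variances = em_gmm_1d(samples)
print(weights, means, variances)
```

The responsibilities computed in the E step play the same role as soft data-association weights: each sample is never hard-assigned to a single component.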

### Probabilistic Data Association in SLAM

• Formulation of the problem
• $\mathcal{L} \triangleq$
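
As a rough illustration of what "probabilistic" data association means, the sketch below computes soft weights of one semantic detection over all known landmarks instead of picking a single match. It is a simplified assumption, not the paper's measurement model: the detection is treated as a noisy position in the world frame with Gaussian noise, and the class label is assumed correct with a fixed probability `p_correct`. The function name and all parameters are hypothetical.

```python
import numpy as np

def association_weights(z, z_class, landmarks, landmark_classes, cov,
                        p_correct=0.9, n_classes=2):
    """Soft assignment of one detection to every landmark (E-step-style weights)."""
    cov_inv = np.linalg.inv(cov)
    diff = landmarks - z  # shape (M, d)
    maha_sq = np.einsum("md,de,me->m", diff, cov_inv, diff)
    # Gaussian likelihood up to a constant shared by all landmarks.
    geom = np.exp(-0.5 * maha_sq)
    # Toy semantic likelihood: the detected class matches the landmark class
    # with probability p_correct, otherwise the error is spread uniformly.
    p_wrong = (1.0 - p_correct) / (n_classes - 1)
    sem = np.where(landmark_classes == z_class, p_correct, p_wrong)
    w = geom * sem
    return w / w.sum()

# Two landmarks: a chair (class 0) at the origin and a door (class 1) at x = 1.
landmarks = np.array([[0.0, 0.0], [1.0, 0.0]])
landmark_classes = np.array([0, 1])
# A "door" detection exactly halfway between them: geometry alone is ambiguous.
w = association_weights(np.array([0.5, 0.0]), 1, landmarks, landmark_classes,
                        cov=0.25 * np.eye(2))
print(w)  # the semantic label breaks the tie in favor of the door
```

In an EM-style scheme, weights like these would then scale the corresponding measurement constraints when optimizing poses and landmark positions.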

## Experiments

• backend: GTSAM and its iSAM2 implementation, realtime
• frontend:
• every 15th camera frame as a keyframe
• ORB features
• outlier tracks: rejected by estimating the essential matrix with RANSAC
• assume the time between frames is short => the orientation difference can be obtained from gyroscope measurements => only the unit translation vector needs to be estimated (using only two point correspondences)
• object detector: deformable parts model detection algorithm
• a new landmark: created when the Mahalanobis distance from a detection to all known landmarks is above a threshold; initial position: along the camera ray, at an estimated depth (the median depth of all geometric feature measurements)
• in practice, solve only once for the constraint weights
• datasets:
• experimental platform: VI-Sensor (IMU, left camera)
1. medium length (~175 m), one floor of an office building, object classes (red office chairs and brown four-legged chairs)
2. long length (~625 m), two different floors, object classes (red chairs and doors)
3. several loops around one room (with a Vicon motion tracking system), object class (red chair)
• KITTI (05, 06)
• Results:
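
Going back to the frontend notes in this section, the new-landmark rule (a Mahalanobis gate plus median-depth initialization) might be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, detections and landmarks are treated as 3D points with Gaussian uncertainty, and "depth" is interpreted as range along the camera ray.

```python
import numpy as np

def is_new_landmark(z, landmark_means, landmark_covs, threshold):
    """True if detection z is farther than `threshold` (Mahalanobis) from every landmark."""
    for mean, cov in zip(landmark_means, landmark_covs):
        diff = z - mean
        dist = np.sqrt(diff @ np.linalg.solve(cov, diff))
        if dist <= threshold:
            return False  # close enough to an existing landmark
    return True

def init_landmark_position(cam_center, ray_dir, feature_depths):
    """Place a new landmark on the camera ray at the median feature depth."""
    ray = ray_dir / np.linalg.norm(ray_dir)
    return cam_center + np.median(feature_depths) * ray

means = [np.zeros(3)]
covs = [np.eye(3)]
near = is_new_landmark(np.array([0.5, 0.0, 0.0]), means, covs, threshold=3.0)
far = is_new_landmark(np.array([10.0, 0.0, 0.0]), means, covs, threshold=3.0)
position = init_landmark_position(np.zeros(3), np.array([0.0, 0.0, 2.0]),
                                  np.array([1.0, 2.0, 3.0, 10.0]))
print(near, far, position)
```

Using the median rather than the mean keeps a few very distant features from pushing the initial position far along the ray.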

• Future Work / Limitation:

• estimate full pose of the semantic objects (orientation in addition to positions)
• consider data associations for past keyframes
• consider multiple sensors (only one camera?)
• consider non-stationary objects
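
Returning to the frontend note on gyroscope-aided tracking earlier in this section: once the relative rotation is known, two point correspondences suffice to fix the translation direction. The sketch below shows one way to do this from the epipolar constraint. It is an assumption-laden illustration, not the authors' solver: the function name is hypothetical, bearings are noise-free unit vectors, and the convention is that a point `X1` in frame 1 maps to `X2 = R @ X1 + t` in frame 2.

```python
import numpy as np

def translation_direction(R, bearings1, bearings2):
    """Unit translation (up to sign) from two bearing pairs and a known rotation."""
    # Epipolar constraint: x2 . (t x (R x1)) = 0, so t is orthogonal to (R x1) x x2.
    n1 = np.cross(R @ bearings1[0], bearings2[0])
    n2 = np.cross(R @ bearings1[1], bearings2[1])
    # t is orthogonal to both normals, hence parallel to their cross product.
    t = np.cross(n1, n2)
    return t / np.linalg.norm(t)

# Synthetic check: small rotation about the z axis plus a known translation.
theta = 0.05
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t_true = np.array([0.2, 0.0, 0.1])
points1 = np.array([[1.0, 0.5, 4.0], [-1.0, 0.2, 5.0]])
points2 = points1 @ R.T + t_true
b1 = points1 / np.linalg.norm(points1, axis=1, keepdims=True)
b2 = points2 / np.linalg.norm(points2, axis=1, keepdims=True)
t_hat = translation_direction(R, b1, b2)
print(t_hat, t_true / np.linalg.norm(t_true))
```

Only the direction is recoverable, since scale is unobservable from two views. Needing just two correspondences per hypothesis also keeps a RANSAC loop cheap.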

## Slides:

https://people.eecs.berkeley.edu/~jrs/speaking.html

1. Motivation: Why does this problem come up?
2. Problem: What problem is the paper trying to solve?
3. Background: Give background so everyone is on the same page
4. Topics in class: Connect ideas in paper to topics covered in class. Sensors: gyroscope (IMU), cameras
5. Contributions: What is the contribution compared to prior work?
6. New ideas: What can now be done that couldn’t be done before?
7. New Methods / Appropriate Tech Depth: What new ideas enable this to be done?
8. Experiments & Results: What evaluation/experiments were performed?
9. Limitation & Opinion for Effectiveness: Give your opinion/analysis of effectiveness of proposed method