Probabilistic Data Association for Semantic SLAM


  • Motivation: SLAM allows vehicles to navigate autonomously in an environment without a priori knowledge of the map and without access to independent position information
  • Problem to solve:
    • (What the environment looks like, where the robot is, and, given a path, how the robot should move)
    • Sensing: How the robot measures properties of itself and its environment
  • What is SLAM?
    • simultaneous localization and mapping
    • wiki
      • Localization: inferring location given a map
      • Mapping: inferring a map given locations
    • SLAM is a chicken-or-egg problem:
      • a map is needed for localization
      • a pose estimate is needed for mapping
  • Classical solutions:
    • Landmark extraction
    • Data association
    • State estimation
    • State update
    • Landmark update
  • Data Association:
    • ascertaining which parts of one image correspond to which parts of another image, where differences are due to movement of the camera, the elapse of time, and/or movement of objects in the photos.
    • Problems:
      • You might not re-observe landmarks every time.
      • You might observe something as being a landmark but fail to ever see it again.
      • You might wrongly associate a landmark to a previously seen landmark.
  • Loop Closure
    • Loop closure is the problem of recognizing a previously visited location and updating the states accordingly.
  • Limitation of traditional approaches:
    • rely on low-level geometric features -> loop closure recognition based on low-level features is often viewpoint-dependent and prone to failure in ambiguous or repetitive environments
    • by contrast, object recognition methods can infer landmark classes and scales, resulting in a small set of easily recognizable landmarks, ideal for view-independent, unambiguous loop closure
  • Goal: address the metric and semantic SLAM problems jointly,
    • providing a meaningful interpretation of the scene,
    • semantically-labeled landmarks address two critical issues of geometric SLAM: data association (matching sensor observations to map landmarks) and loop closure (recognizing previously-visited locations).
  • Other approaches:
    • filtering methods
    • batch methods: pose graph optimization, iterative optimization methods
    • use both spatial and semantic representation
      • For localization: incorporate semantic observations in the metric optimization
      • real-time implementation
      • global optimization for 3D reconstruction and semantic parsing, with the 3D space voxelized
      • structure from motion
      • semantic mapping
  • Contributions:
    1. the first to tightly couple inertial, geometric, and semantic observations into a single optimization framework
    2. provide a formal decomposition of the joint metric-semantic SLAM problem into continuous (pose) and discrete (data association and semantic label) optimization sub-problems
    3. carry out experiments on several long-trajectory real indoor and outdoor datasets, which include odometry and visual measurements in cluttered scenes and varying lighting conditions


EM Algorithms

  • Wiki: link
  • In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step
  • Coin-flipping example: link em_coin
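
The E and M steps described above can be sketched with a two-coin toy problem. This is a minimal sketch, not the linked example itself: the function name `em_two_coins`, the head counts, and the starting guesses are assumptions chosen for illustration (the numbers follow a widely used tutorial setup). Each trial flips one of two biased coins ten times, the identity of the coin is the latent variable, and both coins are assumed equally likely a priori.

```python
def em_two_coins(heads, flips, theta_a, theta_b, iterations):
    """Estimate the head probabilities of two coins when the coin used
    in each trial is unobserved (the latent variable)."""
    for _ in range(iterations):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = flips - h
            # E-step: posterior probability that this trial used coin A or B.
            # The binomial coefficient is the same for both coins and cancels.
            like_a = theta_a ** h * (1.0 - theta_a) ** t
            like_b = theta_b ** h * (1.0 - theta_b) ** t
            w_a = like_a / (like_a + like_b)
            w_b = 1.0 - w_a
            # Accumulate expected head/tail counts attributed to each coin.
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += w_b * h
            tails_b += w_b * t
        # M-step: maximum-likelihood update from the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


# Assumed data: number of heads in five trials of ten flips each.
observed_heads = [5, 9, 8, 4, 7]
print(em_two_coins(observed_heads, 10, 0.6, 0.5, iterations=10))
```

With these assumed inputs, the first iteration moves the estimates from (0.6, 0.5) to roughly (0.71, 0.58). Later iterations keep separating the two coins, because trials with many heads are increasingly credited to coin A.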

Probabilistic Data Association in SLAM

  • Formulation of the problem
    • \mathcal{L} \triangleq
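
As a rough illustration of what "probabilistic" (soft) data association means, the toy sketch below keeps a weight for every measurement-landmark pair instead of committing to a single match, and combines a geometric term with a semantic (class label) term. It is not the paper's algorithm: sensor poses are assumed known, measurements are assumed to be 2D positions already expressed in the world frame, and `association_weights`, `update_landmarks`, `sigma`, and `p_correct_class` are hypothetical names and parameters introduced here.

```python
import math


def association_weights(measurements, landmarks, sigma, p_correct_class):
    """E-step: weights[k][j] is the probability that measurement k was
    generated by landmark j, given the current landmark estimates."""
    weights = []
    for z_pos, z_class in measurements:
        scores = []
        for l_pos, l_class in landmarks:
            d2 = sum((a - b) ** 2 for a, b in zip(z_pos, l_pos))
            geometric = math.exp(-0.5 * d2 / sigma ** 2)
            # Hypothetical detector model: right class with p_correct_class.
            semantic = p_correct_class if z_class == l_class else 1.0 - p_correct_class
            scores.append(geometric * semantic)
        total = sum(scores)
        if total == 0.0:
            # Measurement is far from everything: fall back to uniform weights.
            weights.append([1.0 / len(landmarks)] * len(landmarks))
        else:
            weights.append([s / total for s in scores])
    return weights


def update_landmarks(measurements, landmarks, weights):
    """M-step: move each landmark to the weighted mean of the measurements."""
    updated = []
    for j, (l_pos, l_class) in enumerate(landmarks):
        mass = sum(w[j] for w in weights)
        if mass < 1e-12:
            updated.append((l_pos, l_class))
            continue
        new_pos = tuple(
            sum(w[j] * z_pos[i] for w, (z_pos, _) in zip(weights, measurements)) / mass
            for i in range(len(l_pos))
        )
        updated.append((new_pos, l_class))
    return updated


# Assumed toy data: (position, detected class) and (position, landmark class).
measurements = [((0.1, 0.0), "chair"), ((1.9, 0.1), "door"), ((1.0, 0.0), "door")]
landmarks = [((0.0, 0.0), "chair"), ((2.0, 0.0), "door")]

weights = association_weights(measurements, landmarks, sigma=0.5, p_correct_class=0.9)
landmarks = update_landmarks(measurements, landmarks, weights)
print(weights)
print(landmarks)
```

The third measurement is equidistant from both landmarks, so geometry alone cannot decide; the class label tips the weight towards the door landmark. A hard nearest-neighbour association would have to pick one match and could not express this uncertainty.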

Semantic SLAM


  • backend: GTSAM and its iSAM2 implementation, real-time
  • frontend:
    • every 15th camera frame as a keyframe
    • ORB features
    • outlier tracks: rejected by estimating the essential matrix using RANSAC
    • assume the time between frames is short => orientation difference taken from gyroscope measurements => only the unit translation vector needs to be estimated (using only two point correspondences)
    • object detector: deformable parts model detection algorithm
    • a new landmark: created when the Mahalanobis distance to all known landmarks is above a threshold; initial position: along the camera ray at an estimated depth (median depth of all geometric feature measurements)
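
The new-landmark rule in the last frontend item can be sketched as a gating test followed by an initialization along the camera ray. This is a minimal sketch under assumptions: landmarks are 3D positions with a 3x3 covariance, and the names `mahalanobis_sq`, `is_new_landmark`, `init_position`, and the threshold value are hypothetical rather than taken from the paper.

```python
import math
import statistics


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mahalanobis_sq(z, mean, cov):
    """Squared Mahalanobis distance of 3D point z from (mean, cov)."""
    d = tuple(zi - mi for zi, mi in zip(z, mean))
    r0, r1, r2 = cov
    det = dot(r0, cross(r1, r2))
    # The columns of cov^-1 are cross products of the rows of cov, over det.
    c0, c1, c2 = cross(r1, r2), cross(r2, r0), cross(r0, r1)
    inv_d = tuple((c0[i] * d[0] + c1[i] * d[1] + c2[i] * d[2]) / det
                  for i in range(3))
    return dot(d, inv_d)


def is_new_landmark(z, known, threshold_sq):
    """True if z lies outside the gate of every known landmark."""
    return all(mahalanobis_sq(z, mean, cov) > threshold_sq for mean, cov in known)


def init_position(cam_center, ray_dir, feature_depths):
    """Place a new landmark on the camera ray at the median feature depth."""
    depth = statistics.median(feature_depths)
    norm = math.sqrt(dot(ray_dir, ray_dir))
    return tuple(c + depth * r / norm for c, r in zip(cam_center, ray_dir))


# Assumed example: one known landmark at the origin with anisotropic covariance.
known = [((0.0, 0.0, 0.0),
          ((0.25, 0.0, 0.0), (0.0, 0.25, 0.0), (0.0, 0.0, 1.0)))]
print(is_new_landmark((1.0, 0.0, 0.0), known, threshold_sq=9.0))
print(is_new_landmark((2.0, 0.0, 0.0), known, threshold_sq=9.0))
print(init_position((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), [1.0, 3.0, 2.0]))
```

Using the Mahalanobis distance rather than the Euclidean distance makes the gate wider along directions in which a landmark's position is uncertain, so a poorly localized landmark is less likely to be duplicated.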
  • in practice, solve for the constraint weights only once
  • datasets:
    • experimental platform: VI-Sensor (IMU, left camera)
      1. medium length (~175 m), one floor of an office building; object classes: red office chairs and brown four-legged chairs
      2. long length (~625 m), two different floors; object classes: red chairs and doors
      3. several loops around one room (with a Vicon motion tracking system); object class: red chair
    • KITTI (05, 06)
  • Results:

  • Future Work / Limitation:

    • estimate the full pose of the semantic objects (orientation in addition to position)
    • consider data associations for past keyframes
    • consider multiple sensors (only one camera?)
    • consider non-stationary objects
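
Returning to the frontend note that only a unit translation vector remains to be estimated once the orientation difference is known from the gyroscope: the sketch below shows why two point correspondences are enough. With rotation R known, the epipolar constraint x2^T [t]_x R x1 = 0 can be rewritten as t . ((R x1) x x2) = 0, so each correspondence gives one plane that contains t, and two planes intersect in its direction. The convention X2 = R X1 + t and the function names are assumptions made here; the paper's actual solver may differ.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(R, v):
    return tuple(dot(row, v) for row in R)


def translation_direction(R, match_1, match_2):
    """Unit translation direction (up to sign) from two bearing-vector
    correspondences (x1, x2), given the relative rotation R."""
    (x1_a, x2_a), (x1_b, x2_b) = match_1, match_2
    # Each correspondence yields a normal n with n . t = 0.
    n_a = cross(rotate(R, x1_a), x2_a)
    n_b = cross(rotate(R, x1_b), x2_b)
    t = cross(n_a, n_b)
    norm = math.sqrt(dot(t, t))
    if norm < 1e-12:
        raise ValueError("degenerate correspondences")
    return tuple(c / norm for c in t)


# Assumed synthetic check: 90 degree rotation about z, known true translation.
R = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
t_true = (0.6, 0.0, 0.8)
points = [(1.0, 2.0, 5.0), (-2.0, 1.0, 4.0)]
matches = [(p, tuple(r + t for r, t in zip(rotate(R, p), t_true))) for p in points]
print(translation_direction(R, matches[0], matches[1]))
```

The result is defined only up to sign and scale, which matches the note that a unit translation vector is estimated. A two-point minimal solver of this kind would typically be run inside a RANSAC loop, since small minimal samples keep the number of iterations low.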


  1. Motivation: Why does this problem come up?
  2. Problem: What problem is the paper trying to solve?
  3. Background: Give background so everyone is on the same page
  4. Topics in class: Connect ideas in paper to topics covered in class. Sensors: gyroscope(IMU), cameras
  5. Contributions: What is the contribution compared to prior work?
  6. New ideas: What can now be done that couldn’t be done before?
  7. New Methods / Appropriate Tech Depth: What new ideas enable this to be done?
  8. Experiments & Results: What evaluation/experiments were performed?
  9. Limitation & Opinion for Effectiveness: Give your opinion/analysis of effectiveness of proposed method
