Multi-View 3D Object Detection Network for Autonomous Driving


Main Points:

  • Goal:
    • Input: LIDAR point cloud + RGB Images;
    • Predict: oriented 3D bounding boxes
  • Method:
    • 3D object proposal generation: bird's-eye-view representation -> 3D candidate boxes
    • multi-view feature fusion

Methods

  1. 3D Point Cloud Representation
    • Bird's-eye view LIDAR: height maps, density, intensity
      • 2D grid, discretization resolution: 0.1 m
      • M height-map slices: the point cloud is divided into M slices along the height axis; each map stores the maximum height of the points in its slice
      • density: normalized number of points in each grid cell, $\min\left(1.0, \frac{\log(N+1)}{\log(64)}\right)$
      • intensity: reflectance value of the point with the maximum height in each cell (computed over the whole cloud, not per slice)
    • Front-view LIDAR: height, distance, intensity
      • a 3D point $p = (x, y, z)$ maps to a front-view pixel $p_{fv} = (r, c)$:
      • $c = \lfloor \operatorname{atan2}(y, x) \,/\, \Delta\theta \rfloor$
      • $r = \lfloor \operatorname{atan2}(z, \sqrt{x^2 + y^2}) \,/\, \Delta\phi \rfloor$
      • $\Delta\theta$ and $\Delta\phi$: horizontal and vertical resolution of the laser scanner (both projections are sketched in code below)
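
A minimal NumPy sketch of both projections, to make the discretization concrete. The crop ranges and resolution follow the Implementation Details below; the z extent, the number of slices M, and the $\Delta\theta$ / $\Delta\phi$ values are assumptions, and the function names are hypothetical.

```python
import numpy as np

def birds_eye_view_maps(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                        z_range=(-2.0, 1.25), res=0.1, n_slices=4):
    """points: (N, 4) array of (x, y, z, reflectance).
    Returns M height maps plus a density map and an intensity map."""
    x, y, z, refl = points.T
    # keep only points inside the cropped region
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, refl = x[keep], y[keep], z[keep], refl[keep]

    H = int(round((x_range[1] - x_range[0]) / res))   # 70.4 / 0.1 = 704
    W = int(round((y_range[1] - y_range[0]) / res))   # 80.0 / 0.1 = 800
    rows = ((x - x_range[0]) / res).astype(int)
    cols = ((y - y_range[0]) / res).astype(int)

    # M height maps: max height of the points falling in each slice
    slice_h = (z_range[1] - z_range[0]) / n_slices
    slice_idx = ((z - z_range[0]) / slice_h).astype(int)
    height = np.full((n_slices, H, W), z_range[0], dtype=np.float32)
    np.maximum.at(height, (slice_idx, rows, cols), z.astype(np.float32))

    # density: min(1, log(N + 1) / log(64)) with N = #points per cell
    counts = np.zeros((H, W), dtype=np.float32)
    np.add.at(counts, (rows, cols), 1.0)
    density = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))

    # intensity: reflectance of the highest point per cell (whole cloud);
    # visiting points in ascending z makes the last write per cell the highest
    intensity = np.zeros((H, W), dtype=np.float32)
    for i in np.argsort(z):
        intensity[rows[i], cols[i]] = refl[i]
    return height, density, intensity

def front_view_coords(points, d_theta=np.radians(0.32), d_phi=np.radians(0.4)):
    """Map 3D points to front-view pixels (r, c); the angular
    resolutions here are placeholders, not the paper's values."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    c = np.floor(np.arctan2(y, x) / d_theta).astype(int)
    r = np.floor(np.arctan2(z, np.sqrt(x**2 + y**2)) / d_phi).astype(int)
    return r, c
```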
  2. 3D Proposal Network
    1. Input: the bird's-eye-view map
    2. parameters for one 3D box: $(x, y, z, l, w, h)$, i.e. center and size (an encoding sketch follows this list)
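
The notes don't record how a box is regressed against an anchor. A plausible encoding, assuming a Faster R-CNN-style parameterization extended to 3D (center offsets normalized by the anchor size, sizes in log space); the exact normalization is an assumption, not taken from the paper:

```python
import numpy as np

def encode_box(gt_box, anchor):
    """Regression target t = (dx, dy, dz, dl, dw, dh) for one
    (ground-truth box, anchor) pair; boxes are (x, y, z, l, w, h)."""
    gx, gy, gz, gl, gw, gh = gt_box
    ax, ay, az, al, aw, ah = anchor
    return np.array([(gx - ax) / al, (gy - ay) / aw, (gz - az) / ah,
                     np.log(gl / al), np.log(gw / aw), np.log(gh / ah)])

def decode_box(t, anchor):
    """Inverse of encode_box: recover (x, y, z, l, w, h)."""
    dx, dy, dz, dl, dw, dh = t
    ax, ay, az, al, aw, ah = anchor
    return np.array([ax + dx * al, ay + dy * aw, az + dz * ah,
                     al * np.exp(dl), aw * np.exp(dw), ah * np.exp(dh)])
```

As a sanity check, decode_box(encode_box(b, a), a) round-trips to b for any anchor a with positive sizes.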

Implementation Details

  1. Network architecture: a VGG-16-based backbone (channels halved) feeds three branches — bird's-eye view, front view, and RGB image; the 3D proposal network runs on the bird's-eye-view branch, each proposal is projected to all three views for ROI pooling, and the per-view features are combined with deep fusion for classification and oriented 3D box regression
  2. Input Representation
    • use the front-view point cloud: [0, 70.4] x [-40, 40] meters (points that project outside the image boundaries are removed)
    • bird's-eye view: discretization resolution 0.1 m -> input size 704 x 800 (70.4/0.1 by 80/0.1)
    • 64-beam Velodyne: front-view map of 64 x 512 points
    • RGB images are upscaled so that the shortest side is 500 pixels
  3. Training and testing procedure
    • trained end-to-end
    • mini-batch size: 1 image; sample 128 ROIs, keeping roughly 25% of them positive
    • SGD with lr = 0.001 for 100K iterations, then lr = 0.0001 for another 20K iterations
    • anchors for car detection: $(l, w) \in \{(3.9, 1.6), (1.6, 3.9), (1.0, 0.6), (0.6, 1.0)\}$, $h = 1.56$ m
    • network architecture: 3 pooling layers (the 4th pooling layer is removed); a 2x deconvolution upsamples the feature map
    • IoU overlap with ground truth during training: positive anchors > 0.7, negative anchors < 0.5
    • empty anchors: compute an integral image over the point occupancy map to discard anchors containing no points
    • non-empty anchors: NMS on bird's-eye-view boxes (3D NMS is not used) with IoU threshold 0.7; keep the top 2000 boxes for training and the top 300 for testing (both steps are sketched after this item)
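
A sketch of the empty-anchor filter and the bird's-eye-view NMS from the two bullets above. The occupancy map, the half-open cell ranges for anchors, and the axis-aligned IoU (the paper's proposals are oriented boxes) are simplifying assumptions:

```python
import numpy as np

def integral_image(occupancy):
    """Summed-area table, zero-padded so ii[r, c] = sum of occupancy[:r, :c]."""
    ii = np.zeros((occupancy.shape[0] + 1, occupancy.shape[1] + 1))
    ii[1:, 1:] = occupancy.cumsum(axis=0).cumsum(axis=1)
    return ii

def nonempty_anchor_mask(ii, anchors):
    """anchors: (K, 4) int cells (r0, c0, r1, c1), half-open ranges.
    Keeps an anchor iff its bird's-eye-view footprint holds >= 1 point."""
    r0, c0, r1, c1 = anchors.T
    n_points = ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
    return n_points > 0

def nms_bv(boxes, scores, iou_thresh=0.7, top_k=2000):
    """Greedy NMS on axis-aligned bird's-eye-view boxes (x0, y0, x1, y1)."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < top_k:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the top box with all remaining boxes
        xx0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx1 - xx0, 0, None) * np.clip(yy1 - yy0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0]) *
                 (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]   # drop near-duplicates
    return keep
```

For testing, the same routine with top_k=300 keeps the 300 highest-scoring boxes.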
  4. Imageset Split
    • the data is split in the paper's own way: roughly half for training and half for validation (the KITTI test set has no public ground truth)
    • follow the KITTI difficulty regimes: easy, moderate, hard
  5. Evaluation
    • 3D localization AP (bird's-eye-view boxes) and 3D detection AP on the KITTI validation set

Faster RCNN Tips

  1. https://github.com/zeyuanxy/fast-rcnn/tree/master/help/train
