Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
Paper
Graphical Model
- $\mathcal{V}$: the joints (parts).
- $\mathcal{E}$: the spatial relations between the joints (parts).
- $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, simply regarded as a $K$-node tree.
- Locations $\mathbf{l} = \{l_i \mid i \in \mathcal{V}\}$: pixel location $l_i = (x, y)$ of each part $i$.
- Types $\mathbf{t} = \{t_{ij} \mid (i, j) \in \mathcal{E}\}$: a mixture of different spatial relationships; $t_{ij} \in \{1, \dots, T_{ij}\}$ indexes the set of spatial relations between parts $i$ and $j$.
Score Function
- Unary term: $U(l_i \mid I) = w_i \, \phi(i \mid I(l_i); \boldsymbol{\theta})$, scoring how well part $i$ fits at location $l_i$ based on the local image patch $I(l_i)$.
- IDPR term: $R(l_i, l_j, t_{ij}, t_{ji} \mid I) = \langle w_{ij}^{t_{ij}}, \psi(l_j - l_i - r_{ij}^{t_{ij}}) \rangle + w_{ij} \, \phi(t_{ij} \mid I(l_i); \boldsymbol{\theta}) + \langle w_{ji}^{t_{ji}}, \psi(l_i - l_j - r_{ji}^{t_{ji}}) \rangle + w_{ji} \, \phi(t_{ji} \mid I(l_j); \boldsymbol{\theta})$, where $\psi(\Delta l) = [\Delta x \;\; \Delta x^2 \;\; \Delta y \;\; \Delta y^2]^{\top}$ is the standard quadratic deformation feature and $r_{ij}^{t_{ij}}$ is the mean relative position of type $t_{ij}$.
- Full score function: $F(\mathbf{l}, \mathbf{t} \mid I) = \sum_{i \in \mathcal{V}} U(l_i \mid I) + \sum_{(i,j) \in \mathcal{E}} R(l_i, l_j, t_{ij}, t_{ji} \mid I) + w_0$.
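As a sanity check, the full score can be evaluated directly for a fixed configuration $(\mathbf{l}, \mathbf{t})$. A minimal NumPy sketch of that evaluation, where all inputs are hypothetical toy stand-ins for the DCNN outputs and learned weights, not the repo's actual data structures:

```python
import numpy as np

def psi(d):
    """Quadratic deformation features [dx, dx^2, dy, dy^2]."""
    return np.array([d[0], d[0] ** 2, d[1], d[1] ** 2])

def score(l, t, parents, unary, spring_w, mean_rp, idpr):
    """Evaluate F(l, t | I) for a fixed configuration: the sum of unary
    part scores plus, for each edge, a spring term around the mean
    relative position of the chosen type and the IDPR type score.
    (Only one direction per edge is shown for brevity.)"""
    F = sum(unary[i][l[i]] for i in range(len(l)))
    for j, i in enumerate(parents):
        if i is None:            # root has no parent edge
            continue
        tij = t[j]
        d = np.subtract(l[j], l[i]) - mean_rp[j][tij]
        F += spring_w[j][tij] @ psi(d)   # <w_ij^t, psi(l_j - l_i - r_ij^t)>
        F += idpr[j][tij]                # w_ij * phi(t_ij | I(l_i); theta)
    return float(F)

# toy 2-part model: part 1 is the child of part 0
parents = [None, 0]
unary = [{(0, 0): 1.0}, {(2, 0): 0.5}]   # hypothetical part score maps
spring_w = [None, [np.zeros(4)]]         # one type, zero spring weights
mean_rp = [None, [np.array([2, 0])]]     # mean relative position r_10^1
idpr = [None, [0.3]]                     # hypothetical IDPR type score
print(score([(0, 0), (2, 0)], [None, 0],
            parents, unary, spring_w, mean_rp, idpr))  # -> 1.8
```

At test time the model instead maximizes this score over all $(\mathbf{l}, \mathbf{t})$ with dynamic programming on the tree; the sketch only evaluates one configuration.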
Implementation
demo.m

- `conf` is a structure holding the given global configuration.
- `conf.pa` is the index of the parent of each joint.
- `p_no` is the number of parts (joints).
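The parent indices in `conf.pa` implicitly define the tree's edge set. A minimal sketch of recovering the edges, assuming the MATLAB convention of 1-indexed joints with the root's parent set to 0:

```python
def edges_from_parents(pa):
    """Build the (child, parent) edge list of the K-node tree from a
    1-indexed parent array in the style of conf.pa (root's parent is 0)."""
    return [(i + 1, p) for i, p in enumerate(pa) if p != 0]

# hypothetical 5-joint kinematic chain rooted at joint 1
print(edges_from_parents([0, 1, 2, 3, 4]))  # -> [(2, 1), (3, 2), (4, 3), (5, 4)]
```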
The main part of this script is shown below.
% read data
[pos_train, pos_val, pos_test, neg_train, neg_val, tsize] = LSP_data();
% train dcnn
train_dcnn(pos_train, pos_val, neg_train, tsize, caffe_solver_file);
% train graphical model
model = train_model(note, pos_val, neg_val, tsize);
% testing
boxes = test_model([note,'_LSP'], model, pos_test);
% ...
% evaluation
show_eval(pos_test, ests, conf, eval_method);
Read data : LSP_data.m
Some variables and constants:
trainval_frs_pos = 1:1000;    % training frames for positive
test_frs_pos = 1001:2000;     % testing frames for positive
trainval_frs_neg = 615:1832;  % training frames for negative (of size 1218)
frs_pos = cat(2, trainval_frs_pos, test_frs_pos);  % all frames for positive
all_pos  % num(pos)*1 struct array for positive
         % struct fields: im, joints, r_degree, isflip
neg      % num(neg)*1 struct array for negative
pos_trainval = all_pos(1 : numel(trainval_frs_pos));  % training and validation structs for positive
pos_test = all_pos(numel(trainval_frs_pos)+1 : end);  % testing structs for positive
Data preparation:
- `lsp_pc2oc` (`function joints = lsp_pc2oc(joints)`): convert the joint annotations from person-centric to observer-centric order.
- `pos_trainval(ii).joints = Trans * pos_trainval(ii).joints;` : create the ground-truth joints for model training; the original 14 joint positions are augmented with midpoints between joints, giving 26 joints in total.
- `add_flip` : flip the trainval images horizontally (#pos_trainval *= 2).
- `init_scale` : initialize dataset-specific parameters.
- `add_rotate` : rotate the trainval images (every $9^{\circ}$, i.e. 40 rotations) (#pos_trainval *= 40).
- `val_id = randperm(numel(pos_trainval), 2000);` : split the positive data into training and validation sets (randomly choose 2000 images from `pos_trainval` as the validation set; #pos_train = #pos_trainval - 2000 = 78000).
- `val_id = randperm(numel(neg), 500);` : split the negative data into training and validation sets (randomly choose 500 images from `neg` as the validation set; #neg_train = #neg - #neg_val = 1218 - 500 = 718).
- `add_flip` : flip the negative data (#neg_val *= 2; #neg_train *= 2).
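The flip and rotate augmentations above also have to transform the joint coordinates, not just the pixels. A NumPy sketch of the coordinate updates (illustrative stand-ins for `add_flip`/`add_rotate`, not the repo's code):

```python
import numpy as np

def flip_joints(joints, im_width):
    """Mirror (N, 2) joint coordinates horizontally, MATLAB-style
    1-indexed pixels: x -> width + 1 - x. Note: the real add_flip must
    also swap left/right joint identities, which is omitted here."""
    out = np.array(joints, dtype=float)
    out[:, 0] = im_width + 1 - out[:, 0]
    return out

def rotate_joints(joints, center, degrees):
    """Rotate (N, 2) joint coordinates around a center point."""
    th = np.deg2rad(degrees)
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return (np.array(joints, dtype=float) - center) @ R.T + center

# add_rotate steps every 9 degrees, i.e. 360 / 9 = 40 rotated copies
angles = np.arange(0, 360, 9)
print(len(angles))  # -> 40
```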
Train DCNN : train_dcnn.m
Some variables and constants:
mean_pixel = [128, 128, 128]; % mean pixel value (per channel), subtracted before feeding the DCNN
K = conf.K;                   % number of mixture types per pair, K = T_{ij}
Prepare patches : prepare_patches.m
Prepare the image patches and derive their labels for training the DCNN.
K-means: get the clusters (mean relative positions) and the mixture labels.
% generate the labels
clusters = learn_clusters(pos_train, pos_val, tsize);
label_train = derive_labels('train', clusters, pos_train, tsize);
label_val = derive_labels('val', clusters, pos_val, tsize);
% labels for negative (dummy)
dummy_label = struct('mix_id', cell(numel(neg_train), 1), ...
    'global_id', cell(numel(neg_train), 1));
% all the training data
train_imdata = cat(1, num2cell(pos_train), num2cell(neg_train));
train_labels = cat(1, num2cell(label_train), num2cell(dummy_label));
% randomly permute the data and store it in LMDB format
perm_idx = randperm(numel(train_imdata));
train_imdata = train_imdata(perm_idx);
train_labels = train_labels(perm_idx);
if ~exist([cachedir, 'LMDB_train'], 'dir')
    store_patch(train_imdata, train_labels, psize, [cachedir, 'LMDB_train']);
end
% validation data for positive
val_imdata = num2cell(pos_val);
val_labels = num2cell(label_val);
if ~exist([cachedir, 'LMDB_val'], 'dir')
    store_patch(val_imdata, val_labels, psize, [cachedir, 'LMDB_val']);
end
Learn clusters : learn_clusters (calls cluster_rp to cluster the relative positions)
- `nbh_IDs = get_IDs(pa, K);` : get the neighbors of each part (joint).
- `clusters{ii}` : cell array holding the mean relative positions of the ii-th part.
- k-means:
  - `X(ii,:) = norm_rp(imdata(ii), cur, nbh, tsize);` : relative position for the ii-th data item.
  - `mean_X = mean(X(valid_idx,:),1); normX = bsxfun(@minus, X(valid_idx,:), mean_X);` : centralize (normalize) the relative positions.
  - Run R trials of the k-means algorithm and keep the trial with the smallest total distance: `[gInd{trial}, cen{trial}, sumdist(trial)] = k_means(normX, K);`
- Calculate the `imgid` (all images belonging to cluster k) of `clusters{cur}{n}(k)`, where `clusters{cur}{n}(k)` is the k-th cluster of the n-th neighbor of the cur-th joint.
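The restart-and-keep-best k-means loop can be sketched in NumPy (a stand-in for the repo's `k_means`; the returned triple plays the role of `gInd`, `cen`, and `sumdist`):

```python
import numpy as np

def kmeans_once(X, K, rng, iters=100):
    """One run of Lloyd's algorithm from a random initialization."""
    cen = X[rng.choice(len(X), K, replace=False)]
    g = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - cen[None, :, :]) ** 2).sum(-1)
        g = d.argmin(1)                      # assign to nearest center
        new = np.array([X[g == k].mean(0) if np.any(g == k) else cen[k]
                        for k in range(K)])  # keep empty clusters in place
        if np.allclose(new, cen):
            break
        cen = new
    sumdist = ((X - cen[g]) ** 2).sum()
    return g, cen, sumdist

def kmeans_best_of(X, K, R=5, seed=0):
    """Run R random restarts and keep the run with the smallest distance."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, K, rng) for _ in range(R)]
    return min(runs, key=lambda r: r[2])
```

Multiple restarts matter because Lloyd's algorithm only finds a local optimum; keeping the lowest-distortion run mirrors the `sumdist(trial)` selection above.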
Derive labels
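Label derivation assigns each positive example, for each neighbor of each joint, the mixture id of the nearest cluster center learned above. A minimal nearest-center sketch (names are illustrative, not the repo's API):

```python
import numpy as np

def nearest_mix_id(rel_pos, centers):
    """Return the 1-based mixture label (MATLAB-style) whose cluster
    center is closest to the (already normalized) relative position."""
    d = ((centers - rel_pos) ** 2).sum(1)
    return int(d.argmin()) + 1

# hypothetical centers for one (joint, neighbor) pair with K = 3 types
centers = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])
print(nearest_mix_id(np.array([0.9, 0.1]), centers))  # -> 2
```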
Train dcnn
Make a system call to caffe to train the DCNN:
system([caffe_root, '/build/tools/caffe train ', sprintf('-gpu %d -solver %s', ...
conf.device_id, caffe_solver_file)]);
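The same invocation can be assembled outside MATLAB, e.g. from Python; a sketch (the caffe root path and solver file name are assumptions):

```python
import subprocess

def caffe_train_cmd(caffe_root, device_id, solver_file):
    """Assemble the same `caffe train` command as the MATLAB system call."""
    return [f"{caffe_root}/build/tools/caffe", "train",
            "-gpu", str(device_id), "-solver", solver_file]

cmd = caffe_train_cmd("/opt/caffe", 0, "lsp_solver.prototxt")
# subprocess.run(cmd, check=True)  # launch training (requires a caffe build)
print(" ".join(cmd))
```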