MPII Human Pose Models

Introduction

This work considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other. This joint formulation is in contrast to previous strategies that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single-person and multi-person pose estimation.

Citing

@inproceedings{pishchulin16cvpr,
               author = {Leonid Pishchulin and Eldar Insafutdinov and Siyu Tang and Bjoern Andres and Mykhaylo Andriluka and Peter Gehler and Bernt Schiele},
               title = {DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation},
               booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
               year = {2016},
               month = {June}
}
@inproceedings{insafutdinov16ariv,
               author = {Eldar Insafutdinov and Leonid Pishchulin and Bjoern Andres and Mykhaylo Andriluka and Bernt Schiele},
               title = {DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model},
               booktitle = {European Conference on Computer Vision (ECCV)},
               year = {2016},
               month = {May}
}

Publications


CVPR'16 paper		CVPR'16 poster		ECCV'16

Overview

Starting from a single monocular image of multiple individuals, a sparse set of body part detection candidates is computed (I). To incorporate various types of interactions between body parts within and across human bodies, a densely connected graph is constructed (II). The problem of multi-person pose estimation is then treated as an integer linear program (ILP). Its solution simultaneously partitions the part detection candidates into person clusters and labels each detection with one of the part classes (III), thus yielding a joint pose estimate for multiple people (IV).
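The joint partitioning and labeling step can be illustrated with a minimal Python sketch. It is not the released DeepCut/DeeperCut code and does not use an ILP solver: it enumerates every labeling and every partition of a tiny candidate set by brute force, which is only feasible for a handful of candidates. The candidate positions, the cost values, the two part classes, and the names `partitions`, `pairwise` and `solve` are all assumptions made for illustration. Negative costs stand for likely assignments, and pairwise costs are charged only when both candidates are kept and placed in the same person cluster.

```python
from itertools import product

SUPPRESSED = None  # label meaning "this candidate is discarded"

def partitions(items):
    """Yield every partition of the list `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a cluster of its own.
        yield [[first]] + part

def solve(unary, pairwise, classes):
    """Brute-force search for the cheapest joint labeling and partitioning."""
    n = len(unary)
    best = (float("inf"), None, None)
    for labels in product([SUPPRESSED] + classes, repeat=n):
        kept = [d for d in range(n) if labels[d] is not SUPPRESSED]
        unary_cost = sum(unary[d][labels[d]] for d in kept)
        for part in partitions(kept):
            cost = unary_cost
            for cluster in part:
                # Pairwise terms only count inside one person cluster.
                for i, d in enumerate(cluster):
                    for e in cluster[i + 1:]:
                        cost += pairwise(d, labels[d], e, labels[e])
            if cost < best[0]:
                best = (cost, labels, part)
    return best

# Hypothetical toy data: 1-D positions of five part candidates.
positions = [0.0, 1.0, 10.0, 11.0, 5.0]
classes = ["head", "shoulder"]
unary = [
    {"head": -2.0, "shoulder": 1.0},
    {"head": 1.0, "shoulder": -2.0},
    {"head": -2.0, "shoulder": 1.0},
    {"head": 1.0, "shoulder": -2.0},
    {"head": 0.5, "shoulder": 0.5},   # weak, spurious detection
]

def pairwise(d, c, e, c2):
    # Toy cost that ignores the classes: nearby candidates are rewarded
    # for sharing a cluster, distant ones are penalised.
    return -1.0 if abs(positions[d] - positions[e]) <= 2.0 else 3.0

cost, labels, clusters = solve(unary, pairwise, classes)
print(cost, labels, clusters)
```

In this toy setup the search keeps four candidates, groups them into two person clusters, and suppresses the weak detection, so the number of people is an outcome of the optimization rather than an input. Real instances are far too large for enumeration, which is why the problem is posed as an ILP.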

DeeperCut improves over DeepCut on three fronts:

deeper ResNet architectures strengthen the body part detectors, which generate effective bottom-up proposals for body parts
novel image-conditioned pairwise terms make it possible to assemble the proposals into a variable number of consistent body part configurations
an incremental optimization strategy explores the search space more efficiently, leading to both better performance and a significant speed-up

Qualitative Results




Figure panels: fully connected graph; joint partitioning and labeling; jointly estimated poses.

Quantitative Results

DeeperCut significantly outperforms the best known multi-person pose estimation results and demonstrates competitive performance on the task of single-person pose estimation.

For results and comparisons, refer to the MPII Human Pose Dataset web page.

Source Code

Multi-person and single person pose estimation code and MPII-pre-trained models can be downloaded from GitHub:

Related Resources

MPII Human Pose Dataset

MPII Human Shape

Poselet Conditioned Pictorial Structures

Team


Leonid Pishchulin	Eldar Insafutdinov	Siyu Tang	Bjoern Andres	Mykhaylo Andriluka	Peter Gehler	Bernt Schiele

If you have any questions, send an email to leonid at mpi-inf.mpg.de and eldar at mpi-inf.mpg.de

References

DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model

Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, Bernt Schiele

European Conference on Computer Vision (ECCV), 2016