ArtTrack: Articulated Multi-person Tracking in the Wild

Abstract

In this paper we propose an approach for articulated tracking of multiple people in unconstrained videos. Our starting point is a model that resembles existing architectures for single-frame pose estimation but is substantially faster. We achieve this in two ways: (1) by simplifying and sparsifying the body-part relationship graph and leveraging recent methods for faster inference, and (2) by offloading a substantial share of the computation onto a feed-forward convolutional architecture that is able to detect and associate body joints of the same person even in clutter. We use this model to generate proposals for body joint locations and formulate articulated tracking as spatio-temporal grouping of such proposals. This allows us to jointly solve the association problem for all people in the scene by propagating evidence from strong detections through time and enforcing the constraint that each proposal can be assigned to only one person. We report results on the public “MPII Human Pose” benchmark and on a new “MPII Video Pose” dataset of image sequences with multiple people. We demonstrate that our model achieves state-of-the-art results while using only a fraction of the time, and that it is able to leverage temporal information to improve on the state of the art for crowded scenes.
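
The constraint that each proposal is assigned to at most one person can be illustrated with a deliberately simplified sketch. This is not the method of the paper, which solves the grouping jointly over space and time; a greedy nearest-neighbour matcher for a single joint type between two frames is swapped in here purely for illustration. The function name `link_proposals`, the distance gate `max_dist`, and the data layout are all hypothetical.

```python
from math import hypot


def link_proposals(prev_tracks, proposals, max_dist=50.0):
    """Greedily link joint proposals in a new frame to known people.

    prev_tracks: dict mapping person id -> (x, y), the last known
                 location of one joint type (assumed layout).
    proposals:   list of (x, y, score) detections in the new frame
                 (the score is carried along but not used here).
    Returns a dict person id -> proposal index. Each proposal is used
    at most once, and each person receives at most one proposal.
    """
    # Collect every (distance, person, proposal) pair within the gate.
    candidates = []
    for pid, (px, py) in prev_tracks.items():
        for j, (x, y, _score) in enumerate(proposals):
            d = hypot(x - px, y - py)
            if d <= max_dist:
                candidates.append((d, pid, j))

    # Accept the closest pairs first, skipping any pair whose person
    # or proposal has already been claimed.
    candidates.sort()
    assignment = {}
    used = set()
    for _d, pid, j in candidates:
        if pid in assignment or j in used:
            continue
        assignment[pid] = j
        used.add(j)
    return assignment
```

When two people compete for the same proposal, the closer one wins and the other is left unassigned for that frame. A greedy pass like this is only locally optimal, which is one reason a joint formulation over all people and frames is attractive.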

Paper

BibTeX

@inproceedings{insafutdinov17arttrack,
  TITLE     = {{ArtTrack: Articulated Multi-person Tracking in the Wild}},
  AUTHOR    = {Insafutdinov, Eldar and Andriluka, Mykhaylo and Pishchulin, Leonid and Tang, Siyu and Levinkov, Evgeny and Andres, Bjoern and Schiele, Bernt},
  BOOKTITLE = {CVPR},
  YEAR      = {2017}
}