Monocular Person Localization under Camera Ego-motion

RCVLab, Southern University of Science and Technology

Abstract

Localizing a person from a moving monocular camera is critical for Human-Robot Interaction (HRI). To estimate the 3D position of a person from a 2D image, existing methods either rely on the geometric assumption of a fixed camera or use a position regression model trained on datasets with little camera ego-motion. These methods are vulnerable to severe camera ego-motion and yield inaccurate person localization. We instead treat person localization as part of a pose estimation problem. By representing the human with a four-point model, our method jointly estimates the 2D camera attitude and the person's 3D position through optimization. Evaluations on both public datasets and real robot experiments show that our method is more robust and accurate than existing person localization and pose estimation methods. We further integrate our method into a person-following system and deploy it on an agile quadruped robot.
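To make the joint estimation concrete, the sketch below sets up a toy version of the idea: a hypothetical four-point body model whose keypoints lie on a vertical line, with the camera's roll/pitch attitude and the person's 3D position solved together by nonlinear least squares. The keypoint choice, nominal heights, intrinsics, and coordinate conventions here are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

# Assumed pinhole intrinsics (illustrative values).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
# Assumed heights of the four body points above the ground (meters),
# e.g. head, neck, hip, foot of an upright person.
HEIGHTS = np.array([1.7, 1.5, 0.9, 0.0])

def project(params):
    """Project the upright four-point model into a roll/pitch-rotated camera.

    World frame: gravity-aligned, y pointing down; camera at the origin.
    params = (roll, pitch, x, y_foot, z) with the person upright at (x, ., z).
    """
    roll, pitch, x, y_foot, z = params
    R_cw = R.from_euler("xy", [roll, pitch]).as_matrix()  # world -> camera
    pts_w = np.stack([np.full(4, x), y_foot - HEIGHTS, np.full(4, z)], axis=1)
    pts_c = pts_w @ R_cw.T
    uv = pts_c @ K.T
    return uv[:, :2] / uv[:, 2:3]

def residuals(params, uv_obs):
    """Reprojection error of the four model points against detections."""
    return (project(params) - uv_obs).ravel()

# uv_obs: 4x2 detected 2D keypoints (e.g., from an off-the-shelf pose detector).
uv_obs = np.array([[300.0, 100.0], [305.0, 170.0], [312.0, 330.0], [318.0, 440.0]])
x0 = np.array([0.0, 0.0, 0.0, 1.0, 4.0])  # initial roll, pitch, x, y_foot, z
sol = least_squares(residuals, x0, args=(uv_obs,))
roll, pitch, x, y, z = sol.x
print(f"attitude: roll={roll:.3f}, pitch={pitch:.3f}; position=({x:.2f}, {y:.2f}, {z:.2f})")

Because all four model points are collinear, degenerate configurations exist; the actual method resolves the attitude and position jointly within its optimization framework, which this toy example only gestures at.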


The observation model of our method. To better illustrate the geometry, in (a) we fix the camera frame, so the human appears tilted in the camera frame due to camera ego-motion. Once we instead assume the human is always upright and fix the human frame, the RPF scenario in (b) can be constructed.


The proposed person localization method remains stable under camera ego-motion with large rotations. In addition, the Normalization module makes our framework compatible with different camera models, such as pinhole, fisheye, and equirectangular cameras. By performing data association and trajectory smoothing in 3D space, we achieve robust person tracking and following on rough terrain.
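As a rough illustration of why normalizing detections makes the pipeline camera-agnostic, the sketch below back-projects pixels from two different camera models into unit bearing rays, after which a downstream localizer can treat both identically. The function names, intrinsics, and image sizes are illustrative assumptions, not the released code's API.

import numpy as np

def pinhole_to_ray(uv, K):
    """Back-project a pixel through pinhole intrinsics K to a unit ray."""
    x = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return x / np.linalg.norm(x)

def equirectangular_to_ray(uv, width, height):
    """Map an equirectangular pixel to a unit ray via longitude/latitude."""
    lon = (uv[0] / width - 0.5) * 2.0 * np.pi   # [-pi, pi], 0 = image center
    lat = (0.5 - uv[1] / height) * np.pi        # [-pi/2, pi/2], positive = up
    # Camera convention: x right, y down, z forward.
    return np.array([np.cos(lat) * np.sin(lon),
                     -np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

# After normalization, detections from both cameras feed the same ray-based localizer.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
ray_a = pinhole_to_ray(np.array([400.0, 260.0]), K)
ray_b = equirectangular_to_ray(np.array([1024.0, 300.0]), width=2048, height=1024)
print(ray_a, ray_b)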

Visualization Examples