Pose estimation



What is Pose Estimation?


Pose estimation is the task of estimating the pose of an object: the position and orientation of that object in the camera's view. Think of it as the computer recognizing not just the object in an image but also where it is and how it is oriented. Take, for example, a person walking on a street. Given an image of this person, the computer should be able to tell the orientation of the person's head, legs, and other body parts. Is the person bending down? Crouching? There are many poses a person can take, and the computer must be able to tell them apart.

Each pose can be represented as a collection of points. In the case of a human being, the joints of our arms and legs can serve as points, and the set of these points represents the pose of the body. By connecting these points we can form a skeleton structure that represents the pose. The figure shows two example poses, where the red dots are the pose points (the joints) and the lines connecting them form the skeleton.
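This keypoints-plus-skeleton representation is easy to sketch in code. The joint names, coordinates, and skeleton edges below are made up for illustration and do not come from any particular dataset:

```python
import math

# (x, y) pixel coordinates for a few joints of one detected person
keypoints = {
    "head":       (120, 40),
    "neck":       (120, 70),
    "l_shoulder": (95, 75),
    "r_shoulder": (145, 75),
    "l_hip":      (105, 160),
    "r_hip":      (135, 160),
}

# Skeleton: pairs of joints connected by a limb segment
skeleton = [
    ("head", "neck"),
    ("neck", "l_shoulder"),
    ("neck", "r_shoulder"),
    ("l_shoulder", "l_hip"),
    ("r_shoulder", "r_hip"),
]

def limb_lengths(kp, edges):
    """Euclidean length of each limb segment, in pixels."""
    return {(a, b): math.dist(kp[a], kp[b]) for a, b in edges}

lengths = limb_lengths(keypoints, skeleton)
print(lengths[("head", "neck")])  # 30.0
```

Drawing the skeleton then amounts to drawing each dot and each connecting line on the image.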


The Methods


The challenge that every pose estimation method must face is this: how do we identify a 3-D object in a 2-D image? With this in mind, let's look at two approaches for estimating pose.


The first approach uses geometric methods to estimate the pose of an object [1]. For example, we can extract various features of the object and match them against a known geometric model of it; from these correspondences we can recover the pose. Of course, this requires knowing the geometry of the object in advance. The method works well for simple object geometries but becomes much harder when the geometry of the object is not known.
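The geometric approach rests on the pinhole camera model: a known 3-D model point is rotated and translated into the camera frame, then projected to a 2-D pixel. Solvers such as OpenCV's `solvePnP` (used in the tutorial cited in [1]) invert this mapping from several 2-D/3-D correspondences. A minimal sketch of the forward projection, with made-up numbers, assuming a simple rotation about the camera's y-axis:

```python
import math

def project(point_3d, rotation_deg, translation, focal, center):
    """Project a 3-D model point to pixel coordinates.

    rotation_deg: rotation about the camera's y-axis, in degrees
    translation:  (tx, ty, tz) of the object in camera coordinates
    focal:        focal length in pixels
    center:       principal point (cx, cy)
    """
    x, y, z = point_3d
    th = math.radians(rotation_deg)
    # Rotate about the y-axis, then translate into the camera frame
    xc = math.cos(th) * x + math.sin(th) * z + translation[0]
    yc = y + translation[1]
    zc = -math.sin(th) * x + math.cos(th) * z + translation[2]
    # Perspective division, then shift to the principal point
    return (focal * xc / zc + center[0], focal * yc / zc + center[1])

# A model point 1 m straight ahead projects to the image centre
u, v = project((0, 0, 0), rotation_deg=0, translation=(0, 0, 1.0),
               focal=800, center=(320, 240))
print(u, v)  # 320.0 240.0
```

Pose estimation is the inverse problem: given the projected pixels and the model points, solve for the rotation and translation.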


The second approach is to use deep learning and train a pose detector. See, for example, PoseNet [2], which simultaneously learns both position and orientation; the PoseNet paper shows how camera relocalization can be learned end to end. A follow-up method called DensePose goes one step further and teaches a computer not only to identify poses, but also the size and structure of the objects [3]. DensePose was trained on a specialized dataset called DensePose-COCO.
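Many deep human-keypoint detectors (as opposed to PoseNet's direct regression of camera pose) output one confidence heatmap per joint, and the joint location is read off as the heatmap's argmax. The network itself is omitted here; this sketch shows only the decoding step on a hand-written 4x4 "heatmap":

```python
def decode_heatmap(heatmap):
    """Return (row, col) of the highest-confidence cell."""
    best, best_rc = float("-inf"), None
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best:
                best, best_rc = val, (r, c)
    return best_rc

heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.9, 0.2, 0.0],  # peak here -> joint at row 2, col 1
    [0.0, 0.1, 0.0, 0.0],
]
print(decode_heatmap(heatmap))  # (2, 1)
```

Running the decoder over one heatmap per joint yields the full set of keypoints for a detected person.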


Another interesting example of pose detection is pose prediction through a wall, done at CSAIL at MIT [4]. They use wireless signals, which tend to pass through walls, to predict the pose of a person's limbs on the other side.

Applications of Pose Estimation


So far we have briefly looked at various examples of how pose estimation can be done. Now let us look at some possible applications.


One really interesting pose estimation application is the work by Mill lab, where they use pose estimation techniques to animate their monster mascot [5]. Similarly, we can apply pose estimation to many tasks in the animation industry: for example, animating facial expressions, or capturing facial expressions and superimposing them on another person's face. The latter was done with an algorithm called Face2Face [6]. Such a method has an immense number of applications, for example in dubbing movies from one language to another. As the authors of the paper state, there are also major applications in the AR and VR industries, where pose detection plays an important role.


There are also possible medical applications of pose detection. An example is the work of Carampel et al. [7], which uses pose detection to identify whether a person has fallen down. Another is the work of Achilles et al. called Patient MoCap [8], which uses pose estimation to monitor the pose of a patient in bed. This is highly relevant for patient monitoring, especially for patients who have epilepsy or who are in the ICU.
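To make the fall-detection idea concrete, here is an illustrative heuristic (not the method from [7]): if a person's keypoint bounding box is much wider than it is tall, the pose is closer to lying down than standing. The keypoints and threshold below are made up for the sketch:

```python
def looks_fallen(keypoints, ratio_threshold=1.5):
    """keypoints: list of (x, y) pixel coordinates for one person."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # A wide, flat keypoint cloud suggests a horizontal (fallen) pose
    return width > ratio_threshold * height

standing = [(100, 20), (95, 80), (105, 80), (100, 180)]
lying = [(40, 100), (120, 95), (200, 105), (260, 100)]
print(looks_fallen(standing), looks_fallen(lying))  # False True
```

A real system would combine such geometric cues with temporal information (how fast the pose changed) to reduce false alarms.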



[1] https://docs.opencv.org/3.3.0/dc/d2c/tutorial_real_time_pose.html










