Object Detection vs Semantic Segmentation

Object detection and semantic segmentation are both ways to identify objects in an image. Well, okay, it’s not quite that simple, so let me dig into it a bit more. Let’s start by defining what we mean by “identifying an object” in an image.

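An image is a grid of pixels. In a typical 8-bit color image, each pixel is a short array of numbers, one per color channel, with each number between 0 and 255, so the whole image is a multidimensional array of these values. Suppose I have an image of a bird: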


Figure 1 [1]

The image above is a combination of three channels: red, green and blue. So when we talk about isolating an object in an image, there are two ways we can approach the problem.
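Before looking at those two ways, it may help to see the three channels concretely. This is a minimal sketch that assumes NumPy and uses a tiny synthetic 4×4 image instead of the bird photo. The (height, width, channels) axis order and the `uint8` data type follow a common convention for 8-bit images; some libraries order the axes differently.

```python
import numpy as np

# A tiny synthetic 8-bit RGB image: height 4, width 4, 3 channels.
# (A stand-in for a real photo; the pixel values are arbitrary.)
height, width = 4, 4
image = np.zeros((height, width, 3), dtype=np.uint8)
image[1:3, 1:3] = [200, 120, 40]  # paint a small orange-ish square in the middle

# Each pixel is a length-3 array: one value per channel, each in 0..255.
pixel = image[1, 1]
print(pixel.tolist())  # [200, 120, 40]

# Split the image into its red, green and blue channels.
red, green, blue = image[..., 0], image[..., 1], image[..., 2]
print(red.shape)  # (4, 4)
```

Each channel is just a 2-D grid of numbers with the same height and width as the image.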

One is to locate the object and put a bounding box around it. This is (I am simplifying it a bit) called object detection. The other is to identify the pixels in the image that belong to that object, which is called semantic segmentation.

So what would each of these look like for the bird in Figure 1?

Figure 2

In Figure 2, both methods aim to isolate the birdie in the picture. The first part of Figure 2 shows a bounding box around the birdie; this is what the output of an object detection algorithm looks like. The second part shows a green mask over the body of the birdie. Here we go over every pixel in the image and classify whether it belongs to the birdie class or not, which gives us a boundary that separates the birdie from the background. Figure 3 shows this in more detail.
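Classifying every pixel can be sketched in a few lines. This minimal example assumes NumPy and a hypothetical segmentation model (not shown) that has already produced a score for each class at each pixel; the scores below are made up. Picking the highest-scoring class at every pixel (an argmax over the class axis) gives each pixel its own label.

```python
import numpy as np

# Hypothetical per-pixel class scores from a segmentation model (not shown).
# Shape: (num_classes, height, width); class 0 = background, class 1 = birdie.
scores = np.array([
    [[0.9, 0.8, 0.7],
     [0.6, 0.2, 0.3],
     [0.9, 0.4, 0.8]],  # background scores
    [[0.1, 0.2, 0.3],
     [0.4, 0.8, 0.7],
     [0.1, 0.6, 0.2]],  # birdie scores
])

# Classify every pixel independently: pick the class with the highest score.
labels = scores.argmax(axis=0)  # shape (height, width), values in {0, 1}

# Boolean mask: True where the pixel was classified as birdie.
birdie_mask = labels == 1

print(labels.tolist())  # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
```

The result has one label per pixel rather than one box per object, which is exactly the extra information segmentation provides.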


Figure 3

In the first part of Figure 3, each pixel is labeled as either birdie or not birdie. The second part is more interesting: it is a black-and-white image in which the black pixels are all the pixels that do not belong to the birdie and the white pixels are the ones that do. This is important to understand, since it is the crucial difference between object detection and semantic segmentation. In object detection you don’t get any pixel-level information, just a box around the object; in semantic segmentation you get pixel-wise information.
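This difference can be seen directly in code. The minimal sketch below assumes NumPy, a small hand-made mask, and a hypothetical helper `mask_to_box`. It renders the mask as a black-and-white image like the second part of Figure 3 (0 = black, 255 = white) and then derives a bounding box from it. The (x_min, y_min, x_max, y_max) box format with inclusive pixel coordinates is one common convention, assumed here for illustration.

```python
import numpy as np

# A hand-made binary mask: True = pixel belongs to the birdie.
mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=bool)

# Render the mask as a black-and-white image: 0 = black (not birdie), 255 = white (birdie).
bw_image = np.where(mask, 255, 0).astype(np.uint8)

def mask_to_box(mask):
    """Hypothetical helper: tightest inclusive box (x_min, y_min, x_max, y_max) around True pixels."""
    rows = np.flatnonzero(mask.any(axis=1))  # indices of rows containing the object
    cols = np.flatnonzero(mask.any(axis=0))  # indices of columns containing the object
    if rows.size == 0:
        return None  # empty mask: no object, so no box
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

box = mask_to_box(mask)
print(box)  # (1, 1, 4, 3)

# The box covers more pixels than the object itself, so the mask
# cannot be recovered from the box alone.
x_min, y_min, x_max, y_max = box
box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
print(box_area, int(mask.sum()))  # 12 7
```

Going from a mask to a box is easy, but going back is not: the box tells you where the object’s extremes are, not which of the 12 pixels inside it actually belong to the 7-pixel object.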

[1] Wikipedia, under a Creative Commons license.


