Why Data Needs to Be Labeled

For a self-driving car to “see,” “hear,” “understand,” “talk,” and “think,” it needs video, image, audio, text, LIDAR, and other sensor data that has been correctly collected and structured so that machine learning models can understand it.


Take just what a car “sees.” A model needs many annotated images before it can
learn to recognize all the different street signs under all conditions. Speed limit
signs may share the same shape, but the car must also read the number on the sign
to drive safely. A car must also be able to “understand” what a person is, whether
an adult, a kid, or a baby. To do this, the model must be shown pictures of many
different people from many different angles so that it can start to tell what is
and is not a person.


To break it down further, a picture is simply a series of pixels to a machine. Those
pixels have values that correspond to colors, but they don’t have values that
represent the object. Each one is just a tiny dot on a massive canvas of other
pixels. Labeled images, however, show machines that certain collections of pixels
are certain objects. Let’s go back to ImageNet. Every image in that dataset was
labeled by a person. The end result: thousands of examples of different objects.
From those labels, machines can make sense of the pixels that make up each image.
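
To make this concrete, here is a minimal Python sketch of how a machine might store a picture and its label. The tiny 2x3 image, the color values, and the "stop sign" label are all made up for illustration; real datasets use far larger images and their own file formats.

```python
# Each pixel is just three numbers: (red, green, blue).
RED = (255, 0, 0)
GRAY = (128, 128, 128)

# A tiny, hypothetical 2x3 "image": two rows of three pixels.
image = [
    [GRAY, RED, RED],
    [GRAY, RED, RED],
]

# On its own, the grid says nothing about what it depicts.
# A label supplied by a person attaches meaning to the pixels.
labeled_example = {"pixels": image, "label": "stop sign"}


def pixel_count(example):
    """Return how many pixels the labeled example contains."""
    return sum(len(row) for row in example["pixels"])


print(labeled_example["label"], pixel_count(labeled_example))
```

Nothing in the numbers themselves says "stop sign." That meaning exists only because a person attached the label.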


Now, image labeling can be done in many different ways. You can run rudimentary
labeling tasks like “is there a dog in this picture,” but it’s going to take a ton of images
for a machine to learn much from labels that coarse. It’s usually better practice to use
bounding boxes or dots, or to actually label an image pixel by pixel, because those
labels also show the machine where in the image the object is.


Generally speaking, the more examples a machine sees, the better it understands.
This usually holds true no matter the use case: images, text, audio, what have you.
The point is that the data you have likely isn’t the data you need to create effective
machine learning models. It’s far more common that the data you have needs to
be labeled or annotated in some way, shape, or form so that a machine can actually
understand it. And the more labels a piece of data has, the more complex the
ontology it can support.
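
The labeling styles mentioned above (a yes/no tag, bounding boxes, dots, and pixel-by-pixel labels) can be sketched as simple Python data structures. This is only an illustration: the class names, field names, file name, and label names are hypothetical and do not follow any particular annotation tool or dataset format.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


@dataclass
class Keypoint:
    label: str   # a single labeled dot
    x: int
    y: int


@dataclass
class ImageAnnotation:
    image_id: str
    tags: list = field(default_factory=list)       # image-level answers, e.g. "dog"
    boxes: list = field(default_factory=list)      # BoundingBox items
    keypoints: list = field(default_factory=list)  # Keypoint items
    mask: list = field(default_factory=list)       # one label per pixel, row by row

    def label_vocabulary(self):
        """Collect every distinct label used anywhere on this image."""
        names = set(self.tags)
        names.update(box.label for box in self.boxes)
        names.update(point.label for point in self.keypoints)
        names.update(name for row in self.mask for name in row)
        return sorted(names)


# One hypothetical 2x3 image annotated in all four styles.
annotation = ImageAnnotation(
    image_id="street_0001.jpg",
    tags=["dog"],
    boxes=[BoundingBox("dog", x=1, y=0, width=2, height=2)],
    keypoints=[Keypoint("dog_nose", x=2, y=1)],
    mask=[
        ["road", "dog", "dog"],
        ["road", "dog", "dog"],
    ],
)

print(annotation.label_vocabulary())
```

The tag alone yields one label for the whole image. Adding boxes, dots, and a mask to the same image brings in more labels and ties each one to specific pixels, which is the sense in which more labels support a richer ontology.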

Contact us today to learn more about getting labeled datasets.