Defining “Vision” in “Computer Vision”

Updated: Sep 20, 2022

Computer Vision, often shortened to Vision, is a fast-moving field within computer science that deals with enabling computers, devices, or machines in general to see, understand, interpret, and manipulate what they see.

Computer Vision technology relies heavily on deep learning techniques and, in a few cases, also employs Natural Language Processing techniques as a natural next step, for example to analyze text extracted from images.

With the advances in deep learning, building functions such as image classification, object detection, tracking, and image manipulation has become simpler and more accurate, paving the way for more complex autonomous applications such as self-driving cars, humanoid robots, and drones.

With deep learning, we can now manipulate images, for example by superimposing Tom Cruise’s features onto another face, or by converting a picture into a sketch or a watercolor painting. We can remove the background noise from a picture and highlight the subject in focus, or take a stable photograph even with shaky hands. We can estimate how close objects are, their structure and shape, and the texture of a surface. We can also identify objects under different lighting or camera exposure, and recognize an object that we have seen before.

In Computer Vision, “enabling computers to see” means enabling machines or devices to process digital visual data, which can range from images taken with traditional cameras to videos, graphical representations of a location, heat-intensity maps of data, and beyond.
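To make “digital visual data” a little more concrete, here is a minimal sketch in plain Python of how a grayscale image might be represented: a grid of numbers, one per pixel. The pixel values and the helper names image_shape and mean_intensity are made up for illustration; real applications typically use an array library instead of nested lists.

```python
# A tiny 3x4 grayscale "image": each number is a pixel intensity
# from 0 (black) to 255 (white). The values are invented for illustration.
image = [
    [0, 50, 100, 150],
    [25, 75, 125, 175],
    [50, 100, 150, 200],
]


def image_shape(img):
    """Return (height, width) of a grayscale image stored as a list of rows."""
    return len(img), len(img[0])


def mean_intensity(img):
    """Average pixel value: a crude measure of overall brightness."""
    height, width = image_shape(img)
    return sum(sum(row) for row in img) / (height * width)


# Under this representation, a "video" is just an ordered sequence of frames.
video = [image, image]

print(image_shape(image))     # (3, 4)
print(mean_intensity(image))  # 100.0
print(len(video))             # 2
```

Seen this way, a heat map or any other grid of measurements has the same shape as a photograph, which is why the same techniques can process all of them.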

With this broader definition, we can see Computer Vision applications becoming ubiquitous in our day-to-day lives. We can now find an object or a face in a video, even in a live video feed, understand motion and patterns within a video, and increase or decrease the size, brightness, or sharpness of an image.
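The simplest of these operations, changing brightness and size, can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the image is a grayscale list of rows with values from 0 to 255, the function names adjust_brightness and resize_nearest are hypothetical, and resizing uses nearest-neighbour sampling, which is one of several possible methods.

```python
def adjust_brightness(img, delta):
    """Add delta to every pixel, clamping the result to the 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def resize_nearest(img, new_height, new_width):
    """Resize by nearest-neighbour sampling: each output pixel copies the
    source pixel at the proportionally scaled position."""
    height, width = len(img), len(img[0])
    return [
        [img[r * height // new_height][c * width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


# A 4x4 grayscale image with invented pixel values.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 250],
]

brighter = adjust_brightness(image, 20)
smaller = resize_nearest(image, 2, 2)

print(brighter[0][0])  # 30
print(brighter[3][3])  # 255 (270 clamped to the maximum)
print(smaller)         # [[10, 30], [90, 110]]
```

Passing a negative delta darkens the image, and asking for a larger height and width enlarges it by repeating pixels. Libraries built for image processing offer smoother interpolation, but the underlying idea is the same.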