Building on the work of Marvin Minsky, David Marr and many others, Computer Vision has grown steadily since its inception in the late 1960s as an offshoot of AI. Its ideas have progressed from solving simple block problems to building 3D scenes out of MRI data. It is not only regarded as cutting-edge science; the technologies built on it have also slowly found a place in our everyday lives. Think of digital cameras that now do face detection, or the grids of surveillance cameras looking for suspicious activity.
And that's why I have chosen Vision. It not only interests me, but since the field is still young, it offers tremendous opportunities to do ground-breaking research. Plus it provides frequent chances to show off your work visually. Not to mention that there's a higher probability that your grandma will understand what research you do every day :)
I am also a prospective graduate student, currently looking for the right university. It should have a research group that is extremely strong in Vision (and maybe even in Systems research). As part of the search, I'll be looking at Vision's important conferences to rank the universities active in Vision research. So keep an eye on this page; I might soon make my findings public. Ideas on the ranking criteria are always welcome :)
Our research group primarily works on the retrieval and recognition of object activities from Video and Sensor Databases using motion trajectories. Currently I'm working on an off-shoot problem of this research:
For a human, tracking a single person in a crowd is usually trivial (given a good line of sight). Extending this idea, multiple observers could possibly track all the people in the crowd. For a machine, however, a similar task can be quite difficult. This is because, unlike the human brain, a machine processing a video cannot automatically infer temporal associations (across multiple frames) between different objects of interest. This project explores ways to efficiently compute those associations and generate reasonable tracks for all objects of interest.
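To make the idea of temporal association concrete, here is a minimal sketch in Python. It is not the method used in this project; it is a simple greedy nearest-neighbour linker, assuming each frame has already been reduced to a list of (x, y) detections. The function name `link_detections` and the `max_dist` gate are hypothetical, chosen only for illustration.

```python
import math


def link_detections(frames, max_dist=50.0):
    """Greedily link per-frame detections into tracks.

    frames: list of frames, each a list of (x, y) detections.
    max_dist: hypothetical gating distance; farther pairs are never linked.
    Returns a list of tracks, each a list of (frame_index, (x, y)).
    """
    tracks = []
    for t, detections in enumerate(frames):
        # Only tracks that were updated in the previous frame can be extended.
        active = [i for i, tr in enumerate(tracks) if tr[-1][0] == t - 1]

        # Collect every (distance, track, detection) pair inside the gate.
        pairs = []
        for i in active:
            last_pos = tracks[i][-1][1]
            for j, det in enumerate(detections):
                d = math.dist(last_pos, det)
                if d <= max_dist:
                    pairs.append((d, i, j))

        # Accept the closest pairs first; each track and detection is used once.
        pairs.sort()
        used_tracks, used_dets = set(), set()
        for d, i, j in pairs:
            if i in used_tracks or j in used_dets:
                continue
            tracks[i].append((t, detections[j]))
            used_tracks.add(i)
            used_dets.add(j)

        # Unmatched detections start new tracks.
        for j, det in enumerate(detections):
            if j not in used_dets:
                tracks.append([(t, det)])
    return tracks


if __name__ == "__main__":
    frames = [
        [(0, 0), (100, 100)],
        [(2, 1), (101, 99)],
        [(4, 2), (102, 98)],
    ]
    for track in link_detections(frames, max_dist=10.0):
        print(track)
```

A greedy linker like this breaks down exactly where the problem gets interesting: when people cross paths, are occluded, or move farther than the gate between frames. That is why computing the associations efficiently and robustly is a research question rather than a one-liner.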
I am currently working with Dr. Sohaib and Dr. Shahab, and we collaborate with Dr. Ashfaq Khokar at UIC. This research has been partly funded by the NSF.
Every human eye has a fovea, which is responsible for sharp central vision. Think of it as the region where the eye's "pixel resolution" is highest. Moving away from the fovea, the eye's resolution decreases, so the brain receives denser information from wherever the eye is focused. When watching a video, a viewer usually concentrates on the objects of interest using the fovea, but unfortunately video encoding usually doesn't take advantage of this point of gaze. This research built on Giorgio Pioppo's work, which developed a fast technique for encoding foveation points in a video to save space when storing it. We investigated Computer Vision techniques for identifying regions where foveation can be advantageous during encoding. This can be really effective if these regions are correctly identified according to where a viewer would look.
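As a rough illustration of what foveated encoding means in practice, the Python sketch below assigns a coarser quality level to image blocks that lie farther from a given gaze point. It is not the encoding technique referred to above; the function name `foveation_levels`, the block size, and the linear `falloff` rule are all assumptions made for this example.

```python
import math


def foveation_levels(width, height, gaze, block=16, falloff=64.0, max_level=4):
    """Return a grid of quality levels, one per block (0 = finest).

    gaze: (x, y) pixel position the viewer is assumed to look at.
    falloff: hypothetical distance in pixels per one-level drop in quality.
    """
    gx, gy = gaze
    rows = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            # Distance from the block centre to the gaze point.
            cx, cy = bx + block / 2, by + block / 2
            d = math.hypot(cx - gx, cy - gy)
            # Quality degrades in steps as distance grows, up to max_level.
            row.append(min(max_level, int(d // falloff)))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in foveation_levels(64, 64, gaze=(8, 8), block=16,
                                falloff=16.0, max_level=3):
        print(row)
```

An encoder could spend fewer bits on blocks with higher levels. The hard part, and the part this research looked at, is choosing the gaze point automatically from the video content.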
In our final year project, we developed AVRiL, a system for creating professionally directed lecture videos from multiple cameras at the touch of a button. No camera crew, no direction crew required. To do so, we exploited many ideas from Computer Vision, including motion tracking, face recognition, and profile human body detection using the Viola-Jones boosted cascade of features. Find out more at the AVRiL page, or if you're interested in how we built some visual intelligence for better direction, read the article on lecturer tracking.
We were a group of four (Yahya Cheema, Tayyab Javed, Ozair Muazzam and myself) carrying out this project, supervised by Dr. Sohaib Khan. Because of this project, we were the first team from Pakistan ever to participate in the prestigious Microsoft Imagine Cup.
CMU Vision 16-720
Labs and Groups