Computer Vision Fundamentals - Autumn 2002
Instructor:
Dr Sohaib A. Khan, sohaib at lums dot edu dot pk, http://web.lums.edu.pk/~sohaib
Office hours: Mon 9:30 am - 11 am, Wed 9:30 am - 11 am, Fri 11 am - 1 pm
This course is designed to give a broad overview of the fundamentals of computer vision, laying the foundations for advanced graduate classes in vision. This course will be conducted with an application perspective. Therefore students will be expected to implement several techniques learnt in the lectures. A good calculus, discrete mathematics and programming background is expected for this class. Knowledge of probability and random processes is a plus, but will not be assumed. This class is designed for senior-level and introductory graduate-level students of Computer Science.
This class will not overlap significantly with the Digital Image Processing class offered last quarter, though students who have taken DIP will have an advantage in the beginning.
Textbook:
Computer Vision
Linda G. Shapiro, George C. Stockman
ISBN 0-13-030796-3
Prentice Hall, 2001
Posted 03/09/02
Lecture 1:
Introduction.
[Download slides].
Material in these slides (including images, data etc.) must not be used for commercial benefit without the instructor's written permission.
Chapter 1 reading material is available from the website of one of the authors. Click here. Sec 2.5 is not available online, so reading assignment is reduced to Ch 1 only.
Useful links discussed in class:
http://www.cs.ucf.edu/~vision/projects/projects.html
http://www.wearcam.org, in particular, look at research and video orbits. The Claire sequence shown in class is from this page. Other mosaic results were from the UCF webpage of Cen Rao, who has implemented the technique of Bergen et al.
MATLAB's Primer is a very good introduction to this environment and should be used by students who are not familiar with MATLAB.
Program 0:
PGM and PPM formats: Submission deadline - 10 Sept 2002
One way to view PPM and PGM files on your computer is to download the free image viewer IrfanView. It also supports conversion between several other formats and is a very useful tool for the type of work that we will do in this class.
See class notes (slides) for program details.
Test images: mecca06.pgm, mecca06.ppm
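As a rough illustration of what reading and writing a PGM file involves, here is a minimal Python sketch. It is not the Program 0 specification (see the slides for that). The function names `read_pgm` and `write_pgm` are hypothetical, and the sketch assumes 8-bit images (maxval at most 255) with comments only in the header. PPM files (P3/P6) are analogous, with three samples per pixel.

```python
def read_pgm(data):
    """Parse PGM bytes (P2 ASCII or P5 binary, maxval <= 255).

    Returns (width, height, maxval, rows), where rows is a list of lists.
    """
    pos, fields = 0, []
    while len(fields) < 4:  # magic number, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":  # header comment runs to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        fields.append(data[start:pos])
    magic = fields[0]
    width, height, maxval = (int(f) for f in fields[1:])
    if maxval > 255:
        raise ValueError("this sketch handles 8-bit images only")
    if magic == b"P5":
        # exactly one whitespace byte separates maxval from the raster
        pixels = list(data[pos + 1:pos + 1 + width * height])
    elif magic == b"P2":
        pixels = [int(tok) for tok in data[pos:].split()]
    else:
        raise ValueError("not a PGM file")
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match header")
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows


def write_pgm(rows, maxval=255):
    """Serialize rows of 8-bit gray values as binary (P5) PGM bytes."""
    height, width = len(rows), len(rows[0])
    header = ("P5\n%d %d\n%d\n" % (width, height, maxval)).encode("ascii")
    return header + bytes(v for row in rows for v in row)
```

To use it on a file, read the bytes with `open(name, "rb").read()` and pass them to `read_pgm`; write the result of `write_pgm` back with a file opened in `"wb"` mode.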
Posted 06/09/02
Lecture 2: Continuation of Introduction, and Getting Started with MATLAB - Download slides.
For more details of the video segmentation project (with examples from Larry King Live), visit this webpage. A related project from the same group is on movie genre classification from previews.
For the Orlando Police Dept project mentioned in the lecture, visit the OPD surveillance project website. This contains several interesting videos which demonstrate tracking output in real-time.
A transcript of the help session on MATLAB (2nd half of the lecture) is also available. Click here for the sample m-file function demonstrated in class. Although I did not mention this in class, the input to this function can be a matrix of any size (I demonstrated it only with a scalar number). Working through the MATLAB primer is highly recommended for everyone.
Some students have asked me questions about Program 0. Here is what you need to do:
Posted 10/09/02
Lecture 3: 2D Imaging Transformations - Download slides.
More details of the video geo-registration project are available at the project website.
Reading assignment 2.1-2.5
Program 0 can be submitted on a floppy disk, as a hard copy, or on the common drive at J:\Cs436\Program0
Posted 12/09/02
Lecture 4: Displacement Models - Download slides
Wearable computing links - Steve Mann, Thad Starner, paper referred to in class on transformations is available online.
Program 1: 2D Affine Warping. Due date: Thursday 19th Sept. For details, click here. Data: mecca06.pgm, mecca06t.pgm
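The general shape of an affine warp can be sketched as follows. This is only an illustration under stated assumptions, not the Program 1 handout: the 2x3 matrix is assumed to map source coordinates (x = column, y = row) to destination coordinates, the output has the same size as the input, sampling uses bilinear interpolation, and pixels that map outside the source get a fill value. All function names are hypothetical.

```python
def invert_affine(m):
    """Invert a 2x3 affine matrix [[a, b, tx], [c, d, ty]]."""
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine transformation is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]


def bilinear(img, x, y, fill=0.0):
    """Sample img at real-valued (x, y); return fill outside the image."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return fill
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_affine(img, m, fill=0.0):
    """Warp img by m using inverse mapping: for every destination pixel,
    find where it came from in the source and interpolate there."""
    inv = invert_affine(m)
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = inv[0][0] * x + inv[0][1] * y + inv[0][2]
            sy = inv[1][0] * x + inv[1][1] * y + inv[1][2]
            row.append(bilinear(img, sx, sy, fill))
        out.append(row)
    return out
```

Looping over destination pixels and mapping backwards avoids the holes that appear when source pixels are pushed forward to non-integer destinations.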
Note that there are no office hours now on Thursday. Instead, an additional hour at Fri 11:00 - 12:00 noon has been added.
Posted 18-09-02
Lecture 5: 3D Transformations - Download slides
End of Module 1.
Note the change in office hours (there are now 5 hours per week; try to utilize them effectively).
Program 1 is due this Thursday (to be submitted electronically and demonstrated in the lab).
Homework 1 has been assigned (see lecture slides). Due next Thursday.
Posted 24-09-02
Lecture 6 and Lecture 7: Binary Image Analysis - Edge Detection - Download slides
Homework 1 (written assignment) is due on Thursday
Posted 26-09-02
Lecture 8: Edge Detection - Download Slides
Program 2: Canny's Edge Detector. Due Monday 07-10-02 by 12:00 noon.
Submit at J:\CS436\Program2
Please name your folder with your student ID number (no hyphens) so that folders appear in the same order as the class listing. Any comments in the folder name (e.g. resubmission or part2), if necessary, should come after the student ID number.
Posted 03-10-02
Lecture 9 - Binary Shape Analysis 1 - Download slides
Lecture 10 - Binary Shape Analysis 2 - Download slides
Program 2 - Requirements and Deliverables
Test Images - mecca06.pgm, rice.pgm, stc1.pgm, egg.pgm, cone.pgm
Hough Transform:
A very nice Java demo and description is available here.
For an interesting example application of HT and associated paper, click here.
Solutions:
Homework 1 - Download
Quiz 1 - Download
Additional office hours on Tuesday morning before exam
Posted 11-10-02
Lecture 11 (After midterm): Binary Morphology - Download slides
Program 3: Hough Transform for Lines. For the handout, click here.
Test Images:
sqr1-original, sqr1-canny, sqr1-canny with noise, sqr1-canny with breaks
ucfCsBldg-original, ucfCsBldg-canny
orldntwn-original, orldntwn-canny
angel-original, angel-canny
Program 3: For implementation help and intermediate/final outputs, click here.
The report that you write with the final submission should include plenty of figures (as in my document above). The report should contain interesting implementation details as well as a good discussion of intermediate and final outputs for several examples. You should also comment on why a particular output makes sense or not, and what happens if some parameters are changed.
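For orientation, the voting step of a Hough transform for lines can be sketched as below. This is a generic illustration, not the handout's required design: it assumes the normal parameterization rho = x cos(theta) + y sin(theta), theta sampled in 1-degree steps over [0, 180), rho rounded to the nearest integer, and edge points given as (x, y) pairs taken from an edge map. The function names are hypothetical.

```python
import math


def hough_lines(edge_points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) points.

    Returns (acc, max_rho); acc[rho + max_rho][t] holds the votes for the
    line with normal angle pi * t / n_theta and signed distance rho.
    """
    max_rho = int(math.ceil(math.hypot(width, height)))
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    for x, y in edge_points:
        for t, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round(rho)) + max_rho][t] += 1
    return acc, max_rho


def peaks(acc, max_rho, threshold, n_theta=180):
    """List (rho, theta_in_degrees, votes) for cells with enough votes."""
    found = []
    for r, row in enumerate(acc):
        for t, votes in enumerate(row):
            if votes >= threshold:
                found.append((r - max_rho, 180.0 * t / n_theta, votes))
    return found
```

Neighbouring accumulator cells often collect nearly as many votes as the true peak, so a report on intermediate outputs would typically show the accumulator itself and discuss how the threshold and bin sizes affect the detected lines.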
Canny's Implementation: This webpage should clear up confusion about the implementation of Canny, in particular Non-Maxima Suppression, which caused many problems in the submitted homework. The document is written in the format of a sample report, to give you an idea of what is required in the report. Note that the algorithm is not explained in detail; implementation details and discussion of the output form the bulk of the report.
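Since Non-Maxima Suppression was the main source of trouble, here is one common way to structure it, as a sketch only and not a copy of the linked sample report. It assumes the gradient components gx, gy and the magnitude are already computed, that y increases downwards (row index), and that the gradient direction is quantized into four sectors; border pixels are simply set to zero. The function name is hypothetical.

```python
import math


def non_maxima_suppression(mag, gx, gy):
    """Keep a pixel only if its magnitude is not smaller than both
    neighbours along the (quantized) gradient direction."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            m = mag[y][x]
            if m <= 0:
                continue
            # Gradient angle folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180.0
            if angle < 22.5 or angle >= 157.5:
                dx, dy = 1, 0       # gradient roughly horizontal
            elif angle < 67.5:
                dx, dy = 1, 1       # diagonal, down-right in image coords
            elif angle < 112.5:
                dx, dy = 0, 1       # gradient roughly vertical
            else:
                dx, dy = -1, 1      # diagonal, down-left in image coords
            if m >= mag[y + dy][x + dx] and m >= mag[y - dy][x - dx]:
                out[y][x] = m
    return out
```

The comparison is made along the gradient direction (across the edge), not along the edge itself; mixing these two up is a frequent cause of thick or broken edges.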
Solution to Quiz 2
Homework 2 Due Tuesday 29th October, in class
Lecture 13: Morphology, Connected Components - Download slides
We also covered region properties such as area, centroid, bounding box, circularity, and 2nd moments, which you should study from Sec 3.6 of the text.
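A few of these region properties can be computed directly from the pixel coordinates of a region, as in the sketch below. It assumes a single region given as a binary image (nonzero means foreground) and uses (row, column) coordinates; circularity is left out because it needs a perimeter definition, for which the text should be followed. The function name and dictionary keys are hypothetical.

```python
def region_properties(binary):
    """Area, centroid, bounding box and central second moments of the
    foreground pixels of a binary image (list of lists)."""
    pixels = [(r, c) for r, row in enumerate(binary)
              for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("region is empty")
    area = len(pixels)
    r_bar = sum(r for r, _ in pixels) / area
    c_bar = sum(c for _, c in pixels) / area
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "area": area,
        "centroid": (r_bar, c_bar),
        # (min row, min col, max row, max col), inclusive
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
        # second moments about the centroid, normalized by area
        "mu_rr": sum((r - r_bar) ** 2 for r in rows) / area,
        "mu_cc": sum((c - c_bar) ** 2 for c in cols) / area,
        "mu_rc": sum((r - r_bar) * (c - c_bar) for r, c in pixels) / area,
    }
```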
Lecture 14: Corrected Morphology slides - Download slides
In this lecture we also covered subtraction, which you should study from the text (Sections 9.1 and 9.2). I also showed several examples of subtraction in class.
Combined handout for these three lectures (no need to print; I will hand out photocopies of this in class)
Lecture 15: Brightness constancy constraint
Lecture 16: Pyramids (slides), Lucas-Kanade
Lecture 17: Global Motion
Program 4 (Global Motion) is worth 20 points. For this program, you need warping, the Reduce operation, and the fx, fy, ft derivatives (as in the recent lectures). Complete details will be announced in Lecture 17, but everyone should get these modules working if they haven't already done so. This program may be attempted in MATLAB, but that will require recoding some of the work you have already done. Also, a MATLAB implementation may be painfully slow if you do not optimize your code. It is up to each of you to decide whether to attempt it in MATLAB. I will provide the solution in MATLAB once the submission deadline has passed.
Test Images: penny.pgm, penny-translation.pgm, penny-scaling.pgm, penny-rotation.pgm
Transformations between these images are known, because of the way I generated them, using warping.
Transformation between 1 0 10
Transformation between 1.2 0 -10
Transformation between cos(20) -sin(20) 20
Since you already have a warping module, you can use any image, warp it according to some known transformation, and then recover the transformation to check your code.
Matrix Inversion: If you choose to implement in C/C++, you will need matrix inversion code. I am posting a set of routines from a book. I have written an example help file, which illustrates how the routines can be used to solve a system of the form Ax=b (actually without computing the inverse, using LU decomposition). Click here to download the example project. Alternatively, you may search the net for another option.
Test images to check your derivative masks: [a1.pgm, a2.pgm]. The a2 image is a1 translated by (1,1), so fx * 1 + fy * 1 + ft should be zero at most locations (except at the four corners). These images can be used (without pyramids) to check your masks.
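The check described above can be sketched as follows. The exact masks from the lectures are not reproduced on this page, so this sketch assumes Horn-Schunck style differences averaged over the 2x2x2 cube formed by the two images; treat that choice, and the function names, as assumptions. Images are lists of lists, with x as the column index and y as the row index.

```python
def derivatives(img1, img2):
    """Return fx, fy, ft as (h-1) x (w-1) lists, each averaged over the
    2x2 neighbourhood in both frames."""
    h, w = len(img1), len(img1[0])
    fx, fy, ft = [], [], []
    for y in range(h - 1):
        rx, ry, rt = [], [], []
        for x in range(w - 1):
            a1, b1 = img1[y][x], img1[y][x + 1]
            c1, d1 = img1[y + 1][x], img1[y + 1][x + 1]
            a2, b2 = img2[y][x], img2[y][x + 1]
            c2, d2 = img2[y + 1][x], img2[y + 1][x + 1]
            rx.append(((b1 - a1) + (d1 - c1) + (b2 - a2) + (d2 - c2)) / 4.0)
            ry.append(((c1 - a1) + (d1 - b1) + (c2 - a2) + (d2 - b2)) / 4.0)
            rt.append(((a2 - a1) + (b2 - b1) + (c2 - c1) + (d2 - d1)) / 4.0)
        fx.append(rx)
        fy.append(ry)
        ft.append(rt)
    return fx, fy, ft


def constraint_residual(fx, fy, ft, u, v):
    """Brightness constancy residual fx*u + fy*v + ft at every pixel."""
    return [[fx[y][x] * u + fy[y][x] * v + ft[y][x]
             for x in range(len(fx[0]))]
            for y in range(len(fx))]
```

On a synthetic ramp image shifted by (1,1) the residual is exactly zero everywhere; on images with sharp corners, such as a1 and a2, nonzero values are expected near the corners.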
Simple Test data for bonus part [claire_0_5.zip] consists of 6 images, so you will have to recover 5 transformations, and warp each image back to the original location. There are two options after that, either you can blend them or paste them over one another. The results of each are [claire_0_5_blend.pgm, claire_0_5_paste.pgm]. Blending is more accurate for identifying errors, as errors are only visible on the edges of pasted images. However, the overall output looks nicer using paste-overs.
I will post longer test sequences later. You can generate your own test sequences out of large images using your warping function.
Help with Program 4:
Intermediate outputs:
You can use images a1.pgm and a2.pgm at only one level to test your program.
For this image pair, (u=1, v=1) for all pixels. If the images are thresholded
such that the black region is zero, white region is 1, then the values of
A (6x6 matrix) and B (6x1 matrix) are:
A (6x6)                                          | B (6x1)
 25452   20370    1848   -1168    -526     -88   | 1760
 20370   21932    1848    -526   -1168     -88   | 1760
  1848    1848     168     -88     -88      -8   |  160
 -1168    -526     -88   21932   20370    1848   | 1760
  -526   -1168     -88   20370   25452    1848   | 1760
   -88     -88      -8    1848    1848     168   |  160
If you use the images as they are, then the values will be 255^2 times the matrices above (why?).
The result of inv(A)*B for the above matrices (no matter whether you use 0-255 or 0-1 range) is
-1.7347e-016
-4.996e-016
1
0
0
1
Depending on the routine you use for solving linear systems, the answer may differ slightly, but only after several decimal places. For example, with the LUDCMP routine that I posted earlier on the website, the results are:
-1.12409e-009
1.11147e-008
1.00000
-3.15264e-009
2.41818e-008
1.00000
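To see how such a system is solved without forming inv(A), here is a small pure-Python sketch of LU decomposition with partial pivoting followed by forward and back substitution, applied to the A and B given above. It is an illustrative stand-in, not the posted LUDCMP routine, and the function names are hypothetical.

```python
def lu_decompose(a):
    """Factor a square matrix in place of a copy: P*A = L*U.

    Returns (lu, perm); lu stores U on and above the diagonal and the
    multipliers of L below it, perm[i] is the original index of row i.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the factorization from lu_decompose."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = P b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: U x = y
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


A = [[25452, 20370, 1848, -1168, -526, -88],
     [20370, 21932, 1848, -526, -1168, -88],
     [1848, 1848, 168, -88, -88, -8],
     [-1168, -526, -88, 21932, 20370, 1848],
     [-526, -1168, -88, 20370, 25452, 1848],
     [-88, -88, -8, 1848, 1848, 168]]
B = [1760, 1760, 160, 1760, 1760, 160]

x = lu_solve(*lu_decompose(A), B)
print(x)  # close to [0, 0, 1, 0, 0, 1], up to rounding error
```

The factorization can be reused for several right-hand sides, which is one reason to prefer it over computing the inverse explicitly.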
More test images:
You should try your program on the images given for Program 1, mecca06.pgm and mecca06t.pgm, and report the transformation. Also recover the transformation from mecca06t.pgm to mecca06.pgm and see whether the two are inverses of each other. Try at least one image of your own, by first warping the image according to some known transformation, then recovering the transformation and reporting the error. Also, report your transformation for any pair of images from the claire sequence (the zip file of 6 images is given above).
Longer test sequence for bonus part: lums-test.zip. Note that this is in color, so you will have to recover the transformation from a gray version of it (use IrfanView or write your own converter), and then apply it to all three color layers to generate a color mosaic.
Lectures 18-20 cover pattern recognition concepts discussed in Chapters 4 and 10 of the text. In addition, fuzzy K-Means and the idea of the EM algorithm were presented for clustering. Bayes classifier slides are available [here]. Example applications were also discussed [download]. Webpages for these applications are linked from the following webpage: http://www.cs.ucf.edu/~vision/projects/projects.html. For all the projects discussed in class, you should know the input/output relationship and the features used in the classifiers for each problem.
For a detailed list of topics covered in the entire course, see this webpage