Pakistan Research Repository

Title of Thesis

Author (s)
Humera Noor
Institute/University/Department Details
Department of Computer and Information Systems Engineering/ NED University of Engineering & Technology, Karachi
Computer Engineering & Information Technology 
Number of Pages
Keywords (Extracted from title, table of contents and abstract of thesis)
computer vision, computationally economical, interpolation, extrapolation


This thesis targets Artificial Intelligence, a fundamental branch of Computer Engineering that strives to give computer systems human-like capabilities and intelligence. More specifically, it deals with computer vision, which has gained considerable attention from researchers because of its wide applicability to day-to-day tasks such as view generation, synthesis of animations and videos from static images, surveillance, medical imaging, tracking, and object recognition and classification. The thesis investigates three problem areas: image synthesis, object recognition and object categorization. The problem of generating images at novel, arbitrary and unconstrained viewpoints, covering both interpolation and extrapolation, is investigated by operating on a sparse set of basis images of a real scene. This image generation methodology is then incorporated into models for object recognition and categorization.

First, an image synthesis strategy is presented that generates virtual views at arbitrary points using interpolation and extrapolation from a sparse set of images. Traditional work on view synthesis using interpolation is extended, and it is shown that view extrapolation can be done as easily as interpolation. Moreover, certain scenarios are identified, such as planar or multi-planar scenes and pure rotational camera motion during image capture, that allow direct retrieval of the underlying mapping function between the images, leading to even simpler image extrapolation. The major issues and factors affecting the accuracy of generation are explored, and suggestions are presented to improve virtual view quality.

Next, an approach is presented to generate a model for multi-view object recognition. A view-centered model is generated from either a video sequence or a sparse set of images captured around the object along an arbitrary and unconstrained camera trajectory. It does not require any prior knowledge of the camera parameters, or of the positioning or motion of the object or camera. The model thus generated is quite dense and contains many redundant images, so the virtual view generation strategy is applied to identify and remove them. This results in a model that is computationally economical in terms of space and time. For testing or recognition, the model is used in conjunction with a video sequence, which provides information from multiple views of the object and thus increases the confidence measure of the results. The model is robust in that it captures the topological structure of the objects from multiple viewpoints, allowing a video sequence rather than a single test image to be used for object recognition. No constraint is placed on camera or object motion while capturing the video.

Finally, an approach for video-based multi-view object classification is presented. For each object instance of a particular category, a neighborhood graph-based model is generated from the set of input images, which are arranged in a manner that highlights the underlying topological structure. Again, no constraint is placed on the motion and placement of the object and camera during image capture, and no prior knowledge of the positioning or parameters of the camera is required. The view synthesis algorithm is used to identify and remove the redundant images in the model, giving a model that is computationally economical in terms of space and training time. The independent graphs of the different instances of the object category are then merged by automatically identifying the corresponding viewpoints across them. The strength of this approach is that it allows object categorization from multiple viewpoints while eliminating the need for manual alignment of common viewing angles across object instances. Another strength is that video sequences, rather than single images, are used for object classification, which increases the precision of the results.
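
The abstract notes that pure rotational camera motion allows the mapping between images to be retrieved directly. The sketch below illustrates that case with the standard textbook relation H = K R K^-1 for a pinhole camera. It is not taken from the thesis. The intrinsic matrix K, the rotation angles and the helper names (rotation_about_y, rotation_homography, map_pixel) are all assumptions made for illustration. Under these assumptions, a view beyond the captured range (extrapolation) is produced by the same formula as a view inside it (interpolation).

```python
import numpy as np


def rotation_about_y(angle_rad):
    """Rotation matrix for a camera panning about its vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rotation_homography(K, R):
    """Pixel mapping between two views that differ by a pure rotation R.

    Standard pinhole-camera relation: x_new ~ K R K^-1 x_old.
    """
    return K @ R @ np.linalg.inv(K)


def map_pixel(H, xy):
    """Apply a 3x3 homography to one pixel given as (x, y)."""
    p = H @ np.array([xy[0], xy[1], 1.0])
    return p[:2] / p[2]  # back from homogeneous coordinates


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Assume basis images were captured at pan angles of 0 and 10 degrees.
# 5 degrees lies inside that range, 15 degrees lies outside it.
H_inside = rotation_homography(K, rotation_about_y(np.deg2rad(5.0)))
H_outside = rotation_homography(K, rotation_about_y(np.deg2rad(15.0)))

centre = (320.0, 240.0)
print("interpolated view:", map_pixel(H_inside, centre))
print("extrapolated view:", map_pixel(H_outside, centre))
```

In this simplified setting the only difference between the two virtual views is the angle passed to the rotation. For general scenes and camera motion no single homography relates the views, so this sketch covers only the special case the abstract singles out.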

Chapter  Title of the Chapters  Page
1  Introduction  1
   1.1  Motivation and Goals  4
   1.2  Contributions of this work  6
   1.3  Dissertation Overview  8
2  Related Work  7
   2.1  Feature Extraction and Matching  10
   2.2  View Synthesis  16
   2.3  Summary  38
3  View Synthesis using Image Extrapolation  39
   3.1  Feature Extraction and Correspondence  43
   3.2  Issues in View Generation  51
   3.3  Summary  71
4  Using Virtual View Synthesis for Model Generation for Video-based Object Recognition  72
   4.1  Overview  72
   4.2  Model Generation  75
   4.3  Video-based Object Recognition  81
   4.4  Experiments and Results  83
   4.5  Summary  97
5  Multi-View Object Categorization Using Video  98
   5.1  Overview  99
   5.2  Model Generation  101
   5.3  Single-view Object Classification  114
   5.4  Video-based multi-view object categorization  116
   5.5  Experiments and Results  118
   5.6  Summary  134
6  Conclusion  135
7  References  183