Object Categorization: Unsupervised One-Shot Learning
Fei-Fei Li, Rob Fergus, Pietro Perona
Abstract.
Learning visual models of object categories notoriously requires thousands
of training examples; this is due to the diversity and richness of object
appearance which requires models containing hundreds of parameters.
We present a method for learning object categories from just a few images
(1 to 5). It is based on incorporating “generic” knowledge, which
may be obtained from previously learnt models of unrelated categories.
We operate in a variational Bayesian framework: object categories are
represented by probabilistic models, and “prior” knowledge is
represented as a probability density function on the parameters of these
models. The “posterior” model for an object category is obtained
by updating the prior in the light of one or more observations. Our
ideas are demonstrated on four diverse categories (human faces, airplanes,
motorcycles, spotted cats). Initially, three categories are learnt from
hundreds of training examples, and a “prior” is estimated from
them. Then the model of the fourth category is learnt from 1 to 5 training
examples and is used to detect new exemplars in a set of test images.
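The abstract says a “prior” is estimated from three categories that were learnt with plenty of data. One simple way to picture this, assumed here for illustration and not necessarily the procedure used in the paper, is to treat each learnt category's parameter vector as one sample and fit a diagonal Gaussian over those samples. The helper `estimate_prior`, the two-dimensional parameter vectors and their values are all hypothetical.

```python
import numpy as np


def estimate_prior(category_params, min_var=1e-6):
    """Fit a diagonal Gaussian prior over model parameters (hypothetical helper).

    Each entry of `category_params` maps a previously learnt category to a
    1-D parameter vector (e.g. one part's mean appearance). The prior mean is
    the average vector across categories; the prior variance is the
    per-dimension sample variance, floored at `min_var` so that no dimension
    collapses to zero variance.
    """
    stacked = np.stack([np.asarray(v, dtype=float) for v in category_params.values()])
    prior_mean = stacked.mean(axis=0)
    # ddof=1 gives the unbiased sample variance across the few categories.
    prior_var = np.maximum(stacked.var(axis=0, ddof=1), min_var)
    return prior_mean, prior_var


# Illustrative, made-up parameter vectors for three "source" categories.
learnt = {
    "faces": [1.0, 2.0],
    "airplanes": [3.0, 2.0],
    "motorcycles": [2.0, 5.0],
}
mean, var = estimate_prior(learnt)
print(mean, var)  # [2. 3.] [1. 3.]
```

With only three source categories the variance estimate is crude, which is why the sketch floors it rather than trusting it blindly.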
Motivation. It is believed that humans can recognize between
5,000 and 30,000 object categories. Informal observation tells us that
learning a new category is both fast and easy, sometimes requiring very
few training examples: given 2 or 3 images of an animal you have never
seen before, you can usually recognize it reliably later on. This is
to be contrasted with the state of the art in computer vision, where
learning a new category typically requires thousands, if not tens of
thousands, of training images. These have to be collected, and sometimes
manually segmented and aligned, which is a tedious and expensive task.
Computer vision researchers are neither being lazy nor unreasonable.
The appearance of objects is diverse and complex. Models that are able
to represent categories as diverse as frogs, skateboards, cell-phones,
shoes and mushrooms need to incorporate hundreds, if not thousands, of
parameters. A well-known rule of thumb says that the number of training
examples has to be 5 to 10 times the number of model parameters, hence
the large training sets. The penalty for using small training sets is
overfitting: while in-sample performance may be excellent, generalization
to new examples is terrible. As a consequence, current systems are impractical
where real-time user interaction is required, e.g. searching an image
database. Humans, by contrast, clearly can learn a new category from a
handful of examples. Does the human visual system violate what would appear to
be a fundamental limit of learning? Could computer vision algorithms
be similarly efficient? One possible explanation of human efficiency
is that when learning a new category we take advantage of prior experience.
While we may not have seen ocelots before, we have seen cats, dogs,
and chairs; more importantly, the variability in their appearance
gives us important information on what to expect in a new category.
This may allow us to learn new categories from few(er) training examples.
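The 5-to-10 rule of thumb quoted in this section can be turned into a back-of-the-envelope check of why a few images are not enough on their own. The 300-parameter model below is an assumed, illustrative size (the text only says “hundreds”), and the helper names are hypothetical.

```python
def required_examples(n_params, low=5, high=10):
    """Training-set size range suggested by the 5-10x rule of thumb."""
    return n_params * low, n_params * high


def examples_per_parameter(n_examples, n_params):
    """How many training examples are available per model parameter."""
    return n_examples / n_params


# Hypothetical model size: "hundreds of parameters", taken here as 300.
n_params = 300
print(required_examples(n_params))  # (1500, 3000)
for n in (1, 5):
    # Ratio of available examples to parameters for a one-shot setting.
    print(n, examples_per_parameter(n, n_params))
```

Even five images give fewer than 0.02 examples per parameter, far below the 5 to 10 the rule asks for; under this view, that gap is what prior knowledge has to fill.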
We explore this hypothesis in a Bayesian framework. Bayesian methods
allow us to encode prior information about objects as a “prior”
probability density function, which is updated into a “posterior”
when observations become available; the posterior is then used for recognition.
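The prior-to-posterior update is easiest to see in a one-dimensional toy case. The sketch below is a standard conjugate Normal update for an unknown mean with known noise variance; it is a stand-in chosen for clarity, not the variational Bayesian procedure the paper applies to its constellation models, and the function name and numbers are hypothetical.

```python
def posterior_normal_mean(prior_mean, prior_var, observations, noise_var):
    """Posterior over the mean of a 1-D Gaussian with known noise variance.

    Prior:       mu ~ N(prior_mean, prior_var)
    Likelihood:  x_i ~ N(mu, noise_var), independently
    Posterior:   mu | x ~ N(post_mean, post_var)   (conjugate update)
    """
    n = len(observations)
    # Precisions (inverse variances) add: prior precision + n * data precision.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    # Posterior mean is a precision-weighted blend of prior mean and data.
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Prior centred at 0; every observation equals 2.0 (illustrative numbers).
for n in (0, 1, 5, 100):
    m, v = posterior_normal_mean(0.0, 1.0, [2.0] * n, noise_var=1.0)
    print(f"n={n:3d}  posterior mean={m:.3f}  variance={v:.3f}")
```

With no data the posterior equals the prior; with one observation the estimate sits halfway between prior and data; with a hundred it is close to the data. The prior matters most exactly in the few-example regime this work targets.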
Bayesian methods are not new to computer vision; however, they have
not been applied to the task of learning models of object categories.
Here we use “constellation” probabilistic models of object
categories, as developed by Burl et al. and improved by Weber et al.
and Fergus et al. While they learnt new categories by maximizing model
likelihood, we use variational Bayesian methods that incorporate “general”
knowledge of object categories. We show that our algorithm is able to
learn a new, unrelated category using one or a few training examples.
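For readers unfamiliar with constellation models, the sketch below scores one hypothesis (an assignment of image features to the model's parts) under a heavily simplified version: each part has a Gaussian appearance density, and the part positions, measured relative to the first part for translation invariance, share a joint Gaussian shape density. Occlusion, background features, scale normalization and the search over hypotheses are omitted; the dictionary layout, the independence of part appearances and the function names are assumptions made for this illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of the multivariate Gaussian N(mean, cov) at x."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + maha)


def constellation_loglik(positions, appearances, model):
    """Log-likelihood of one part-to-feature hypothesis (simplified model).

    positions:   (P, 2) image coordinates of the features assigned to parts
    appearances: P appearance descriptors, one per part
    model:       dict with "shape_mean", "shape_cov" (over 2*(P-1) relative
                 coordinates) and per-part "app_mean", "app_cov" lists
    """
    positions = np.asarray(positions, dtype=float)
    # Shape term: express parts 1..P-1 relative to part 0 (translation invariance).
    rel = (positions[1:] - positions[0]).ravel()
    loglik = gaussian_logpdf(rel, model["shape_mean"], model["shape_cov"])
    # Appearance term: in this sketch parts are independent given the hypothesis.
    for p, a in enumerate(appearances):
        loglik += gaussian_logpdf(a, model["app_mean"][p], model["app_cov"][p])
    return loglik


# Toy 2-part model: part 1 is expected 10 pixels to the right of part 0.
model = {
    "shape_mean": np.array([10.0, 0.0]),
    "shape_cov": np.eye(2),
    "app_mean": [np.zeros(2), np.zeros(2)],
    "app_cov": [np.eye(2), np.eye(2)],
}
good = constellation_loglik([[5, 5], [15, 5]], [np.zeros(2), np.zeros(2)], model)
bad = constellation_loglik([[5, 5], [5, 15]], [np.zeros(2), np.zeros(2)], model)
print(good > bad)  # True: the expected spatial layout scores higher
```

Because the shape term uses positions relative to one part, shifting the whole constellation across the image leaves the score unchanged, while distorting the layout lowers it.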
Results. Our experiments demonstrate the benefit of using prior
information in learning new object categories. The following figure
shows models learnt by the Bayesian One-Shot algorithm on one of the
four datasets. Note that the “priors”
alone are not sufficient for object categorization (Panel (a)). But
by combining this general knowledge with the training data, the
algorithm is capable of learning a sensible model from even a single training
example. For instance, in Panel (c), we see that the 4-part model has
captured the essence of a face (e.g. eyes and nose). In this case it
achieves a recognition rate as high as 82%, given only 1 training example.
Because it needs far fewer training examples, our algorithm also learns
significantly faster.