Abstract.
Quickly ascertaining which parts of a visual scene are most relevant
for a recognition task, and then focusing on each of these areas in turn, is
an economical use of processing power known to be employed in the human
visual system. Most models of saliency detection, however, are too slow
to explain the performance of the biological system. We are currently
implementing a fast, neuronal spike-based saliency detector
model based on rank order coding.
Main. The human visual system is constantly presented with an
astonishing amount of information, estimated to be on the order of 10^8
bits per second. In order to have a remote chance of processing all
this information, a strategy of selection is employed, focusing on select
areas of a picture. These areas are chosen in an initial processing stage
according to what is most ‘salient’, based on criteria
such as luminance contrast, color contrast etc. This focusing of attention
allows the visual system to apply the most processing power to the
areas where it is most likely to find interesting structures.
Several biological models have been suggested for how this ‘saliency
detection’ is done, most commonly employing a type of saliency
map to indicate the most salient locations in the presented image (Itti,
Koch & Niebur 1998, Li 2002). However, all of these models require feedback
connections or iterations, and a processing time of several hundred
milliseconds in biological time. This seems unlikely considering the
general speed of the visual system; for example, studies indicate that humans
are able to recognize complex objects within 150 ms (Thorpe, Fize & Marlot
1996). This leaves no room for recurrent connections or iterations:
all processing has to be performed in one fast wave of neuronal spikes
traveling through the nervous system. A saliency detection system would
therefore have to use only feed-forward connections and a neuronal coding
scheme able to process the information fast.
We are currently developing a model of such a system utilizing the rank
order coding devised by Simon Thorpe’s group (VanRullen &
Thorpe 2001). The model is built from several different layers, each performing
a specific part of the filtering process. The filters are constructed
in accordance with the model of Itti et al. and designed to detect luminance
contrast, color contrast and orientation contrast. Each layer of processing
contains a large number of neurons receiving inputs from previous layers.
As inputs arrive at a single neuron, their strengths are attenuated
so that the first input spike has the most importance and later spikes
have progressively less strength. By additionally weighting
each input line according to some desired pattern, a neuron can detect
a specific sequence of input spikes.
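The rank-dependent attenuation described above can be illustrated with a minimal sketch. It is not the model's actual implementation: the function name `rank_order_activation`, the geometric attenuation schedule, the factor 0.8 and the weight values are all assumptions chosen for illustration. Under these assumptions, a neuron whose weights decrease in the same order as its preferred spike sequence responds most strongly when the spikes arrive in exactly that order.

```python
from itertools import permutations


def rank_order_activation(arrival_order, weights, attenuation=0.8):
    """Sum the input weights, attenuated by the rank at which each spike arrives.

    arrival_order -- input line indices, earliest spike first
    weights       -- one weight per input line (hypothetical values)
    attenuation   -- assumed geometric factor in (0, 1); the first spike
                     counts fully, each later spike counts less
    """
    activation = 0.0
    for rank, line in enumerate(arrival_order):
        activation += (attenuation ** rank) * weights[line]
    return activation


# Hypothetical neuron tuned to the arrival order 0, 1, 2, 3:
# the earliest preferred line carries the largest weight.
weights = [1.0, 0.8, 0.64, 0.512]

preferred = rank_order_activation([0, 1, 2, 3], weights)
reversed_order = rank_order_activation([3, 2, 1, 0], weights)

# Search all arrival orders for the one that drives the neuron hardest.
best = max(permutations(range(4)),
           key=lambda order: rank_order_activation(order, weights))
```

In this sketch the strongest response occurs for the arrival order the weights were tuned to, which is how a single neuron can act as a detector for one specific spike sequence.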
When a neuron is sufficiently activated it fires an output spike, which is passed
on to the next layer, creating a wave of spikes traveling from one layer
to the next, with a processing time per layer of only a few milliseconds.
Using such a coding scheme makes it possible for a signal to propagate
very fast from layer to layer, and by only waiting for the first few
spikes from a layer before processing the next layer, the most salient
locations can be found with less than 1% of the information having propagated
through the entire network.
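The idea of stopping after the first few spikes can be sketched as follows. This is a toy illustration under stated assumptions, not the model itself: the function `propagate_until`, the receptive field layout, the threshold of 1.5 and the attenuation factor 0.9 are hypothetical. Input spikes are fed in one at a time, and processing halts as soon as the requested number of units has fired, so later spikes are never examined.

```python
def propagate_until(spike_order, receptive_fields, threshold, n_winners,
                    attenuation=0.9):
    """Feed input spikes in arrival order; stop once n_winners units fired.

    spike_order      -- input line ids, earliest spike first
    receptive_fields -- dict mapping unit id -> set of input lines it listens to
    threshold        -- activation at which a unit emits its output spike
    Returns (units in firing order, number of input spikes consumed).
    """
    activation = {unit: 0.0 for unit in receptive_fields}
    received = {unit: 0 for unit in receptive_fields}  # spikes seen per unit
    winners = []
    for consumed, line in enumerate(spike_order, start=1):
        for unit, field in receptive_fields.items():
            if line in field and unit not in winners:
                # Later spikes at the same unit contribute progressively less.
                activation[unit] += attenuation ** received[unit]
                received[unit] += 1
                if activation[unit] >= threshold:
                    winners.append(unit)
        if len(winners) >= n_winners:
            return winners, consumed  # remaining spikes are never processed
    return winners, len(spike_order)


# Hypothetical layer: three units, each listening to three input lines.
fields = {"A": {0, 1, 2}, "B": {3, 4, 5}, "C": {6, 7, 8}}
# Lines 3 and 4 spike first, e.g. because they see the strongest contrast.
order = [3, 4, 0, 5, 1, 6, 2, 7, 8]

winners, consumed = propagate_until(order, fields, threshold=1.5, n_winners=1)
```

Here unit "B" is identified as the first to fire after only two of the nine input spikes, which mirrors, on a very small scale, how the most salient location can be found before most of the information has propagated.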
A neural rank order coding makes the computation faster, and therefore
more biologically plausible. It may have other advantages such as removing
the need for normalization, preprocessing etc. We are currently in the
process of investigating such prospects, along with improving the performance
of the model.
Using a spike-based detector also has implications for any attempt
to implement a saliency detector in hardware, since the speed and parallel
nature of the algorithm make it possible to run a saliency detector
in real time on simple analog components.