Caltech
Center for Neuromorphic Systems Engineering

Spike Based Saliency Detection
Ulrik Beierholm, Pietro Perona

Abstract. Quickly ascertaining which parts of a visual scene are most relevant for a recognition task, and then focusing on each of these areas in turn, is an economical use of processing power that the human visual system is known to employ. Most models of saliency detection, however, are too slow to explain the performance of the biological system. We are currently implementing a fast, spike-based neuronal saliency detector built on rank order coding.

Main. The human visual system is constantly presented with an astonishing amount of information, estimated to be on the order of 10^8 bits per second. To have even a remote chance of processing this information, the system employs a strategy of selection, focusing on selected areas of the picture. These areas are chosen in an initial processing stage according to how ‘salient’ they are, based on criteria such as luminance contrast and color contrast. This focusing of attention lets the visual system apply most of its processing power to the areas where it is most likely to find interesting structures.

Several biological models have been suggested for how this ‘saliency detection’ is done, most commonly employing a type of saliency map to indicate the most salient locations in the presented image (Itti, Koch & Niebur 1998, Li 2002). However, all of these models require feedback connections or iterations, and a processing time of several hundred milliseconds in biological time. This seems unlikely given the general speed of the visual system: for example, studies indicate that humans are able to recognize complex objects within 150 ms (Thorpe, Fize, Marlot 1996). This leaves no room for recurrent connections or iterations; all processing has to be performed in one fast wave of neuronal spikes traveling through the nervous system. A saliency detection system would therefore have to use only feed-forward connections, together with a neuronal coding scheme able to process the information quickly.

We are currently developing a model of such a system using the rank order coding devised by Simon Thorpe’s group (VanRullen & Thorpe 2001). The model is built from several layers, each performing a specific part of the filtering process. The filters are constructed in accordance with the model of Itti et al. and are designed to detect luminance contrast, color contrast and orientation contrast. Each layer contains a large number of neurons receiving inputs from previous layers. As inputs arrive at a single neuron, their strengths are attenuated so that the first input spike carries the most weight and each later spike carries progressively less. By additionally weighting each input line according to some desired pattern, a neuron can detect a specific sequence of input spikes.
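The attenuation-plus-weighting idea can be illustrated with a minimal sketch. This is not the model's actual implementation: the function name rank_order_activation, the geometric attenuation factor 0.9 and the example weights are all assumptions chosen for illustration. The sketch only shows that a neuron whose weights match the arrival order of its inputs is activated more strongly than one that sees the same inputs in the opposite order.

```python
def rank_order_activation(spike_times, weights, attenuation=0.9):
    """Activation of one neuron under rank order coding (illustrative only).

    spike_times: arrival time of the first spike on each input line.
    weights: weight of each input line (the 'desired pattern').
    attenuation: hypothetical factor in (0, 1); the input arriving at
        rank r (0 = first) is scaled by attenuation ** r.
    """
    # Sort input lines by arrival time, earliest first (ties keep line order).
    order = sorted(range(len(spike_times)), key=lambda i: spike_times[i])
    activation = 0.0
    for rank, line in enumerate(order):
        # Later spikes contribute progressively less.
        activation += weights[line] * attenuation ** rank
    return activation


# Hypothetical weights that favor line 0 firing first, then 1, then 2.
weights = [3.0, 2.0, 1.0]
matched = rank_order_activation([1.0, 2.0, 3.0], weights)
reversed_order = rank_order_activation([3.0, 2.0, 1.0], weights)
print(matched > reversed_order)
```

Only the order of arrival matters here, not the exact spike times, so the neuron could be made to fire by comparing this activation against a threshold as soon as enough early inputs have arrived.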

When a neuron is sufficiently activated it fires an output spike, which is passed on to the next layer. This creates a wave of spikes traveling from one layer to the next, with a processing time of only a few milliseconds per layer. Such a coding scheme lets a signal propagate very quickly from layer to layer, and by waiting for only the first few spikes from a layer before processing the next, the most salient locations can be found when less than 1% of the information has propagated through the entire network.
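A toy sketch of reading out only the earliest spikes is given below. It rests on several assumptions that are not part of the model description: a latency code in which spike time is the inverse of the filter response, a single layer, a readout fraction of 1%, and a synthetic 100 x 100 response map. The name first_spike_locations is hypothetical.

```python
import random


def first_spike_locations(responses, fraction=0.01):
    """Return the locations of the earliest-firing units, earliest first.

    responses: dict mapping (row, col) to a non-negative filter response.
    fraction: hypothetical share of units whose spikes are read out.
    """
    # Assumed latency code: a stronger response fires earlier,
    # and units with zero response never fire.
    latencies = {loc: 1.0 / r for loc, r in responses.items() if r > 0}
    n_read = max(1, round(len(responses) * fraction))
    # Sorting all latencies is a simulation shortcut for "listen to spikes
    # in order of arrival and stop after the first n_read of them".
    return sorted(latencies, key=latencies.get)[:n_read]


random.seed(0)
# Toy response map: weak background plus one strong location.
responses = {(r, c): random.uniform(0.01, 0.2)
             for r in range(100) for c in range(100)}
responses[(40, 60)] = 1.0

first = first_spike_locations(responses)
print(len(first), first[0])
```

In this toy map the strong location is the first unit to spike, so it is found without waiting for the remaining 99% of units to fire.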

Rank order coding makes the computation faster and therefore more biologically plausible. It may also have other advantages, such as removing the need for normalization and preprocessing. We are currently investigating these prospects and working on improving the performance of the model.

Using a spike-based detector also has implications for attempts to implement a saliency detector in hardware, since the speed and parallel nature of the algorithm make it possible to run a saliency detector in real time on simple analog components.

Figure 1. Original picture and a saliency mapping of the picture indicating the 5 locations that the model predicts to be the most salient or ‘interesting’. Notice that Hollywood is really not that interesting.

References
Itti L, Koch C, Niebur E, A model of saliency-based visual attention for rapid scene analysis, IEEE T PATTERN ANAL 20 (11): 1254-1259 NOV 1998
Li ZP, A saliency map in primary visual cortex, TRENDS COGN SCI 6 (1): 9-16 JAN 2002
VanRullen R, Thorpe SJ, Rate coding versus temporal order coding: What the retinal ganglion cells tell the visual cortex, NEURAL COMPUT 13 (6): 1255-1283 JUN 2001

