Abstract. We foresee a future in which machines autonomously interact with humans
in the surrounding environment. So far, very good results have been
achieved in detecting the presence of humans and labeling their body
parts by means of algorithms based on graphical models. Doing so unavoidably
means dealing with uncertainty and reasoning in the absence of complete information.
To that end, we explore and enhance the state of the art in probabilistic
inference and sampling techniques, with machine understanding of
human actions as the primary application.
Motivation. It is increasingly necessary for machines
to interact autonomously with humans in the surrounding environment.
Detecting and interpreting human presence, actions and activities is
one of the most valuable functions of our own visual system. Endowing
machines with the same ability would enable a great number of useful
industrial applications ranging from convenient non-contact user interfaces
for consumer products, to on-board safety systems for automobiles, and
surveillance systems for stores and museums.
To interpret human activities, a system must be able to detect
human presence. Furthermore, it is fundamental to localize the visible
parts of the body and characterize the corresponding regions of the
image (or label them). Once a labeling is achieved, the different parts
of the body may be tracked in time and their trajectories and/or spatiotemporal
energy patterns can be used in the classification of actions and activities.
So far, we have focused primarily on detection and labeling, restricting
ourselves to a specific context known as the "Johansson problem"
(its generalization, in fact). More precisely, the position and velocity
of point-features are input to a system that decides whether human motion
is present. The system also assigns probabilistic labels to the detected
features. The method is shown to perform very well on both artificial
and real image sequences. We also address the problem of unsupervised
learning of the model structure.
Research. Our investigation in the field of graphical models
and probabilistic inference has led to a powerful scheme that is able
to learn a probabilistic model of the human body, describing the correlation
between the random variables that represent the position and motion
of each body part. To achieve invariance with respect to translation,
we express the data relative to the center of gravity of the body, which is treated as
a hidden variable in a variant of the EM algorithm. A message-passing
algorithm determines the labeling based on the potentials of the clique
graph (or tree). We conducted experiments on both artificial data and
motion-captured image sequences. The results show that we can successfully
label the body with very high accuracy even when a substantial amount
of noise is present.
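To make the message-passing step concrete, the following is a minimal sketch, not the scheme described above. It assumes a simplified model in which the body parts form a chain (the simplest tree) and each pair of neighboring parts has an isotropic Gaussian potential on their relative displacement. Using displacements makes the sketch translation invariant without the hidden center of gravity and the EM step used in our scheme. The function name label_chain and its parameters means and variances are hypothetical.

```python
import numpy as np

def label_chain(points, means, variances):
    """Max-product labeling of point features on a chain of body parts.

    Assumed toy model (not the model learned in the paper):
      points:    (N, 2) array of observed feature positions.
      means:     (P-1, 2) expected displacement from part k to part k+1.
      variances: (P-1,) isotropic variance of each displacement.
    Returns a list of P point indices, one per part, that maximizes
    the sum of pairwise log-potentials along the chain.
    """
    points = np.asarray(points, dtype=float)
    means = np.asarray(means, dtype=float)
    n_parts = len(means) + 1

    # disp[i, j] = points[j] - points[i]; depends only on relative position.
    disp = points[None, :, :] - points[:, None, :]

    score = np.zeros(len(points))  # best log-score with the current part at point j
    back = []                      # back-pointers, one array per chain edge
    for k in range(n_parts - 1):
        # Log of a 2-D isotropic Gaussian potential, up to an additive constant.
        sq = np.sum((disp - means[k]) ** 2, axis=2)
        log_pot = -0.5 * sq / variances[k] - np.log(variances[k])
        total = score[:, None] + log_pot      # message from part k to part k+1
        back.append(np.argmax(total, axis=0))
        score = np.max(total, axis=0)

    # Backtrack from the best assignment of the last part.
    labels = [int(np.argmax(score))]
    for bp in reversed(back):
        labels.append(int(bp[labels[-1]]))
    return labels[::-1]
```

Note that this sketch does not prevent the same point from being assigned to two different parts, and it has no notion of missing parts or of a "clutter" label; a full treatment would need to handle both.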
Moving one step further, we are considering the problem of describing
the dynamics of the body in order to infer, in a probabilistic fashion, the action
being observed, by means of hybrid Bayesian networks. Furthermore,
in order to work directly on grayscale images, we are investigating
ways of incorporating the data association problem directly into the
(dynamic) probabilistic model of the human body.
References
Fanti C, Polito M and Perona P - "An Improved Scheme for Detection
and Labeling in Johansson's Displays" - Submitted to NIPS 2003