Distributed Learning in Swarm Systems
Ling Li, Alcherio Martinoli, and Yaser Abu-Mostafa
Abstract.
Distributed learning is the process by which autonomous agents learn
to find individual strategies that maximize the team performance. The
major challenge in designing a distributed learning algorithm is to
solve the credit assignment problem in a highly dynamic system where
each agent has only partial and noisy information about the global task.
We investigate several learning techniques in a concrete case study
in collective robotics (the stick pulling experiment). Our results, based
on a faithful microscopic probabilistic model, show that learning improves
both the performance and the adaptability of the swarm system, and that an
initially homogeneous team usually becomes specialized after learning.
Motivation and Aims. Natural systems consisting of many agents, such as ants,
wasps, and termites, exhibit complex behavior which appears to transcend
the abilities of the relatively simple constituent individuals. Artificial
swarm systems based on swarm intelligence consist of relatively simple
autonomous agents. They are truly distributed, self-organized, and inherently
scalable since there is no global control or communication mechanism.
The agents are designed to be simple and interchangeable, and may be
dynamically added or removed without explicit reorganization, making
the collective system highly flexible and fault tolerant.
When applying rules extracted from natural systems to artificial problems,
the differences between the natural systems and the artificial problems usually
call for different control parameters. Learning, as an automatic
way to adjust control parameters, is used to adapt the rules to new problems
and to improve performance. Learning also serves as a way to adapt
to a changing environment.
Research and Achievements. We investigate several learning issues
in swarm systems through a case study: the stick pulling experiment.
This is a strictly collaborative problem in which two non-communicating
robots must collaborate to complete the task. Each
robot in the experiment is characterized by a gripping time parameter
(GTP), which is the maximal length of time that a robot waits for the
help of another robot while holding a stick. The goal of learning is
to find a proper GTP for every robot so that the team pulls sticks
out of the holes in the arena as quickly as possible. Our experiments are based
on a probabilistic model that faithfully reproduces experiments
with real robots. To compare learned performance with optimal
solutions, we performed a systematic search of the parameter space
and measured the optimal performance of homogeneous and heterogeneous
teams of 2 to 6 robots.
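The role of the GTP, and what a systematic search over it involves, can be illustrated with a deliberately simplified simulation. The sketch below is not the microscopic probabilistic model used in this work; it is a toy model built on our own assumptions: the parameters `n_sticks`, `p_find`, and `steps` are hypothetical, a pulled stick is assumed to be replaced immediately, and a robot that finds a stick already being gripped is assumed to complete the pull. The helper `systematic_search` (also hypothetical) scores every GTP assignment on a grid, for homogeneous or heterogeneous teams.

```python
import random
from itertools import combinations_with_replacement
from typing import List, Optional, Sequence, Tuple


def collaboration_rate(
    gtps: Sequence[int],
    n_sticks: int = 4,
    p_find: float = 0.05,
    steps: int = 20_000,
    seed: int = 0,
) -> float:
    """Sticks pulled per time step in a toy stick-pulling model (assumption).

    gtps[i] is robot i's gripping time parameter (GTP), in time steps.
    """
    rng = random.Random(seed)
    n = len(gtps)
    holder: List[Optional[int]] = [None] * n_sticks  # robot gripping each stick
    gripping: List[Optional[int]] = [None] * n       # stick each robot grips
    waited = [0] * n                                  # steps spent gripping so far
    pulled = 0
    for _ in range(steps):
        for i in rng.sample(range(n), n):  # update robots in random order
            s = gripping[i]
            if s is not None:
                # A gripping robot waits for help and releases the stick
                # once it has waited longer than its GTP.
                waited[i] += 1
                if waited[i] > gtps[i]:
                    holder[s] = None
                    gripping[i] = None
                continue
            if rng.random() >= p_find:
                continue  # still searching
            s = rng.randrange(n_sticks)
            j = holder[s]
            if j is None:
                # Nobody holds this stick: grip it and start waiting.
                holder[s], gripping[i], waited[i] = i, s, 0
            else:
                # Another robot is gripping it: together they pull it out.
                # Assumption: the stick is replaced immediately.
                pulled += 1
                holder[s] = None
                gripping[j] = None
    return pulled / steps


def systematic_search(
    n_robots: int, candidates: Sequence[int], homogeneous: bool = True, **kw
) -> Tuple[Tuple[int, ...], float]:
    """Score every GTP assignment on a grid; return (best_gtps, best_rate)."""
    if homogeneous:
        teams = [(g,) * n_robots for g in candidates]
    else:
        # Robots are interchangeable in this toy model, so unordered
        # multisets of GTPs cover all heterogeneous teams.
        teams = list(combinations_with_replacement(candidates, n_robots))
    rate, best = max((collaboration_rate(t, **kw), t) for t in teams)
    return best, rate


if __name__ == "__main__":
    best, rate = systematic_search(3, range(0, 201, 20), steps=5_000)
    print(f"best homogeneous GTP for 3 robots: {best[0]} (rate {rate:.4f})")
```

In this toy model a single robot can never pull a stick, and with fewer robots than sticks a long GTP lets every robot end up gripping a different stick and waiting, which is one way to see why the choice of GTP matters. The heterogeneous search grows combinatorially with team size, so it is only practical on coarse grids.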
By integrating a learning ability into individual robots, the whole team
can adapt to environmental changes and maintain near-optimal
performance (Fig. 1). We tested several learning algorithms, including
adaptive line search and Q-learning. We found that, for this case study,
learning algorithms that directly search for optimal parameters work
much better than those based on reward estimation.
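A generic adaptive line search on a single robot's GTP, one example of an algorithm that searches the parameter directly, might look like the following. This is an illustrative sketch under our own assumptions, not the exact algorithm used in this study: `evaluate(gtp)` is a hypothetical callback returning a noisy performance measurement (for a robot, for example, its collaboration rate over an evaluation window), and the step-size rules (`grow`, `shrink`) and bounds (`lo`, `hi`) are arbitrary choices. The current point is re-measured every iteration so that one lucky noisy measurement cannot freeze the search.

```python
import random
from typing import Callable


def adaptive_line_search(
    evaluate: Callable[[float], float],
    x0: float,
    step: float = 20.0,
    grow: float = 1.5,
    shrink: float = 0.5,
    iters: int = 60,
    lo: float = 0.0,
    hi: float = 1000.0,
) -> float:
    """Maximize a noisy 1-D objective (e.g. performance vs. GTP) by direct search."""
    x, direction = x0, 1.0
    for _ in range(iters):
        fx = evaluate(x)  # re-measure the current point: measurements are noisy
        cand = min(hi, max(lo, x + direction * step))
        if evaluate(cand) > fx:
            x = cand
            step *= grow            # success: keep going, with a larger step
        else:
            direction = -direction  # failure: turn around, with a smaller step
            step *= shrink
    return x


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical performance curve peaking at GTP = 300, plus measurement noise.
    def noisy_perf(gtp: float) -> float:
        return -((gtp - 300.0) ** 2) / 1000.0 + rng.gauss(0.0, 0.01)

    print(round(adaptive_line_search(noisy_perf, x0=100.0)))
```

Unlike Q-learning, this kind of search keeps no estimate of expected reward per action; it only compares two fresh measurements at a time.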

Figure 1. The performance (collaboration rate) with learning. Individual
reinforcement was used and heterogeneity was allowed. Robots were initially given
a gripping time. With learning, they adjusted their gripping times
and achieved higher performance. Different colors represent experiments
with different numbers of robots. Error bars are standard deviations
of performance over 50 runs. Dashed curves are performance without
learning.
Compared with the optimal performance obtained from the systematic search,
the learned performance is slightly lower on average (Fig. 2). We are
currently investigating several hypotheses for why this happens; for instance,
the type of reinforcement and noise may influence the team performance
after learning. Our experiments show that, although learning does not
reach the optimal performance, it does enhance the adaptability and stability
of the whole team. We conjecture, though we have not yet tested this, that any
learning model can only achieve a trade-off between optimality and
adaptability.
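To make the difference between reinforcement types concrete, here is a minimal sketch of two ways to turn a log of pulled sticks into per-robot reinforcement. The event format (a `(gripper, helper)` pair per pulled stick) and the function names are hypothetical, and these definitions are our assumptions rather than the exact ones used in this study. Individual reinforcement credits only the two robots involved in each pull; global reinforcement gives every robot the team's total, which is simpler to compute but makes it harder for a robot to tell whether its own GTP helped (the credit assignment problem).

```python
from typing import List, Sequence, Tuple

# Hypothetical log format: (gripper_id, helper_id) for one pulled stick.
Pull = Tuple[int, int]


def individual_reinforcement(pulls: Sequence[Pull], n_robots: int) -> List[int]:
    """Each robot is credited only for the pulls it took part in."""
    reward = [0] * n_robots
    for gripper, helper in pulls:
        reward[gripper] += 1
        reward[helper] += 1
    return reward


def global_reinforcement(pulls: Sequence[Pull], n_robots: int) -> List[int]:
    """Every robot receives the team's total number of pulls."""
    return [len(pulls)] * n_robots


if __name__ == "__main__":
    log = [(0, 1), (0, 2), (3, 0)]
    print(individual_reinforcement(log, 4))  # [3, 1, 1, 1]
    print(global_reinforcement(log, 4))      # [3, 3, 3, 3]
```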

Figure 2. Comparison of performance under different types of reinforcement and
team diversity.
If homogeneity is not enforced, an initially homogeneous team may specialize
through learning (Fig. 3). Our results show that policies allowing
specialization generally achieve similar or better performance than
policies forcing homogeneity (Fig. 2). We developed an ad hoc method
to measure specialization and found that specialization grows
sub-linearly with the number of robots.
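The ad hoc specialization measure is not spelled out here, so the following is only an illustrative stand-in, not the authors' method: it scores a team by the Gini coefficient of its robots' GTPs, which is 0 for a homogeneous team and grows as the GTPs become more unequal, up to (n - 1)/n for n robots. The function name `gtp_gini` is hypothetical, and GTPs are assumed to be non-negative.

```python
from typing import Sequence


def gtp_gini(gtps: Sequence[float]) -> float:
    """Gini coefficient of a team's GTPs: 0 = homogeneous, larger = more specialized.

    Illustrative stand-in for a specialization measure (assumption).
    """
    n = len(gtps)
    if n == 0:
        return 0.0
    mean = sum(gtps) / n
    if mean == 0:
        return 0.0
    # Mean absolute difference over all ordered pairs, normalized by 2 * mean.
    total = sum(abs(a - b) for a in gtps for b in gtps)
    return total / (2 * n * n * mean)


if __name__ == "__main__":
    print(gtp_gini([210, 210, 210, 210]))  # 0.0: a homogeneous team
    print(gtp_gini([0, 0, 0, 100]))        # 0.75: the maximum, (n - 1) / n, for n = 4
```

Tracking a measure like this against team size is one way to probe trends such as sub-linear growth, although its values would not be comparable with those of the authors' own measure.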

Figure 3. In one simulation, 4 robots started with the same gripping
time of 210 s and had specialized by the end of the simulation.
Our future work includes further study of the impact of noise on learned
solutions, and measuring specialization as a function of task
constraints as well as team size.
Rationale. The study of distributed learning in swarm systems enables us to design
more powerful and more robust swarm systems that can work in changing
environments and on tasks that cannot be handled by collective
mechanisms alone (e.g., allocating different numbers of units to different tasks).
Our work also helps us understand the behavior of complex
systems consisting of many autonomous agents.
Publications/References
A. J. Ijspeert, A. Martinoli, A. Billard, and L. M. Gambardella.
Collaboration through the exploitation of local interactions in autonomous
collective robotics: the stick pulling experiment. Autonomous Robots,
11(2):149-171, 2001.
K. Lerman, A. Galstyan, A. Martinoli, and A. J. Ijspeert. A Macroscopic
Analytical Model of Collaboration in Distributed Robotic Systems.
Artificial Life, 7(4):375-393, 2001.
L. Li. Distributed Learning in Swarm Systems: A Case Study. M.S. thesis,
California Institute of Technology, Pasadena, CA, 2002.
L. Li, A. Martinoli, and Y. S. Abu-Mostafa. Emergent Specialization
in Swarm Systems. In H. Yin et al., eds., Intelligent Data Engineering
and Automated Learning - IDEAL 2002, vol. 2412 of Lecture Notes in
Computer Science, pp. 261-266. Springer-Verlag, Berlin, 2002.