We study a pursuit-evasion game between two
players with car-like dynamics and sensing limitations by formalizing
it as a partially observable stochastic zero-sum game.
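For reference, one standard way to write such a game is as a tuple; this is the textbook formalization of a two-player zero-sum partially observable stochastic game, not necessarily the paper's own notation:
```latex
% Standard two-player zero-sum POSG tuple (textbook notation;
% symbols are assumptions, not the paper's exact definitions):
\[
  \mathcal{G} = \langle \mathcal{S},\; \mathcal{A}^1, \mathcal{A}^2,\;
  \mathcal{O}^1, \mathcal{O}^2,\; T,\; Z,\; r,\; \gamma \rangle,
\]
% where $T(s' \mid s, a^1, a^2)$ is the transition kernel,
% $Z(o^i \mid s', a^i)$ the observation function of player $i$, and
% $r(s, a^1, a^2)$ the reward to player 1; zero-sum means player 2
% receives $-r$.
```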
The partial observability induced by the sensing constraints is
particularly challenging. For example, when the agents have no
visibility of each other, each must extract information from its
sensor-coverage history to reason about the opponent's potential
locations. However, keeping historical information greatly increases
the size of the state space. To mitigate the challenges posed by such
partially observable problems, we develop a new learning-based method
that encodes historical information into a belief state and uses the
belief state to generate agent actions.
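A minimal sketch of this kind of architecture follows, assuming a recurrent encoder over the observation history; this is illustrative code, not the paper's implementation, and the layer sizes, module names, and action parameterization are assumptions:
```python
from typing import Optional

import torch
import torch.nn as nn


class BeliefPolicy(nn.Module):
    """Encode the observation history into a belief state with a GRU,
    then map the belief to an action. Illustrative sketch only."""

    def __init__(self, obs_dim: int, act_dim: int, belief_dim: int = 128):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, belief_dim, batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(belief_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # e.g. steering/throttle in [-1, 1]
        )

    def forward(self, obs_history: torch.Tensor,
                h0: Optional[torch.Tensor] = None):
        # obs_history: (batch, time, obs_dim). The final hidden state acts
        # as the belief over the unobserved opponent state.
        _, belief = self.encoder(obs_history, h0)
        action = self.actor(belief.squeeze(0))
        return action, belief


# Usage: condition the action on 50 steps of sensor history.
policy = BeliefPolicy(obs_dim=10, act_dim=2)
obs = torch.randn(1, 50, 10)
action, belief = policy(obs)
```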
Through experiments, we show that the learned strategies improve over
existing multi-agent RL baselines by up to 9% in pursuer capture rate.
Additionally, we present experimental results showing that the learned
belief states are strong state estimators that can extend existing
game-theoretic solvers, and we demonstrate that our method is
competitive on problems where fully observable game-theoretic solvers
are computationally feasible.
Finally, we deploy the learned policies on physical robots in a game
between the F1TENTH and JetRacer platforms moving at up to 2 m/s in
indoor environments, showing that the policies can be executed on real
robots.