We study a pursuit-evasion game between two players with car-like dynamics and sensing limitations by formalizing it as a partially observable stochastic zero-sum game. The partial observability induced by the sensing constraints is particularly challenging: for example, when the agents have no visibility of each other, they must extract information from their sensor-coverage history to reason about their opponent's potential locations. However, keeping historical information greatly increases the size of the state space. To mitigate these challenges, we develop a new learning-based method that encodes historical information into a belief state and uses it to generate agent actions. Through experiments, we show that the learned strategies improve over existing multiagent RL baselines by up to 9% in terms of capture rate for the pursuer. Additionally, we present experimental results showing that the learned belief states are strong state estimators for extending existing game-theoretic solvers, and we demonstrate our method’s competitiveness on problems where existing fully observable game-theoretic solvers are computationally feasible. Finally, we deploy the learned policies on physical robots for a game between the F1TENTH and JetRacer platforms moving as fast as 2 m/s in indoor environments, showing that they can be executed on real robots.
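The abstract leaves the belief-state encoder unspecified. The sketch below is only a minimal illustration under assumptions not stated in the paper: a GRU encoder over the observation history, a discrete action head, and hypothetical module names and dimensions. It shows one common way a history-conditioned policy of this kind could be structured.

```python
import torch
import torch.nn as nn

class BeliefStatePolicy(nn.Module):
    """Hypothetical sketch: a GRU compresses the observation history into a
    fixed-size belief vector, and an MLP head maps the belief to action
    logits. Architecture and dimensions are illustrative, not the paper's."""

    def __init__(self, obs_dim: int, belief_dim: int, num_actions: int):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, belief_dim, batch_first=True)
        self.policy_head = nn.Sequential(
            nn.Linear(belief_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, time, obs_dim), e.g. per-step sensor-coverage
        # features; the GRU's final hidden state serves as the belief state.
        _, belief = self.encoder(obs_history)
        return self.policy_head(belief.squeeze(0))

# Hypothetical usage: a 10-step history of 32-dim observations.
policy = BeliefStatePolicy(obs_dim=32, belief_dim=128, num_actions=5)
logits = policy(torch.randn(4, 10, 32))  # -> (4, 5) action logits
```

A recurrent encoder keeps the policy input fixed-size regardless of history length, which is one way to sidestep the state-space growth that the abstract attributes to keeping historical information explicitly.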
