Social Robot Tree Search (SoRTS)

Published: Mar 21, 2024 by Ingrid Navarro

SoRTS: Learned Tree Search for Long-Horizon Social Robot Navigation

Ingrid Navarro *, Jay Patrikar *, Joao P. A. Dantas, Rohan Baijal, Ian Higgins, Sebastian Scherer and Jean Oh

*Denotes equal contribution

This work is a collaboration between the Bot Intelligence Group (BIG) and the AirLab at CMU!


The fast-growing demand for fully autonomous aerial operations in shared spaces necessitates developing trustworthy agents that can safely and seamlessly navigate in crowded, dynamic spaces. In this work, we propose Social Robot Tree Search (SoRTS), an algorithm for the safe navigation of mobile robots in social domains. SoRTS aims to augment existing socially-aware trajectory prediction policies with a Monte Carlo Tree Search planner for improved downstream navigation of mobile robots. To evaluate the performance of our method, we choose the use case of social navigation for general aviation. To aid this evaluation, within this work, we also introduce X-PlaneROS, a high-fidelity aerial simulator, to enable more research in full-scale aerial autonomy. By conducting a user study based on the assessments of 26 FAA-certified pilots, we show that SoRTS performs comparably to a competent human pilot, significantly outperforming our baseline algorithm. We further complement these results with self-play experiments, showcasing our algorithm’s behavior in scenarios with increasing complexity.

Video Overview


We present Social Robot Tree Search (SoRTS), an algorithm for safe navigation of mobile robots in social settings. SoRTS aims to augment existing data-driven socially-aware motion prediction models with a Monte Carlo Tree Search (MCTS) planner, enabling long-horizon social navigation. Although the algorithm is designed to be domain-independent, in our work we use general aviation in non-towered airports as our case of study.


SoRTS’ tree search is biased by social and a reference modules. The social module aims to handle short-horizon agent-to-agent dynamics. It does so by leveraging socially-aware motion prediction model which was trained offline on trajectory data.


The reference module provides a global path to the agent which embodies a navigation guideline (e.g., an FAA-established flying pattern) that the agent should stick to whenever there aren’t any social conflicts to resolve. This module also provides a cost map that represents a frequence count map to encourage the agent to visit most-frequent regions.


Finally, SoRTS’s combines the decision modalities of these two modules via Monte Carlo Tree Search (MCTS) to achieve long-horizon navigation by backpropagating most successful outcomes. The figure below shows a toy-example of two aircraft merging on a single path, where each can execute three possible motion primitives 1: forward, speed up, and slow down. MCTS, in its forward simulations, not only prunes branches that lead to future collisions but also uses the social module to choose between cutting-in-front (socially undesirable) and yielding (socially desirable), thereby producing socially compliant and safe behavior.



We support our method with two evaluation settings: a user study and self-play experiments.


We designed a within-subject study with 26 participants, all of which were FAA-certified. Using our flight simulator, XPlaneROS, we asked users to land an aircraft at a specified runway while interacting with a second pilot instructed to complete the same task. The task required both pilots to coordinate with each other, i.e., being socially-compliant and follow FAA flying patterns. As shown in the figure below, the second pilot was either a human, SoRTS or a baseline2 algorithm. In each trial, the algorithm3 order was randomly chosen.


The study aimed at analyzing the second pilot’s navigational performance (e.g, ability to follow rules, navigate smoothly, etc) and safety (e.g, comfort, collision risk, etc) by analyzing how the users perceived the other pilot, but also by studying the trajectory information from each trial.

Overall, we found that users rated SoRTS’ and the human’s competence similarly, while the baseline was perceived as less competent. The figure below shows examples of two experiments. Left: Each row shows the resulting trajectories of a user interacting with our human pilot, SoRTS, and the baseline. The top row shows a user that successfully followed the expected guideline; we can see that the baseline did not fly as smoothly as SoRTS, and abruptly cut short close to the goal. The bottom row shows a user that did not follow the expected guideline; our algorithm still managed to navigate properly, whereas the baseline behaved erratically, unsafely crossing over the runway. Right: Box-plots showing the per-algorithm distribution of the results from the user study: the top ones show the average scores given by the user for navigation performance (a) and safety (b). The bottom ones show the average reference error (a) and loss-of-separation (b) metrics obtained from the trajectory data.


Self-Play Experiments

Baseline vs. SoRTS

We also performed self-play experiments to further examine the behavior of our algorithm and its baseline. These experiments follow the same design as our user study but we also analyze experiments with 3 and 4 agents in each scenario. The figure below shows animated examples from our simulations. We also provide quantitative evaluations, performed on the resulting trajectories.

Baseline SoRTS
Num Agents Algorithms Success Rate Timeout Offtrack Reference Error
2 Baseline 0.760 0.190 0.050 19.34
  SoRTS 0.950 0.040 0.010 13.69
3 Baseline 0.747 0.218 0.067 19.34
  SoRTS 0.943 0.030 0.027 15.68
4 Baseline 0.483 0.510 0.007 19.34
  SoRTS 0.932 0.055 0.013 15.68

SoRTS conflict resolution

Here’s a simulation example showing each agent’s reference trajectories (transparent blue) and executed trajectories (blue), as well as their long-horizon rollouts (pink). In this example, we can see how the agent on the left preemptively avoids a social conflict, allowing the other agent to pass and land first. Once the social conflict is resolved, the agent resumes its task.



  1. In our method, we actually leverage a set of 252 motion primitives.
  2. The baseline algorithm uses the same social and reference components but combines them naively by weighting them together.
  3. We use second pilot and algorithm interchangeably.


  author={Navarro, Ingrid and Patrikar, Jay and Dantas, Joao P. A. and Baijal, Rohan and Higgins, Ian and Scherer, Sebastian and Oh, Jean},
  journal={IEEE Robotics and Automation Letters}, 
  title={SoRTS: Learned Tree Search for Long Horizon Social Robot Navigation}, 
  keywords={Navigation;Robots;Social robots;Predictive models;Behavioral sciences;Monte Carlo methods;Costs;Aerial Systems: Perception and Autonomy;human-aware motion planning;safety in HRI},