COMPASS

Cooperative Multi-Agent Planning with Adaptive Skill Synthesis

We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making.

Overview of the COMPASS architecture, a novel framework that advances cooperative multi-agent decision-making through three synergistic components: (1) A VLM-based closed-loop planner that enables decentralized control by continuously processing multi-modal feedback and adapting strategies, addressing the non-Markovian challenge of multi-agent systems; (2) A dynamic skill synthesis mechanism that combines demonstration bootstrapping with incremental skill generation, improving sample efficiency and interpretability; and (3) A structured communication protocol that facilitates efficient information sharing through entity-based multi-hop propagation, enhancing cooperative perception under partial observability.

Adaptive Skill Synthesis

Overview of Adaptive Skill Synthesis. VLMs perform (Top) Bootstrapping by analyzing offline data for initial Tactic Analysis and Skill Generation into a Skill Library. (Bottom) Incremental Synthesis uses Task Reasoning/Self Reflection to dynamically generate or enhance code-based skills, evolving the library for new tasks. The skills follow a structured decision-making pipeline with two core components: score_target(unit) for dynamic target prioritization and control_logic() for coordinating behavior. Textual observations are parsed into structured data (obs_data), mapping raw text to attributes, e.g., "Can move North: yes" -> can_move={'north': True}.
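
The observation-parsing step mentioned in the caption can be illustrated with a minimal sketch. The line format "Can move <Direction>: yes|no" is extrapolated from the single example above, and the helper name parse_observation is hypothetical; the actual COMPASS parser may handle many more attributes and a different text layout.

```python
import re

# Assumed line format, modeled on the one example in the caption:
# "Can move North: yes". The real observation text may differ.
_MOVE_LINE = re.compile(r"^Can move (\w+):\s*(yes|no)$", re.IGNORECASE)


def parse_observation(text: str) -> dict:
    """Map raw observation lines to a structured obs_data dictionary.

    Hypothetical helper: only the movement attribute is handled here.
    """
    obs_data = {"can_move": {}}
    for line in text.splitlines():
        match = _MOVE_LINE.match(line.strip())
        if match:
            direction, answer = match.groups()
            obs_data["can_move"][direction.lower()] = answer.lower() == "yes"
    return obs_data


obs_data = parse_observation("Can move North: yes\nCan move South: no")
print(obs_data["can_move"])
```

A skill's score_target(unit) and control_logic() functions would then read from obs_data rather than from the raw text, which keeps the generated skill code short and easier to inspect.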

Structured Communication Protocol

COMPASS implements a hierarchical communication protocol that focuses on efficient entity-based information sharing and multi-hop propagation. Each agent maintains an observation buffer containing information about entities in its field of view. At each timestep, agents share their local observations, which are then aggregated into a global entity memory accessible to all. COMPASS employs a multi-hop communication mechanism to propagate information about distant entities, enabling agents to build a more holistic observation of the environment by leveraging the collective knowledge of the team.
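
The entity-based sharing and multi-hop propagation described above might be sketched as follows. This is an illustration under stated assumptions, not the COMPASS implementation: the EntityInfo record, the links communication graph, and the merge rule (prefer the most recent observation, then the fewest hops) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityInfo:
    """Hypothetical record for one observed entity."""
    entity_id: str
    position: Tuple[float, float]
    timestep: int   # when the entity was last directly observed
    hops: int = 0   # number of relays since the direct observation


Memory = Dict[str, EntityInfo]


def merge(memory: Memory, incoming: Memory) -> Memory:
    """Keep the fresher record per entity; on ties, the one with fewer hops."""
    merged = dict(memory)
    for eid, info in incoming.items():
        current = merged.get(eid)
        if current is None or (info.timestep, -info.hops) > (current.timestep, -current.hops):
            merged[eid] = info
    return merged


def relay(memory: Memory) -> Memory:
    """View of a memory as received by a neighbor: one hop further away."""
    return {eid: replace(info, hops=info.hops + 1) for eid, info in memory.items()}


def propagate(buffers: Dict[str, Memory], links: Dict[str, List[str]], max_hops: int) -> Dict[str, Memory]:
    """Share observation buffers along links for max_hops rounds."""
    memories = {agent: dict(buf) for agent, buf in buffers.items()}
    for _ in range(max_hops):
        # Snapshot so that every agent relays what it knew at the start of the round.
        snapshot = {agent: dict(mem) for agent, mem in memories.items()}
        for agent, neighbors in links.items():
            for neighbor in neighbors:
                memories[agent] = merge(memories[agent], relay(snapshot[neighbor]))
    return memories


# Chain a - b - c: only c sees the enemy, a learns about it after two hops.
buffers = {"a": {}, "b": {}, "c": {"e1": EntityInfo("e1", (3.0, 4.0), timestep=7)}}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
memories = propagate(buffers, links, max_hops=2)
print(memories["a"]["e1"])
```

In this toy chain, agent a never observes the enemy directly but still obtains its position through b, which is the kind of cooperative perception under partial observability that the protocol targets.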

Skill Analysis

Focus fire logic. (a) VLM-generated Python code snippet implementing dynamic focus fire logic. The code prioritizes enemy units based on the number of allied attackers, scaling the attack bonus exponentially. (b)-(d) Visualizations of focus fire execution across Protoss, Terran, and Zerg.
Kiting logic. (a)-(c) Progressive stages of the kiting tactic, in which allied units maintain optimal attack range while retreating from melee enemies. (d) The corresponding VLM-generated Python code snippet.
Isolating logic. (a) Allied units assemble into a cohesive formation. (b) The assembled units execute a rapid engagement against an isolated enemy unit, eliminating it before reinforcements can arrive and thus creating a numerical advantage.
Area-of-Effect (AOE) optimization for Baneling. (a) The VLM-generated Python code calculates optimal detonation positions by analyzing enemy cluster density and positions. (b)-(c) Visual sequence showing Baneling execution, where the unit identifies a dense cluster of enemy units and detonates for maximum AOE damage.
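
The generated snippets themselves appear only in the figures, so the following is a hedged reconstruction of the two simplest tactics described in the captions: focus fire and kiting. Everything here is an assumption made for illustration, including the dictionary layout of an enemy, the health-based base score, the exponent base of 2, the safety margin, and the action names.

```python
def score_target(enemy: dict, attackers_on: dict, base: float = 2.0) -> float:
    """Focus fire: higher score means higher attack priority.

    Assumed inputs: enemy["health"] is normalized to [0, 1] and
    attackers_on maps enemy id -> number of allies already attacking it.
    """
    # Assumed base term: slightly prefer enemies that are already damaged.
    health_term = 1.0 + (1.0 - enemy["health"])
    # Bonus grows exponentially with the number of allied attackers.
    focus_bonus = base ** attackers_on.get(enemy["id"], 0)
    return health_term * focus_bonus


def kite_action(distance: float, attack_range: float, enemy_range: float,
                weapon_ready: bool, margin: float = 1.0) -> str:
    """Kiting: attack when possible, otherwise stay outside the enemy's reach."""
    if weapon_ready and distance <= attack_range:
        return "attack"
    if distance <= enemy_range + margin:
        return "retreat"   # melee enemy is about to reach us
    if distance > attack_range:
        return "advance"   # close in until the target is in range
    return "hold"          # in range, safe, waiting for weapon cooldown


enemies = [{"id": "e1", "health": 0.5}, {"id": "e2", "health": 0.2}]
attackers_on = {"e1": 2}
target = max(enemies, key=lambda e: score_target(e, attackers_on))
print(target["id"], kite_action(1.5, 6.0, 1.0, weapon_ready=False))
```

In this example the healthier enemy e1 is still selected because two allies are already attacking it, which shows how an exponential bonus pulls the team onto a shared target.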

Performance

COMPASS demonstrates significant performance advantages in SMACv2, particularly excelling in Protoss scenarios where it achieves a 57% win rate in symmetric engagements using GPT-4o-mini, substantially outperforming traditional approaches like QMIX (27%), MAPPO (32%), and HAPPO (34%). However, performance varies across race matchups. While maintaining strong results in Terran scenarios (39% win rate), COMPASS shows limited effectiveness in Zerg scenarios (16% win rate). This performance disparity can be attributed to the unique mechanics of Zerg combat units, which demand more fine-grained micromanagement due to their shorter attack ranges and reliance on swarm-based tactics.

Comparative performance of COMPASS (with three VLM variants: G-4o=GPT-4o-mini, C-Hk=Claude-3-Haiku, Q2-VL=Qwen2-VL-72B) and state-of-the-art MARL baselines on SMACv2. Median win rates (%) and standard deviations (shown after ±) are reported across Protoss, Terran, and Zerg scenarios in symmetric (5v5) and asymmetric (5v6) categories. Results are averaged over 5 seeds. Bold values denote the best performance in each scenario.

Race     Setting     QMIX         MAPPO         HAPPO        HASAC        COMPASS G-4o  COMPASS C-Hk  COMPASS Q2-VL
Protoss  Symmetric   0.27 ± 0.03  0.32 ± 0.067  0.34 ± 0.07  0.20 ± 0.08  0.57 ± 0.08   0.49 ± 0.06   0.45 ± 0.04
Protoss  Asymmetric  0.01 ± 0.01  0.04 ± 0.04   0.02 ± 0.03  0.01 ± 0.02  0.08 ± 0.04   0.06 ± 0.05   0.06 ± 0.03
Terran   Symmetric   0.38 ± 0.04  0.36 ± 0.1    0.35 ± 0.1   0.29 ± 0.01  0.39 ± 0.01   0.38 ± 0.05   0.31 ± 0.02
Terran   Asymmetric  0.06 ± 0.02  0.07 ± 0.06   0.01 ± 0.03  0.05 ± 0.02  0.1 ± 0.03    0.1 ± 0.01    0.06 ± 0.03
Zerg     Symmetric   0.21 ± 0.01  0.27 ± 0.04   0.2 ± 0.11   0.24 ± 0.07  0.16 ± 0.07   0.18 ± 0.02   0.14 ± 0.03
Zerg     Asymmetric  0.18 ± 0.03  0.13 ± 0.09   0.09 ± 0.02  0.08 ± 0.05  0.03 ± 0.01   0.04 ± 0.01   0.02 ± 0.01
Baseline training results on SMACv2.

Ablation Studies

Skill Initialization. To evaluate the impact of skill initialization, we measure the performance of COMPASS using only the initial skill library derived from expert demonstrations. The results in the table below show that skill initialization alone achieves non-trivial performance across different scenarios, particularly in symmetric matchups. Moreover, the gap between the initialized skills and full COMPASS underscores the necessity of incremental skill synthesis.

Win rates of the initialized skill library (bootstrapped from expert demonstrations) on SMACv2.
Setting  Protoss      Terran       Zerg
5v5      0.35 ± 0.06  0.24 ± 0.04  0.06 ± 0.01
5v6      0.04 ± 0.05  0.06 ± 0.02  0.02 ± 0.03

Communication. To demonstrate the critical role of communication, we evaluated COMPASS on Protoss 5v5 without communication. The resulting win rate with GPT-4o-mini decreased to 0.06. Without communication, only the agent that first discovers an enemy retains visibility of it, while the others cannot 'see' enemies and default to no-enemy behaviors, which disrupts engagement and coordination.

Self Reflection. Removing self-reflection in Protoss 5v5 reduced the win rate by 10%, highlighting its role in refining decision-making.

Visual Information. Omitting visual input led to a 10% performance drop, forcing agents to rely solely on textual cues for spatial awareness. Without images, map boundaries are inferred indirectly (e.g., from movement restrictions) rather than directly perceived. This reduces spatial understanding, leading to suboptimal movement and positioning decisions.

More Results