Cooperative Multi-Agent Planning with Adaptive Skill Synthesis
We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making.
COMPASS outperforms the baselines on SMACv2 most clearly in Protoss scenarios, where it achieves a 57% win rate in symmetric engagements using GPT-4o-mini, compared with QMIX (27%), MAPPO (32%), and HAPPO (34%). Performance varies across races, however. While COMPASS remains competitive in Terran scenarios (39% win rate), it is less effective in Zerg scenarios (16% win rate). We attribute this disparity to the mechanics of Zerg combat units, whose shorter attack ranges and reliance on swarm-based tactics demand more fine-grained micromanagement.
Scenario | QMIX | MAPPO | HAPPO | HASAC | COMPASS (G-4o) | COMPASS (C-Hk) | COMPASS (Q2-VL) |
---|---|---|---|---|---|---|---|
Protoss, symmetric | 0.27 ± 0.03 | 0.32 ± 0.067 | 0.34 ± 0.07 | 0.20 ± 0.08 | 0.57 ± 0.08 | 0.49 ± 0.06 | 0.45 ± 0.04 |
Protoss, asymmetric | 0.01 ± 0.01 | 0.04 ± 0.04 | 0.02 ± 0.03 | 0.01 ± 0.02 | 0.08 ± 0.04 | 0.06 ± 0.05 | 0.06 ± 0.03 |
Terran, symmetric | 0.38 ± 0.04 | 0.36 ± 0.1 | 0.35 ± 0.1 | 0.29 ± 0.01 | 0.39 ± 0.01 | 0.38 ± 0.05 | 0.31 ± 0.02 |
Terran, asymmetric | 0.06 ± 0.02 | 0.07 ± 0.06 | 0.01 ± 0.03 | 0.05 ± 0.02 | 0.1 ± 0.03 | 0.1 ± 0.01 | 0.06 ± 0.03 |
Zerg, symmetric | 0.21 ± 0.01 | 0.27 ± 0.04 | 0.2 ± 0.11 | 0.24 ± 0.07 | 0.16 ± 0.07 | 0.18 ± 0.02 | 0.14 ± 0.03 |
Zerg, asymmetric | 0.18 ± 0.03 | 0.13 ± 0.09 | 0.09 ± 0.02 | 0.08 ± 0.05 | 0.03 ± 0.01 | 0.04 ± 0.01 | 0.02 ± 0.01 |
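The comparison in the text can be made concrete by computing, for each symmetric scenario, the absolute margin between COMPASS and the best-performing baseline. The sketch below is illustrative only: it copies the mean win rates from the results table above, ignores the reported spreads, and uses the G-4o column, which matches the GPT-4o-mini figures quoted in the text. The function and variable names are hypothetical.

```python
# Mean win rates copied from the results table (spreads omitted).
# Baseline order: QMIX, MAPPO, HAPPO, HASAC.
baselines = {
    "Protoss symmetric": [0.27, 0.32, 0.34, 0.20],
    "Terran symmetric": [0.38, 0.36, 0.35, 0.29],
    "Zerg symmetric": [0.21, 0.27, 0.20, 0.24],
}

# COMPASS means from the G-4o column (assumed to be the GPT-4o-mini runs).
compass = {
    "Protoss symmetric": 0.57,
    "Terran symmetric": 0.39,
    "Zerg symmetric": 0.16,
}


def margin_over_best_baseline(scenario: str) -> float:
    """COMPASS mean win rate minus the best baseline mean (absolute difference)."""
    return round(compass[scenario] - max(baselines[scenario]), 2)


margins = {scenario: margin_over_best_baseline(scenario) for scenario in compass}

for scenario, margin in margins.items():
    print(f"{scenario}: {margin:+.2f}")
```

Under these inputs the margin is positive for Protoss and Terran and negative for Zerg. The Terran margin is small relative to the spreads reported in the table, so it is better read as parity than as a clear advantage.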
Skill Initialization. To evaluate the impact of skill initialization, we analyze the performance of COMPASS using only the initial skill library derived from expert demonstrations. The results in the table below show that skill initialization alone achieves non-trivial performance across scenarios, particularly in symmetric matchups. Moreover, the gap between the initialized skills and the full COMPASS system underscores the necessity of incremental skill synthesis.
Scenario | Protoss | Terran | Zerg |
---|---|---|---|
5v5 | 0.35 ± 0.06 | 0.24 ± 0.04 | 0.06 ± 0.01 |
5v6 | 0.04 ± 0.05 | 0.06 ± 0.02 | 0.02 ± 0.03 |
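The gap attributed to incremental skill synthesis can be estimated by subtracting the initialization-only win rates from the full-system win rates. This is a rough sketch under two assumptions that the text does not state outright: that the 5v5 row corresponds to the symmetric scenarios in the main results, and that the ablation used the same model as the G-4o column. Variable names are hypothetical.

```python
# Initialization-only mean win rates (5v5 row of the ablation table).
init_only = {"Protoss": 0.35, "Terran": 0.24, "Zerg": 0.06}

# Full COMPASS mean win rates (symmetric rows, G-4o column of the main table).
# Assumption: these are the matching full-system runs for the 5v5 ablation.
full_system = {"Protoss": 0.57, "Terran": 0.39, "Zerg": 0.16}

# Absolute win-rate gain that remains once the initial skill library is accounted for.
synthesis_gain = {
    race: round(full_system[race] - init_only[race], 2) for race in full_system
}

for race, gain in synthesis_gain.items():
    print(f"{race}: +{gain:.2f} from initialized skills to full system")
```

If the two assumptions hold, the initialized library accounts for a sizable part of the final win rate in each race, with the remainder arising once skill synthesis and the other components are enabled.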
Communication. To assess the role of communication, we evaluated COMPASS on Protoss 5v5 without communication. The win rate with GPT-4o-mini dropped to 0.06. Without communication, only the agent that first discovers an enemy retains visibility of it. The other agents cannot 'see' enemies and default to no-enemy behaviors, which disrupts engagement and coordination.
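The failure mode described above can be illustrated with a toy model in which each agent acts only on the enemies it knows about. This is a minimal sketch, not the COMPASS communication protocol: the `Agent` class, the `broadcast` function, and the behavior labels are all hypothetical and stand in for whatever message format the system actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Agent:
    name: str
    # Enemies this agent currently knows about: enemy id -> (x, y) position.
    known_enemies: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def observe(self, enemy_id: str, position: Tuple[float, float]) -> None:
        """Record an enemy that this agent has seen directly."""
        self.known_enemies[enemy_id] = position

    def choose_behavior(self) -> str:
        """Engage if any enemy is known, otherwise fall back to a no-enemy default."""
        return "engage" if self.known_enemies else "no_enemy_default"


def broadcast(sender: Agent, team: List[Agent]) -> None:
    """Copy the sender's enemy sightings to every teammate (hypothetical message)."""
    for teammate in team:
        if teammate is not sender:
            teammate.known_enemies.update(sender.known_enemies)


def run(with_communication: bool) -> List[str]:
    """Simulate one step in which only the first agent discovers an enemy."""
    team = [Agent(f"agent_{i}") for i in range(5)]
    team[0].observe("enemy_0", (12.0, 7.5))
    if with_communication:
        broadcast(team[0], team)
    return [agent.choose_behavior() for agent in team]


print(run(with_communication=False))
print(run(with_communication=True))
```

Without the broadcast, only the discovering agent engages while the other four stay in their default behavior. With it, all five agents act on the same sighting.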
Self-Reflection. Removing self-reflection in Protoss 5v5 reduced the win rate by 10%, highlighting its role in refining decision-making.
Visual Information. Omitting visual input forces agents to rely solely on textual cues for spatial awareness and led to a 10% performance drop. Without images, map boundaries are inferred indirectly (e.g., from movement restrictions) rather than directly perceived. This weakens spatial understanding and leads to suboptimal movement and positioning decisions.