Closed-Loop Vision-Language Planning for Multi-Agent Coordination
We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making.
Overview of the COMPASS architecture. A novel framework that advances cooperative multi-agent decision-making through three synergistic components: (1) A VLM-based closed-loop planner that enables decentralized control by continuously processing multi-modal feedback and adapting strategies; (2) A dynamic skill synthesis mechanism that combines demonstration bootstrapping with incremental skill generation; and (3) A structured communication protocol that facilitates efficient information sharing through entity-based multi-hop propagation.
Decentralized control through continuous processing of multi-modal feedback, addressing non-Markovian challenges in multi-agent systems.
Combines demonstration bootstrapping with incremental skill generation, improving sample efficiency and interpretability.
Entity-based multi-hop propagation enables efficient information sharing under partial observability.
Algorithm: COMPASS Agent Decision Loop. The pseudo-code illustrates the complete decision-making pipeline of a COMPASS agent, including: (1) Communication phase for sharing local observations, (2) Perception phase using VLM to interpret multi-modal input, (3) Self-reflection phase to assess previous actions, (4) Task reasoning phase for goal decomposition, (5) Skill generation and retrieval phases, and (6) Execution phase for environment interaction.
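The six-phase loop above can be sketched in Python. This is an illustrative skeleton, not the authors' implementation: the callable parameters (`broadcast`, `perceive`, `reflect`, etc.) stand in for the communication protocol, the VLM prompts, and the skill library, all of which are assumed interfaces.

```python
def compass_step(frame, local_obs, broadcast, gather, perceive, reflect,
                 reason, retrieve, synthesize, last_action):
    """One decision step of a COMPASS-style agent (illustrative sketch)."""
    broadcast(local_obs)                     # (1) share local observations
    memory = gather()                        #     and read the global entity memory
    scene = perceive(frame, memory)          # (2) VLM interprets multi-modal input
    critique = reflect(scene, last_action)   # (3) assess the previous action
    subtask = reason(scene, critique)        # (4) decompose the goal into a subtask
    skill = retrieve(subtask)                # (5) retrieve a matching skill ...
    if skill is None:
        skill = synthesize(subtask)          #     ... or synthesize one incrementally
    return skill(scene)                      # (6) execute to get an environment action
```

Each phase is a pluggable callable, so the same loop runs with any VLM backend or skill store.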
Overview of Adaptive Skill Synthesis. VLMs perform (Top) Bootstrapping by analyzing offline data for initial Tactic Analysis and Skill Generation into a Skill Library. (Bottom) Incremental Synthesis uses Task Reasoning/Self Reflection to dynamically generate or enhance code-based skills.
COMPASS implements a hierarchical communication protocol that focuses on efficient entity-based information sharing and multi-hop propagation. Each agent maintains an observation buffer containing information about entities in its field of view.
Multi-hop Communication. At each timestep, agents share their local observations, which are then aggregated into a global entity memory accessible to all. COMPASS employs a multi-hop communication mechanism to propagate information about distant entities.
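The multi-hop mechanism can be sketched as a round-based merge of per-agent entity memories. The data shapes and the synchronous-round semantics here are assumptions for illustration; the paper's exact protocol may differ.

```python
def multihop_propagate(local_obs, neighbors, hops):
    """Illustrative k-hop entity propagation.

    local_obs: {agent_id: {entity_id: info}} entities each agent observes
    neighbors: {agent_id: [agent_id, ...]} communication links
    hops:      number of propagation rounds
    Returns each agent's entity memory after `hops` rounds of sharing.
    """
    memory = {a: dict(obs) for a, obs in local_obs.items()}
    for _ in range(hops):
        # Snapshot so all agents exchange within a round synchronously.
        snapshot = {a: dict(m) for a, m in memory.items()}
        for agent, links in neighbors.items():
            for nb in links:
                # Merge the neighbor's memory; keep entries already known.
                for ent, info in snapshot[nb].items():
                    memory[agent].setdefault(ent, info)
    return memory
```

With this scheme, an entity seen only by agent `a` reaches an agent two links away after two rounds, which matches the hop-count ablation below: more hops, wider effective visibility.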
Focus Fire Logic Implementation. VLM-generated Python code implementing dynamic focus fire logic. The code prioritizes enemy units based on the number of allied attackers.
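A minimal sketch of the focus-fire idea in the caption, prioritizing enemies by how many allies already attack them. This mirrors the described behavior but is not the actual VLM-generated code; the data shapes are assumptions.

```python
def focus_fire_targets(allies, enemies, attack_of):
    """Rank enemies for focus fire.

    allies:    list of ally ids
    enemies:   {enemy_id: remaining_health}
    attack_of: {ally_id: enemy_id or None} current targets
    Returns enemy ids sorted by (attacker count desc, health asc),
    so damage piles onto the most-attacked, weakest unit first.
    """
    counts = {e: 0 for e in enemies}
    for ally in allies:
        target = attack_of.get(ally)
        if target in counts:
            counts[target] += 1
    return sorted(enemies, key=lambda e: (-counts[e], enemies[e]))
```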
Kiting Logic. Progressive stages of the kiting tactic, in which allied units maintain optimal attack range while retreating from melee enemies.
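One step of the kiting behavior might look like the following sketch. The attack-when-ready / retreat-otherwise rule is an assumed simplification of the tactic shown in the figure.

```python
import math

def kiting_move(unit_pos, enemy_pos, attack_range, weapon_ready):
    """One kiting step: attack if the weapon is ready and the enemy is in
    range; otherwise retreat directly away from the melee enemy.
    Positions are (x, y); returns ("attack",) or ("move", dx, dy) with a
    unit-length retreat direction.
    """
    dx = unit_pos[0] - enemy_pos[0]
    dy = unit_pos[1] - enemy_pos[1]
    dist = math.hypot(dx, dy)
    if weapon_ready and dist <= attack_range:
        return ("attack",)
    if dist == 0:  # overlapping positions: pick an arbitrary direction
        return ("move", 1.0, 0.0)
    return ("move", dx / dist, dy / dist)
```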
Isolating Logic. Allied units strategically assemble into a cohesive formation, then execute rapid engagement against an isolated enemy unit.
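A possible target-selection rule for the isolating tactic, under the assumption that "isolated" means farthest from its nearest teammate; the rally point is taken as the ally centroid. Both choices are illustrative, not taken from the paper's code.

```python
import math

def pick_isolated_target(ally_positions, enemy_positions):
    """Return (rally_point, target_id): the ally centroid to assemble at,
    and the enemy whose nearest teammate is farthest away.
    """
    cx = sum(p[0] for p in ally_positions) / len(ally_positions)
    cy = sum(p[1] for p in ally_positions) / len(ally_positions)

    def nearest_teammate_dist(eid):
        ex, ey = enemy_positions[eid]
        others = [p for oid, p in enemy_positions.items() if oid != eid]
        if not others:
            return float("inf")
        return min(math.hypot(ex - ox, ey - oy) for ox, oy in others)

    target = max(enemy_positions, key=nearest_teammate_dist)
    return (cx, cy), target
```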
Area-of-Effect (AOE) Optimization. The VLM-generated code calculates optimal detonation positions by analyzing enemy cluster density.
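The density analysis can be approximated by scoring candidate detonation centers by enemy coverage. Evaluating each enemy position as a candidate is an assumed simplification; the VLM-generated code may search more finely.

```python
import math

def best_detonation_point(enemy_positions, radius):
    """Pick the candidate center covering the most enemies within `radius`.
    Candidates are the enemy positions themselves (a coarse grid would also
    work). Returns (best_point, enemies_hit).
    """
    best_point, best_hits = None, -1
    for cx, cy in enemy_positions:
        hits = sum(
            1 for ex, ey in enemy_positions
            if math.hypot(ex - cx, ey - cy) <= radius
        )
        if hits > best_hits:
            best_point, best_hits = (cx, cy), hits
    return best_point, best_hits
```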
COMPASS demonstrates significant performance advantages in SMACv2, particularly excelling in Protoss scenarios, where it achieves a 57% win rate in symmetric engagements using GPT-4o-mini, substantially outperforming both online MARL baselines (MAT, CommFormer) and offline methods.
| Method | Type | Protoss 5v5 | Protoss 5v6 | Terran 5v5 | Terran 5v6 | Zerg 5v5 | Zerg 5v6 |
|---|---|---|---|---|---|---|---|
| COMPASS | VLMs | 0.57 ± 0.08 | 0.08 ± 0.04 | 0.39 ± 0.01 | 0.10 ± 0.03 | 0.16 ± 0.07 | 0.03 ± 0.01 |
| MAT | Online | 0.39 ± 0.03 | 0.04 ± 0.04 | 0.36 ± 0.11 | 0.05 ± 0.01 | 0.32 ± 0.06 | 0.11 ± 0.08 |
| CommFormer | Online | 0.39 ± 0.16 | 0.02 ± 0.01 | 0.30 ± 0.09 | 0.03 ± 0.01 | 0.39 ± 0.10 | 0.16 ± 0.01 |
| Oryx | Offline | N/A* | N/A* | 0.18 ± 0.04 | N/A* | 0.10 ± 0.06 | N/A* |
| Scenario | QMIX | MAPPO | HAPPO | HASAC | COMPASS (G-4o) | COMPASS (C-Hk) | COMPASS (Q2-VL) |
|---|---|---|---|---|---|---|---|
| **Protoss** | | | | | | | |
| Symmetric | 0.27 ± 0.03 | 0.32 ± 0.067 | 0.34 ± 0.07 | 0.20 ± 0.08 | 0.57 ± 0.08 | 0.49 ± 0.06 | 0.45 ± 0.04 |
| Asymmetric | 0.01 ± 0.01 | 0.04 ± 0.04 | 0.02 ± 0.03 | 0.01 ± 0.02 | 0.08 ± 0.04 | 0.06 ± 0.05 | 0.06 ± 0.03 |
| **Terran** | | | | | | | |
| Symmetric | 0.38 ± 0.04 | 0.36 ± 0.10 | 0.35 ± 0.10 | 0.29 ± 0.01 | 0.39 ± 0.01 | 0.38 ± 0.05 | 0.31 ± 0.02 |
| Asymmetric | 0.06 ± 0.02 | 0.07 ± 0.06 | 0.01 ± 0.03 | 0.05 ± 0.02 | 0.10 ± 0.03 | 0.10 ± 0.01 | 0.06 ± 0.03 |
| **Zerg** | | | | | | | |
| Symmetric | 0.21 ± 0.01 | 0.27 ± 0.04 | 0.20 ± 0.11 | 0.24 ± 0.07 | 0.16 ± 0.07 | 0.18 ± 0.02 | 0.14 ± 0.03 |
| Asymmetric | 0.18 ± 0.03 | 0.13 ± 0.09 | 0.09 ± 0.02 | 0.08 ± 0.05 | 0.03 ± 0.01 | 0.04 ± 0.01 | 0.02 ± 0.01 |
Baseline training results on SMACv2.
Skill Initialization: The initialized skill library alone achieves non-trivial performance, particularly in symmetric matchups. The gap between initialized skills and full COMPASS underscores the necessity of incremental skill synthesis.
| Matchup | Protoss | Terran | Zerg |
|---|---|---|---|
| 5v5 | 0.35 ± 0.06 | 0.24 ± 0.04 | 0.06 ± 0.01 |
| 5v6 | 0.04 ± 0.05 | 0.06 ± 0.02 | 0.02 ± 0.03 |
Communication Robustness: We evaluate communication robustness under varying hop counts and packet loss rates. Multi-hop propagation significantly improves performance, while the system maintains reasonable robustness under moderate packet loss.
| Setting | Specification | Win Rate |
|---|---|---|
| Hop Count | 1-hop | 0.46 ± 0.19 |
| Hop Count | 2-hop | 0.54 ± 0.06 |
| Hop Count | 3-hop | 0.57 ± 0.08 |
| Packet Loss | 20% | 0.32 ± 0.03 |
| Packet Loss | 50% | 0.12 ± 0.02 |
| Packet Loss | 80% | 0.07 ± 0.05 |
| Packet Loss | 100% | 0.06 ± 0.04 |
VLM Call Frequency: We analyze the impact of VLM call frequency on performance. Calling the VLM every 20 steps achieves optimal results, balancing responsiveness with computational efficiency.
| Frequency | Win Rate |
|---|---|
| Every 10 steps | 0.56 ± 0.05 |
| Every 20 steps | 0.57 ± 0.08 |
| Every 40 steps | 0.40 ± 0.08 |
Without communication on Protoss 5v5, win rate dropped to 0.06. Only the initial discoverer retains enemy visibility, disrupting engagement and coordination.
Removing self-reflection in Protoss 5v5 reduced win rate by 10%, highlighting its role in refining decision-making.
Omitting visual input led to a 10% performance drop, forcing agents to rely solely on textual cues for spatial awareness.