Closed-Loop Vision-Language Planning for Multi-Agent Coordination
We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making.
Overview of the COMPASS architecture. A novel framework that advances cooperative multi-agent decision-making through three synergistic components: (1) A VLM-based closed-loop planner that enables decentralized control by continuously processing multi-modal feedback and adapting strategies; (2) A dynamic skill synthesis mechanism that combines demonstration bootstrapping with incremental skill generation; and (3) A structured communication protocol that facilitates efficient information sharing through entity-based multi-hop propagation.
Decentralized control through continuous processing of multi-modal feedback, addressing non-Markovian challenges in multi-agent systems.
Combines demonstration bootstrapping with incremental skill generation, improving sample efficiency and interpretability.
Entity-based multi-hop propagation enables efficient information sharing under partial observability.
Algorithm: COMPASS Agent Decision Loop. The pseudo-code illustrates the complete decision-making pipeline of a COMPASS agent, including: (1) Communication phase for sharing local observations, (2) Perception phase using VLM to interpret multi-modal input, (3) Self-reflection phase to assess previous actions, (4) Task reasoning phase for goal decomposition, (5) Skill generation and retrieval phases, and (6) Execution phase for environment interaction.
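The six-phase loop above can be sketched in Python. This is an illustrative skeleton, not the authors' implementation: the callable parameters (`broadcast`, `perceive`, `reflect`, etc.) stand in for the communication protocol, the VLM prompts, and the skill library, all of which are assumed interfaces.

```python
def compass_step(frame, local_obs, broadcast, gather, perceive, reflect,
                 reason, retrieve, synthesize, last_action):
    """One decision step of a COMPASS-style agent (illustrative sketch)."""
    broadcast(local_obs)                     # (1) share local observations
    memory = gather()                        #     and read the global entity memory
    scene = perceive(frame, memory)          # (2) VLM interprets multi-modal input
    critique = reflect(scene, last_action)   # (3) assess the previous action
    subtask = reason(scene, critique)        # (4) decompose the goal into a subtask
    skill = retrieve(subtask)                # (5) retrieve a matching skill ...
    if skill is None:
        skill = synthesize(subtask)          #     ... or synthesize one incrementally
    return skill(scene)                      # (6) execute to get an environment action
```

Each phase is a pluggable callable, so the same loop runs with any VLM backend or skill store.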
Overview of Adaptive Skill Synthesis. VLMs perform (Top) Bootstrapping by analyzing offline data for initial Tactic Analysis and Skill Generation into a Skill Library. (Bottom) Incremental Synthesis uses Task Reasoning/Self Reflection to dynamically generate or enhance code-based skills.
COMPASS implements a hierarchical communication protocol that focuses on efficient entity-based information sharing and multi-hop propagation. Each agent maintains an observation buffer containing information about entities in its field of view.
Multi-hop Communication. At each timestep, agents share their local observations, which are then aggregated into a global entity memory accessible to all. COMPASS employs a multi-hop communication mechanism to propagate information about distant entities.
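The multi-hop mechanism can be sketched as a round-based merge of per-agent entity memories. The data shapes and the synchronous-round semantics here are assumptions for illustration; the paper's exact protocol may differ.

```python
def multihop_propagate(local_obs, neighbors, hops):
    """Illustrative k-hop entity propagation.

    local_obs: {agent_id: {entity_id: info}} entities each agent observes
    neighbors: {agent_id: [agent_id, ...]} communication links
    hops:      number of propagation rounds
    Returns each agent's entity memory after `hops` rounds of sharing.
    """
    memory = {a: dict(obs) for a, obs in local_obs.items()}
    for _ in range(hops):
        # Snapshot so all agents exchange within a round synchronously.
        snapshot = {a: dict(m) for a, m in memory.items()}
        for agent, links in neighbors.items():
            for nb in links:
                # Merge the neighbor's memory; keep entries already known.
                for ent, info in snapshot[nb].items():
                    memory[agent].setdefault(ent, info)
    return memory
```

With this scheme, an entity seen only by agent `a` reaches an agent two links away after two rounds, which matches the hop-count ablation below: more hops, wider effective visibility.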
Focus Fire Logic Implementation. VLM-generated Python code implementing dynamic focus fire logic. The code prioritizes enemy units based on the number of allied attackers.
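A minimal sketch of the focus-fire idea in the caption, prioritizing enemies by how many allies already attack them. This mirrors the described behavior but is not the actual VLM-generated code; the data shapes are assumptions.

```python
def focus_fire_targets(allies, enemies, attack_of):
    """Rank enemies for focus fire.

    allies:    list of ally ids
    enemies:   {enemy_id: remaining_health}
    attack_of: {ally_id: enemy_id or None} current targets
    Returns enemy ids sorted by (attacker count desc, health asc),
    so damage piles onto the most-attacked, weakest unit first.
    """
    counts = {e: 0 for e in enemies}
    for ally in allies:
        target = attack_of.get(ally)
        if target in counts:
            counts[target] += 1
    return sorted(enemies, key=lambda e: (-counts[e], enemies[e]))
```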
Kiting Logic. Progressive stages of the kiting tactic, in which allied units maintain optimal attack range while retreating from melee enemies.
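One step of the kiting behavior might look like the following sketch. The attack-when-ready / retreat-otherwise rule is an assumed simplification of the tactic shown in the figure.

```python
import math

def kiting_move(unit_pos, enemy_pos, attack_range, weapon_ready):
    """One kiting step: attack if the weapon is ready and the enemy is in
    range; otherwise retreat directly away from the melee enemy.
    Positions are (x, y); returns ("attack",) or ("move", dx, dy) with a
    unit-length retreat direction.
    """
    dx = unit_pos[0] - enemy_pos[0]
    dy = unit_pos[1] - enemy_pos[1]
    dist = math.hypot(dx, dy)
    if weapon_ready and dist <= attack_range:
        return ("attack",)
    if dist == 0:  # overlapping positions: pick an arbitrary direction
        return ("move", 1.0, 0.0)
    return ("move", dx / dist, dy / dist)
```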
Isolating Logic. Allied units strategically assemble into a cohesive formation, then execute rapid engagement against an isolated enemy unit.
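A possible target-selection rule for the isolating tactic, under the assumption that "isolated" means farthest from its nearest teammate; the rally point is taken as the ally centroid. Both choices are illustrative, not taken from the paper's code.

```python
import math

def pick_isolated_target(ally_positions, enemy_positions):
    """Return (rally_point, target_id): the ally centroid to assemble at,
    and the enemy whose nearest teammate is farthest away.
    """
    cx = sum(p[0] for p in ally_positions) / len(ally_positions)
    cy = sum(p[1] for p in ally_positions) / len(ally_positions)

    def nearest_teammate_dist(eid):
        ex, ey = enemy_positions[eid]
        others = [p for oid, p in enemy_positions.items() if oid != eid]
        if not others:
            return float("inf")
        return min(math.hypot(ex - ox, ey - oy) for ox, oy in others)

    target = max(enemy_positions, key=nearest_teammate_dist)
    return (cx, cy), target
```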
Area-of-Effect (AOE) Optimization. The VLM-generated code calculates optimal detonation positions by analyzing enemy cluster density.
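The density analysis can be approximated by scoring candidate detonation centers by enemy coverage. Evaluating each enemy position as a candidate is an assumed simplification; the VLM-generated code may search more finely.

```python
import math

def best_detonation_point(enemy_positions, radius):
    """Pick the candidate center covering the most enemies within `radius`.
    Candidates are the enemy positions themselves (a coarse grid would also
    work). Returns (best_point, enemies_hit).
    """
    best_point, best_hits = None, -1
    for cx, cy in enemy_positions:
        hits = sum(
            1 for ex, ey in enemy_positions
            if math.hypot(ex - cx, ey - cy) <= radius
        )
        if hits > best_hits:
            best_point, best_hits = (cx, cy), hits
    return best_point, best_hits
```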
COMPASS demonstrates significant performance advantages in SMACv2, particularly excelling in Protoss scenarios, where it achieves a 57% win rate in symmetric engagements using GPT-4o-mini, substantially outperforming both online MARL baselines (MAT, CommFormer) and offline methods.
| Method | Type | Protoss 5v5 | Protoss 5v6 | Terran 5v5 | Terran 5v6 | Zerg 5v5 | Zerg 5v6 |
|---|---|---|---|---|---|---|---|
| COMPASS | VLMs | 0.57 ± 0.08 | 0.08 ± 0.04 | 0.39 ± 0.01 | 0.10 ± 0.03 | 0.16 ± 0.07 | 0.03 ± 0.01 |
| MAT | Online | 0.39 ± 0.03 | 0.04 ± 0.04 | 0.36 ± 0.11 | 0.05 ± 0.01 | 0.32 ± 0.06 | 0.11 ± 0.08 |
| CommFormer | Online | 0.39 ± 0.16 | 0.02 ± 0.01 | 0.30 ± 0.09 | 0.03 ± 0.01 | 0.39 ± 0.10 | 0.16 ± 0.01 |
| Oryx | Offline | N/A* | N/A* | 0.18 ± 0.04 | N/A* | 0.10 ± 0.06 | N/A* |
| Scenario | QMIX | MAPPO | HAPPO | HASAC | COMPASS (G-4o) | COMPASS (C-Hk) | COMPASS (Q2-VL) |
|---|---|---|---|---|---|---|---|
| **Protoss** | | | | | | | |
| Symmetric | 0.27 ± 0.03 | 0.32 ± 0.067 | 0.34 ± 0.07 | 0.20 ± 0.08 | 0.57 ± 0.08 | 0.49 ± 0.06 | 0.45 ± 0.04 |
| Asymmetric | 0.01 ± 0.01 | 0.04 ± 0.04 | 0.02 ± 0.03 | 0.01 ± 0.02 | 0.08 ± 0.04 | 0.06 ± 0.05 | 0.06 ± 0.03 |
| **Terran** | | | | | | | |
| Symmetric | 0.38 ± 0.04 | 0.36 ± 0.10 | 0.35 ± 0.10 | 0.29 ± 0.01 | 0.39 ± 0.01 | 0.38 ± 0.05 | 0.31 ± 0.02 |
| Asymmetric | 0.06 ± 0.02 | 0.07 ± 0.06 | 0.01 ± 0.03 | 0.05 ± 0.02 | 0.10 ± 0.03 | 0.10 ± 0.01 | 0.06 ± 0.03 |
| **Zerg** | | | | | | | |
| Symmetric | 0.21 ± 0.01 | 0.27 ± 0.04 | 0.20 ± 0.11 | 0.24 ± 0.07 | 0.16 ± 0.07 | 0.18 ± 0.02 | 0.14 ± 0.03 |
| Asymmetric | 0.18 ± 0.03 | 0.13 ± 0.09 | 0.09 ± 0.02 | 0.08 ± 0.05 | 0.03 ± 0.01 | 0.04 ± 0.01 | 0.02 ± 0.01 |
Baseline training results on SMACv2.
Skill Initialization: The initialized skill library alone achieves non-trivial performance, particularly in symmetric matchups. The gap between initialized skills and full COMPASS underscores the necessity of incremental skill synthesis.
| Matchup | Protoss | Terran | Zerg |
|---|---|---|---|
| 5v5 | 0.35 ± 0.06 | 0.24 ± 0.04 | 0.06 ± 0.01 |
| 5v6 | 0.04 ± 0.05 | 0.06 ± 0.02 | 0.02 ± 0.03 |
Communication Robustness: We evaluate communication robustness under varying hop counts and packet loss rates. Multi-hop propagation significantly improves performance, while the system maintains reasonable robustness under moderate packet loss.
| Setting | Specification | Win Rate |
|---|---|---|
| Hop Count | 1-hop | 0.46 ± 0.19 |
| Hop Count | 2-hop | 0.54 ± 0.06 |
| Hop Count | 3-hop | 0.57 ± 0.08 |
| Packet Loss | 20% | 0.32 ± 0.03 |
| Packet Loss | 50% | 0.12 ± 0.02 |
| Packet Loss | 80% | 0.07 ± 0.05 |
| Packet Loss | 100% | 0.06 ± 0.04 |
VLM Call Frequency: We analyze the impact of VLM call frequency on performance. Calling the VLM every 20 steps achieves optimal results, balancing responsiveness with computational efficiency.
| Frequency | Win Rate |
|---|---|
| Every 10 steps | 0.56 ± 0.05 |
| Every 20 steps | 0.57 ± 0.08 |
| Every 40 steps | 0.40 ± 0.08 |
Without communication on Protoss 5v5, win rate dropped to 0.06. Only the initial discoverer retains enemy visibility, disrupting engagement and coordination.
Removing self-reflection in Protoss 5v5 reduced win rate by 10%, highlighting its role in refining decision-making.
Omitting visual input led to a 10% performance drop, forcing agents to rely solely on textual cues for spatial awareness.