Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. A promising but largely under-explored area is their potential to facilitate human coordination with many agents. Such capabilities would be useful in domains including disaster response, urban planning, and real-time strategy scenarios. In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE. HIVE empowers a single human to coordinate swarms of up to 2,000 agents using natural language dialog with an LLM. We present promising results on this multi-agent benchmark, with our hybrid approach solving tasks such as coordinating agent movements, exploiting unit weaknesses, leveraging human annotations, and understanding terrain and strategic points. However, our findings also highlight critical limitations of current models, including difficulties in processing spatial visual information and challenges in formulating long-term strategic plans. This work sheds light on the potential and limitations of LLMs in human-swarm coordination, paving the way for future research in this area.
Large Language Models (LLMs) are revolutionizing how we interact with artificial intelligence, and one exciting frontier is their ability to coordinate multiple agents in complex scenarios. Enter HIVE (Hybrid Intelligence for Vast Engagements), a new framework that bridges human strategy and AI execution in real-time environments.
HIVE works by taking natural language instructions from humans and transforming them into detailed operational plans for controlling thousands of agents simultaneously. Think of it as a translator that converts your high-level strategic thoughts into tactical instructions that AI agents can understand and execute.
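The translation idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' implementation: `translate_command` stands in for the actual LLM call, and all names (`Order`, `expand_plan`, the formation logic) are invented for the example. The point is the two-stage shape: one natural-language command becomes one structured plan, which is then fanned out into per-unit orders for thousands of agents.

```python
import math
from dataclasses import dataclass

@dataclass
class Order:
    unit_id: int
    action: str                    # e.g. "move", "hold"
    target: tuple[float, float]    # map coordinates

def translate_command(command: str) -> dict:
    """Stand-in for the LLM call: map a natural-language command to a
    structured plan. A real system would prompt an LLM and parse its output."""
    if "surround" in command:
        return {"action": "move", "formation": "circle", "center": (50.0, 50.0)}
    return {"action": "hold", "formation": "none", "center": (0.0, 0.0)}

def expand_plan(plan: dict, unit_ids: list[int]) -> list[Order]:
    """Fan a single high-level plan out into one order per unit."""
    orders = []
    n = len(unit_ids)
    for i, uid in enumerate(unit_ids):
        if plan["formation"] == "circle":
            # Spread units evenly on a circle around the plan's center point.
            angle = 2 * math.pi * i / n
            cx, cy = plan["center"]
            target = (cx + 10 * math.cos(angle), cy + 10 * math.sin(angle))
        else:
            target = plan["center"]
        orders.append(Order(uid, plan["action"], target))
    return orders

orders = expand_plan(translate_command("surround the base"), list(range(2000)))
print(len(orders))  # one order per unit: 2000
```

Keeping the LLM responsible only for the compact plan, while deterministic code expands it into per-unit orders, is what makes controlling thousands of units with a single natural-language command tractable.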
To put HIVE through its paces, we created a comprehensive benchmark testing five essential capabilities: coordination, weakness exploitation, spatial awareness, terrain usage, and strategic planning. Our research not only demonstrates HIVE’s potential for enhancing human decision-making but also reveals important insights about current LLM limitations, including their challenges with visual-spatial reasoning and long-term strategy.
We present HIVE—outlined in the image above—a novel framework enabling natural language control of thousands of units in strategy games through human-AI collaboration. HIVE translates high-level human commands into detailed operational plans using Large Language Models (LLMs).
HIVE operates through three key components:
The game features:
We evaluated HIVE across five core capabilities:

- Coordination of agent movements
- Exploitation of unit weaknesses
- Spatial awareness
- Terrain usage
- Strategic planning
The map used to test each capability can be seen in the image below.
The evaluation of each capability, per model, is shown in the image below.
In this work, we present a new challenge for LLMs as human assistants: controlling up to two thousand units in a strategy game. We propose a new framework, HIVE, that allows a player to give high-level commands that an LLM translates into a long-term plan controlling the behavior of each unit. We showed that generalist LLMs such as Claude Sonnet and GPT-4o can handle such tasks but remain sensitive to slight changes in the player's prompts. Complementary experiments showed that HIVE requires human help to reach its best performance, and that generalist LLMs' visual ability to read an out-of-distribution map for terrain and landmark locations still needs improvement. This work opens many interesting avenues for improving LLMs' capacity to collaborate with humans, such as strengthening their map-reading abilities, reducing their sensitivity to prompts, and improving their long-term planning.
@misc{anne2024harnessinglanguagecoordinationframework,
  title={Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control},
  author={Timothée Anne and Noah Syrkis and Meriem Elhosni and Florian Turati and Franck Legendre and Alain Jaquier and Sebastian Risi},
  year={2024},
  eprint={2412.11761},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2412.11761},
}