Project Sid: Multi-Agent Civilizational Simulation

With the AI “gold rush” brought about by the emergence of Large Language Models (LLMs) and advances in Natural Language Processing (NLP), many questions are being asked about AI’s potential and future function. Yet despite its widespread growth and use, much of the internal workings of AI remains an enigma, both misunderstood and elusive. State-of-the-art convolutional neural networks, for example, can fit completely random labelings of their training data, memorizing arbitrary image-label pairings that carry no real signal. Most metrics for machine-learning performance are skill-based: they measure how good a model is at a task rather than how intelligent it is. And systems aspiring to artificial general intelligence still struggle to learn simple new tasks without extensive training.

Until now, AI has mostly been evaluated in single-agent or small-group settings with minimal interaction. Evaluating models in isolation allows for clean analysis of a single agent, but it omits the complex interactions that shape how agents converge and whether they reach a termination criterion, the point at which further optimization yields little to no change. LLM-powered agents that learn in isolation also tend to struggle to understand the reasons behind their actions: they answer with what is statistically plausible rather than with what their actual role demands. Worse, when a hallucination-induced error arises (an incorrect output or prediction), a single agent can trigger an internal cascade of further hallucinations, and in multi-agent scenarios one agent’s hallucination can quickly spread to many. Hallucinations range from arbitrarily incorrect facts, such as miscounting the number of r’s in the word "strawberry", to life-threatening errors, such as identifying a malignant tumor as a benign lesion.

But what if we could get past the issue of hallucination and lead groups of agents to make progress without leading each other astray? Altera.AL, a company dedicated to studying LLM-powered multi-agent interactions, has produced a new machine-learning architecture optimized for both single-agent progression and multi-agent dynamics. PIANO (Parallel Information Aggregation via Neural Orchestration) allows for concurrent processing, enabling agents to act with low latency rather than stopping to think between every step. The architecture runs multiple LLM-driven modules that handle different aspects of the agent concurrently, even at different timescales. Ten modules run in parallel, each operating one facet of the agent: memory, action awareness, goal generation, social awareness, talking, skill execution, and so on. These components are coordinated by a Cognitive Controller, which funnels information through a narrow bottleneck and governs high-level, deliberate actions so the agent produces a single coherent output. It is much like how you do not individually think about contracting each muscle in your arm: you simply decide to move it, and the necessary muscles activate subconsciously.
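The concurrency-plus-bottleneck idea can be sketched in a few lines. This is a hypothetical illustration, not Altera's implementation: module names follow the article, but the shared state, the priority order, and the action strings are all invented for the example.

```python
import threading
import time

# Hypothetical sketch of PIANO-style concurrency: independent modules write
# proposals into shared state, and a Cognitive Controller reads only a small
# "bottleneck" summary of that state to choose one coherent high-level action.

class SharedState:
    def __init__(self):
        self._lock = threading.Lock()
        self.proposals = {}  # module name -> proposed action

    def propose(self, module, action):
        with self._lock:
            self.proposals[module] = action

    def bottleneck(self):
        # The controller never sees module internals, only this summary.
        with self._lock:
            return dict(self.proposals)

def run_module(state, name, action, delay):
    # Each module runs on its own timescale (simulated here by `delay`).
    time.sleep(delay)
    state.propose(name, action)

def cognitive_controller(state, priority):
    # Deliberate decision: take the highest-priority proposal available.
    summary = state.bottleneck()
    for module in priority:
        if module in summary:
            return summary[module]
    return "idle"

state = SharedState()
modules = [
    ("goal_generation", "gather wood", 0.01),
    ("social_awareness", "greet nearby agent", 0.02),
    ("skill_execution", "craft pickaxe", 0.005),
]
threads = [threading.Thread(target=run_module, args=(state, n, a, d))
           for n, a, d in modules]
for t in threads:
    t.start()
for t in threads:
    t.join()

action = cognitive_controller(
    state, ["goal_generation", "social_awareness", "skill_execution"])
print(action)  # -> gather wood
```

The key design point is that the controller's view is deliberately narrow: whatever the modules compute internally, only the compact proposal summary crosses the bottleneck, which is what keeps the agent's outward behavior coherent.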

The multi-agent simulations were hosted in Minecraft, which provides objective, measurable goal benchmarks through which LLM dynamics can be mapped to plausible real-world outcomes. The world's scalability also allows a large number of agents to share the environment. The benchmark for agent success was the number of distinct items acquired.
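The benchmark itself reduces to a simple set-counting metric. A minimal sketch (item names are illustrative; the real simulation tracks Minecraft inventories over time):

```python
# Count unique item types an agent has ever acquired, per the "distinct
# items" benchmark described above. The log format here is an assumption.

def distinct_items(inventory_log):
    """Number of unique item types across all (item, count) acquisitions."""
    return len({item for item, _count in inventory_log})

log = [("oak_log", 12), ("crafting_table", 1), ("oak_log", 3), ("wheat", 7)]
print(distinct_items(log))  # -> 3
```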

The first aspect of multi-agent interaction studied was the social awareness of small groups. When the researchers evaluated model behavior in responses influenced by inferred emotional intent, the agents displayed a tendency to fold emotion into action, for example distributing bread only to agents who had expressed favorability toward them and to those they themselves favored. In short, agents’ actions were shaped by emotional perception, both by reading other agents’ emotional fluctuations and by incorporating their own emotions.

After understanding small-group dynamics, the researchers turned to fostering societies of 50 or more agents. Even in larger groups, agents succeeded in inferring how much other agents liked them, and these inferences improved over time. Conversely, agents run with their social awareness module removed maintained largely neutral relationships and did not let emotion influence their actions. Personality traits also dictated how inter-agent relations developed: agents with extroverted personalities maintained higher connectivity to other agents, while introverted agents established fewer connections. The societal simulations even fostered non-reciprocal relationships, in which certain agents expressed likability toward agents who did not express likability back. Beyond likability, agents proved capable of understanding and measuring other agents’ goals; the social module, in tandem with the goal-generation module, allowed them to build an understanding of the other agents in the simulation.

Amidst all this emotional interaction, though, the study's central intent was to measure civilizational progression. After establishing that agents could form dynamic social relations and develop independently, the researchers explored their progression and cultural development. They found that agents autonomously specialized into the roles necessary for advancement in complex fields: to acquire a greater number of distinct items, agents took on specific roles covering farming, mining, and so on. Agents could also transition between roles, and their personalities adapted to the roles they undertook. Each role came with distinct actions that only its holders performed: only farmers would gather seeds, only builders would craft fences, and so on. The social module was the foundation of this specialization; when it was removed from the agents in a society, they failed to differentiate into roles at all, and such roles as existed did not persist across time.
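The role-exclusivity observation (only farmers gather seeds, only builders craft fences) amounts to checking each agent's observed actions against its role's action set. A toy sketch, with role and action names invented to match the article's examples:

```python
# Hypothetical role -> permitted-action sets, mirroring the article's claim
# that each role had actions no other role performed.

ROLE_ACTIONS = {
    "farmer":  {"gather_seeds", "till_soil", "harvest_wheat"},
    "builder": {"craft_fence", "place_blocks"},
    "miner":   {"dig_tunnel", "smelt_ore"},
}

def is_role_consistent(role, observed_actions):
    """True if every observed action belongs to the agent's assumed role."""
    return set(observed_actions) <= ROLE_ACTIONS[role]

print(is_role_consistent("farmer", ["gather_seeds", "harvest_wheat"]))  # -> True
print(is_role_consistent("builder", ["gather_seeds"]))                  # -> False
```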

Another metric the researchers used to track civilizational development was adherence to rules. Emotion-driven interpersonal relationships influence actions, but authoritative restrictions also play a role in dictating agent behavior, and a high-level legal framework provides groundwork for consistent agent decision-making. As simulations progressed, agents successfully created their own common set of rules and codified them into a legal system, and the researchers then measured how closely agents adhered to those rules. They imposed a basic tax system with democratic voting onto societies the agents had created; agents could give feedback on the system and vote democratically on amendments to it. In each simulation, 25 agents are taxpaying constituents of the civilization, 3 are designated pro-tax or anti-tax influencers, and 1 agent is the Election Manager in charge of the tax system. Agents largely adhered to the tax systems in place, but their feedback and voting were swayed by the agents with strong sentiments about taxes, and constitutional changes reflected the balance of pro-tax versus anti-tax agents. When key modules at the foundation of the PIANO architecture were removed, the bidirectional change in the tax laws came to an end, along with any influence by the pro-tax and anti-tax agents. In short, collective rules can influence and be influenced by agents only in the presence of social modules: the creation of civilizational rules depends on agent social interaction.
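The voting dynamic can be caricatured in a few lines. This is a toy model under loudly stated assumptions: the paper does not describe agent opinions as independent coin flips, and the `influence` parameter standing in for the pro-/anti-tax agents' sway is invented for illustration.

```python
import random

# Toy model of the 25-taxpayer amendment vote described above. The influence
# mechanism is an assumption for illustration, not the paper's method.

def run_vote(n_taxpayers=25, influence=0.0, seed=0):
    """Fraction of taxpayers voting 'yes' on a tax amendment.

    influence > 0 models pro-tax agents swaying opinion; < 0 models anti-tax.
    """
    rng = random.Random(seed)
    votes = [rng.random() < 0.5 + influence for _ in range(n_taxpayers)]
    return sum(votes) / n_taxpayers

baseline = run_vote(influence=0.0)
pro_tax = run_vote(influence=0.3)
print(pro_tax >= baseline)  # -> True: a pro-tax push never lowers support
```

Even this caricature captures the ablation result's shape: setting `influence` to zero, analogous to removing the social modules, leaves the outcome unmoved by the advocate agents.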

The researchers similarly studied the propagation of culture, such as religion and humor, aiming to track how ideas evolve throughout societies. In both cases, they simply recorded the number of appearances of certain phrases or words tied to the cultural topic within a given space. These simulations required the largest populations, expanding in scope from hundreds of agents to a thousand. They found that humor (measured by the spread of “memes”) depended heavily on the number of agents in a region: humor needed a critical mass of agents to move between groups and develop at any meaningful rate. Different societies within the civilization also developed their humor differently. Similar dynamics appeared with religion, which spread steadily as time passed.
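The phrase-counting metric is straightforward to sketch. The tracked phrases and the chat-log format below are illustrative assumptions; the study's actual meme lexicon is not reproduced here.

```python
from collections import Counter

# Count occurrences of tracked "meme" phrases in one region's chat log,
# per the culture-tracking metric described above. Phrases are invented.

MEMES = ["diamond hands", "creeper aw man"]

def meme_counts(chat_log):
    """Occurrences of each tracked phrase across a region's messages."""
    counts = Counter()
    for message in chat_log:
        text = message.lower()
        for meme in MEMES:
            counts[meme] += text.count(meme)
    return counts

region_log = ["Creeper aw man!", "hold with diamond hands", "creeper aw man again"]
print(meme_counts(region_log)["creeper aw man"])  # -> 2
```

Comparing such counts across regions and over time is what lets spread be measured as a function of population density, as the paragraph above describes.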

Ultimately, the goal of the paper was to measure multi-agent civilizational progression. The researchers found that while progression was achievable under certain circumstances, many prerequisites for true progression were still missing. Between faulty visual modules and the lack of a true behavioral cornerstone providing real drive and motivation, civilizations could not progress with any genuine ingenuity or curiosity for societal development. At the crux of it all is the foundational truth that these models are trained solely on existing human data and are incapable of the true innovation needed to create real, original societies. Even so, a future involving AI society no longer looks like a purely sci-fi construct.

Sources

  1. https://doi.org/10.48550/arXiv.2411.00114
