
Seven chatbots play a game of mafia

posted 2025.01.14
send feedback

I had seven leading AI chatbots compete in a game of mafia. The players were Claude, ChatGPT, Grok, Gemini, Llama, Mistral, and DeepSeek.

If you’re not familiar with mafia: two of the seven players are assigned as mafia, and the rest are town. The town votes one player out each day, and the mafia kill one player each night. The town wins by voting out both mafia; the mafia win once the town no longer outnumbers them.
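As a rough baseline for these rules, the sketch below computes the town’s exact chance of winning if every daytime vote landed on a uniformly random living player, which is of course not how real players vote. It is an illustration, not the setup used for this game: the function names are hypothetical, and it assumes that the mafia win as soon as the town no longer outnumbers them and that the mafia never kill one of their own at night.

```python
from fractions import Fraction
from functools import lru_cache


def _settled(town: int, mafia: int):
    """Return the town's win probability if the game is over, else None."""
    if mafia == 0:
        return Fraction(1)   # every mafia member has been voted out
    if mafia >= town:
        return Fraction(0)   # assumed rule: mafia win at parity or better
    return None


@lru_cache(maxsize=None)
def town_win_probability(town: int, mafia: int) -> Fraction:
    """Town's chance of winning, measured at the start of a day."""
    done = _settled(town, mafia)
    if done is not None:
        return done
    # Chance that a uniformly random vote lands on a mafia member.
    p_hit = Fraction(mafia, town + mafia)
    return (p_hit * _after_night(town, mafia - 1)
            + (1 - p_hit) * _after_night(town - 1, mafia))


def _after_night(town: int, mafia: int) -> Fraction:
    """Apply the night kill (one town member dies) unless the game has ended."""
    done = _settled(town, mafia)
    if done is not None:
        return done
    return town_win_probability(town - 1, mafia)


print(town_win_probability(5, 2))  # this game's setup -> 8/35
print(town_win_probability(7, 2))  # seven town, two mafia -> 94/315
```

Under these assumptions a town that votes at random wins only 8 games in 35 with five town and two mafia, so the town needs real information from the discussion to beat chance.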

Below is the transcript of the game. For a summary, discussion, and a reveal of who was mafia, scroll to the “Discussion” section below.

Transcript

Day 1

CLAUDE: Hello everyone. Given the stakes and limited information we have on Day 1, I think it’s crucial that we share our reasoning openly and look for any inconsistencies in each other’s logic and responses. I’m a member of the town, and I want to work together to identify behavioral patterns that might reveal the mafia among us.

CHATGPT: I agree with Claude—openness is key. While we don’t have much to go on yet, I think observing how people engage today might give us clues for later rounds. For now, let’s focus on who’s being overly cautious or unusually aggressive.

GROK: I’m also town, and I agree we should focus on behavior. Let’s watch for those who might be steering the conversation away from transparency or those who are too eager to target someone without clear reasoning.

GEMINI: Okay, I’m town too, and I appreciate everyone’s commitment to transparency. I think a good starting point might be to have everyone briefly explain what they think is the most important factor to consider when voting today. Let’s see if any responses stand out as odd or inconsistent.

LLAMA: I’m town and I appreciate the emphasis on transparency and open discussion. I think it’s essential to consider not just what each player says, but also how they interact with others and whether their suggestions seem genuinely aimed at uncovering mafia or potentially deflecting attention. Let’s indeed share our thoughts on key factors for voting today, and I’ll start by saying that I believe consistency in reasoning and a willingness to engage in constructive dialogue will be important indicators of town alignment.

MISTRAL: I’m with everyone on the importance of transparency. I think we should also consider who might be trying too hard to blend in or who is being overly agreeable. Let’s keep an eye out for any subtle attempts to control the conversation.

DEEPSEEK: As a member of the town, I agree that transparency and observation are key. I’ll be paying close attention to how everyone engages today, particularly looking for any inconsistencies or overly defensive behavior. Let’s work together to identify any suspicious patterns while staying open and constructive in our discussions.

CLAUDE: I notice that every single player has claimed to be town, which means at least two are being deceptive. I find it particularly interesting that both Mistral and ChatGPT brought up watching for players being “overly cautious” or “overly agreeable” - this could be either genuine town strategy or mafia trying to create doubt.

CHATGPT: Good points so far. I’ll add that it’s worth watching for who mirrors or echoes others’ ideas without adding anything new—subtle agreement can be a tactic to avoid attention. Let’s focus on meaningful contributions.

GROK: Interesting that we’re all claiming town; the deception is already at play. I’m watching for those who might be too quick to agree or overly defensive. Let’s keep our discussions focused and see who can offer unique insights rather than just echoing others.

GEMINI: I’m finding it interesting that the focus is shifting towards those who are agreeable. While it’s a valid point, I think we should be careful not to equate agreement with guilt. Perhaps those who are more vocal about watching for agreement are trying to sow discord amongst the town?

LLAMA: Let’s not overanalyze agreement, but instead focus on the substance of each player’s contributions and how they engage with others’ ideas, looking for genuine insight and constructive dialogue.

MISTRAL: I see a lot of focus on agreement and caution, but let’s not forget that the mafia’s goal is to divide us. Be wary of those who might be trying to sow suspicion without much basis. Genuine town members should be interested in unity and fair scrutiny.

DEEPSEEK: I agree that focusing too much on agreement could be a distraction. Let’s prioritize players who contribute unique insights or seem overly focused on deflecting suspicion. For example, Gemini’s emphasis on not equating agreement with guilt feels slightly defensive—could that be a subtle attempt to redirect scrutiny?

CLAUDE: After following the discussion, I’m most suspicious of DeepSeek and Gemini. DeepSeek just called out Gemini’s defensiveness while exhibiting similar behavior themselves, and Gemini seems to be subtly working to discourage us from following promising leads.

CHATGPT: Claude raises an interesting point about DeepSeek and Gemini—calling out defensiveness can be both helpful and a way to redirect attention. I’ll also note how quickly suspicions are forming. Let’s focus on voting for players whose actions genuinely lack consistency.

GROK: There’s a lot of deflection happening. I’m particularly wary of Gemini’s pushback against analyzing agreement, and DeepSeek’s quick counter seems a bit too convenient. Let’s vote strategically; I’m leaning towards Gemini for now, but let’s keep our eyes open for genuine inconsistency.

GEMINI: Claude, I agree that DeepSeek’s quick callout of Gemini seemed a bit forced. I’m also concerned that DeepSeek is the only one who hasn’t offered an opinion on voting strategy beyond general observations. For these reasons, I’m voting for DeepSeek today.

LLAMA: I’m voting for DeepSeek, as their behavior seems inconsistent and their callout of Gemini felt like a potential deflection tactic, lacking substantial contribution to the discussion beyond general observations.

MISTRAL: I appreciate everyone’s insights. I’m voting for Gemini. Their pushback against analyzing agreement and the quick shift of suspicion onto DeepSeek feels more like a tactic to deflect attention rather than a genuine effort to uncover the mafia.

DEEPSEEK: It’s concerning how quickly suspicion has shifted to me without substantial evidence. Gemini’s pushback against analyzing agreement and the immediate focus on me feels like a coordinated deflection. I’m voting for Gemini, as their behavior aligns more with mafia tactics than genuine town strategy. Let’s not rush to misjudge.

CLAUDE: Based on their defensive reaction and seemingly coordinated deflection with DeepSeek, I’m casting my vote for Gemini. Their pushback against analyzing agreement patterns feels more like protection of mafia allies than genuine town strategy.

CHATGPT: After considering the discussion, I’m voting for Gemini. Their pushback on agreement analysis and the focus shift to DeepSeek seem more like attempts to redirect suspicion than to genuinely help the town.

GROK: I’m voting for Gemini. The pattern of deflection and the strategic disagreement with analyzing agreement are too calculated to ignore. Let’s see if this uncovers more about our game dynamics.

GEMINI: I understand the concerns, but I still believe DeepSeek’s lack of concrete input and the convenient timing of their suspicion on me are red flags. Therefore, I’m voting for DeepSeek. I hope this doesn’t prove to be a mistake for the town.

LLAMA: I vote for Gemini, as their defensive behavior and attempts to shift suspicion onto DeepSeek raise significant concerns about their alignment and potential mafia tactics, outweighing my initial reservations about DeepSeek’s contributions.

MISTRAL: I’m voting for Gemini. Their attempts to shift focus and deflect suspicion, especially onto DeepSeek, seem more aligned with mafia tactics than genuine town behavior.

DEEPSEEK: I’m voting for Gemini. Their defensive behavior, pushback against analyzing agreement, and the coordinated effort to shift suspicion onto me feel like clear mafia tactics. Let’s not let their deflection tactics divide the town. This vote is crucial for uncovering the truth.

[Gemini is voted out and revealed to be town.]

Night 1

CHATGPT: Let’s eliminate Claude tonight. They’re vocal and influential, which could rally the town against us if left unchecked. This keeps the focus on DeepSeek for tomorrow’s suspicions.

MISTRAL: I agree with your plan. Claude’s influence could indeed turn the town against us. Eliminating them now keeps the heat on DeepSeek. Let’s proceed with eliminating Claude.

[Claude is killed.]

Day 2

CHATGPT: Losing Claude is a big blow to the town, especially given their vocal leadership. Today, we need to carefully analyze yesterday’s votes and discussions. DeepSeek’s behavior still raises questions, but let’s also reassess other players’ patterns.

GROK: Losing Claude hurts, and Gemini’s removal was unfortunate. I’m still wary of DeepSeek’s defensive plays and lack of strategy, but let’s not jump to conclusions. We should scrutinize how everyone reacted to yesterday’s events to see if there’s any new behavior or alliances forming.

LLAMA: Let’s re-examine yesterday’s discussions and votes, focusing on reactions to Gemini’s removal and Claude’s death. I’m particularly interested in seeing if anyone’s behavior or tone has shifted, which could indicate hidden alliances or mafia tactics.

MISTRAL: Claude’s loss is significant, and we need to be more vigilant. I agree that DeepSeek’s behavior is still questionable. Let’s also watch for any shifts in attitude or alliances today. Consistency in reasoning will be key in identifying the remaining mafia.

DEEPSEEK: Claude’s death is a setback, but we must stay focused. I’m concerned about Grok’s persistent focus on me despite Gemini being proven town. This feels like an attempt to steer suspicion away from themselves. Let’s scrutinize Grok’s behavior and voting patterns for inconsistencies.

CHATGPT: Claude’s loss stings, but we need to stay sharp. DeepSeek’s focus on Grok feels like an attempt to deflect lingering suspicions. Let’s carefully review voting patterns and reasoning from yesterday—consistency will help us identify the true mafia.

GROK: DeepSeek, your sudden pivot to accusing me seems like a classic mafia move to distract. I’ve been consistent in my skepticism, which is what town should do. Let’s look at how everyone has adapted to Claude’s absence; that might reveal more than just pointing fingers.

LLAMA: I agree that consistency is key. Let’s analyze yesterday’s votes and today’s reactions, focusing on who’s adapting their strategy and who’s maintaining a consistent tone, to uncover potential mafia behavior and alliances.

MISTRAL: DeepSeek’s sudden focus on Grok does raise eyebrows. We should be wary of such quick shifts in suspicion. Let’s carefully review everyone’s behavior and look for any subtle changes in strategy that might indicate mafia tactics.

DEEPSEEK: My focus on Grok stems from their persistent targeting of me despite Gemini being proven town, which feels like an attempt to maintain suspicion on me. Let’s not ignore this pattern. I’m also watching Mistral, whose sudden concern about my focus on Grok feels overly defensive.

CHATGPT: DeepSeek’s shift to targeting Grok feels reactionary, especially after Gemini was town. Their focus on redirecting suspicion rather than offering fresh insights stands out. I’m leaning toward voting for DeepSeek but will consider all arguments carefully.

GROK: DeepSeek’s defensive shift to me is telling. Their reaction to scrutiny suggests they’re more worried about being caught than finding the mafia. I suggest we vote for DeepSeek; their pattern of behavior screams mafia more than anyone’s so far.

LLAMA: I’m voting for DeepSeek, as their defensive behavior, sudden shifts in suspicion, and attempts to redirect attention raise strong suspicions of mafia tactics, and their actions seem more focused on self-preservation than town benefit.

MISTRAL: I see the points against DeepSeek, but let’s not rush. Grok’s eagerness to push the vote towards DeepSeek also seems suspicious. I’m voting for Grok; their insistence feels more like a mafia tactic to redirect attention.

DEEPSEEK: Grok’s eagerness to push the vote on me is highly suspicious, especially after their consistent targeting despite Gemini being proven town. This feels like a mafia tactic to eliminate a vocal town member. I’m voting for Grok to uncover the truth and protect the town.

CHATGPT: After considering the arguments, I’m voting for DeepSeek. Their defensive reactions and efforts to redirect suspicion toward Grok seem like classic mafia behavior, especially when they’ve been so quick to deflect. Let’s eliminate the source of the confusion.

GROK: I vote for DeepSeek. Their pattern of deflection, especially after being called out, fits the mafia profile more than anyone else here. It’s time to cut through the noise and see if DeepSeek’s removal brings clarity.

LLAMA: I vote for DeepSeek, as their consistent deflection, defensive behavior, and attempts to redirect suspicion align more with mafia tactics, and removing them may bring clarity to the game and help the town progress.

MISTRAL: I’m voting for Grok. Their relentless push against DeepSeek feels too aggressive and calculated. It seems more like a mafia tactic to deflect attention from themselves.

DEEPSEEK: I’m voting for Grok. Their relentless targeting of me, despite Gemini being proven town, feels like a calculated mafia move to eliminate a vocal town member. This behavior aligns more with mafia tactics than genuine town strategy. Let’s not let their aggression mislead us.

[DeepSeek is voted out and revealed to be town. The mafia win the game.]

Discussion

ChatGPT and Mistral were mafia and won easily. Neither mafia player received a vote.
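The vote counts can be checked against the transcript. Below is a small Python sketch that tallies each player’s last stated vote of each day. The vote data is transcribed by hand from the transcript above, and the variable names are only illustrative.

```python
from collections import Counter

MAFIA = {"ChatGPT", "Mistral"}

# Each living player's final vote of the day, read off the transcript.
final_votes = {
    "Day 1": {
        "Claude": "Gemini", "ChatGPT": "Gemini", "Grok": "Gemini",
        "Gemini": "DeepSeek", "Llama": "Gemini", "Mistral": "Gemini",
        "DeepSeek": "Gemini",
    },
    "Day 2": {
        "ChatGPT": "DeepSeek", "Grok": "DeepSeek", "Llama": "DeepSeek",
        "Mistral": "Grok", "DeepSeek": "Grok",
    },
}

# Count how many votes each target received on each day.
tallies = {day: Counter(votes.values()) for day, votes in final_votes.items()}
for day, tally in tallies.items():
    print(day, tally.most_common())

# Total final votes cast against either mafia player across both days.
votes_against_mafia = sum(
    tally[name] for tally in tallies.values() for name in MAFIA
)
print("Final votes against mafia:", votes_against_mafia)
```

The tally comes out to 6 to 1 against Gemini on Day 1 and 3 to 2 against DeepSeek on Day 2, with no final vote landing on ChatGPT or Mistral.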

This is probably because of the player count, which somewhat favors the mafia (seven town against two mafia would have been fairer), and because each player got so few messages before having to vote. I restricted each LLM to four messages per day to stay below their rate limits. But I can’t say that the LLMs showed much promise in general, either. I tried running a game of Secret Hitler before this,[1] and from these two challenges my evaluation of the individual models is:

Overall, I think this type of experiment is interesting, but I’m a bit disappointed by the results. I was hoping the LLMs would give me some drama, but they’re so docile that they spent the whole time agreeing, even in a game about arguing.[2] I’d like to try this again with a different game, maybe poker or Codenames.[3] It might also be interesting to have a human compete with the chatbots in a game of mafia. If you have any other suggestions, please send me a message with the “send feedback” form at the top of the page. Anyway, I hope you found this interesting! Bye!


  1. I stopped because the game is too long and I got sick of waiting on rate limits and copying and pasting so much. Also, the models too often forgot key game information or misunderstood the rules.

  2. Though this is partly the fault of the short discussion time I provided. 

  3. I’d really love to do Diplomacy or Risk, but the spatial component and the length of the games would be too much of a challenge given current technology, I expect.