PrisonersDilemmaTournament

After the deadline in a few hours, please feel free to USE THIS THREAD TO SHARE YOUR STRATEGIES, since the fear of them being stolen will be gone

JoelMahon opened this issue 3 years ago • 42 comments

I'm looking forward to sharing mine. The best strategy doesn't exist until you specify the meta it will be run in, but I'm still pretty proud of how many metas mine can beat in a tournament regardless of the real meta.

JoelMahon avatar May 26 '21 17:05 JoelMahon

I will upload it in my fork

zelosos avatar May 26 '21 21:05 zelosos

First, I would like to mention that the competition isn't over yet. Second, I recommend opening a pull request here instead. We have an improved tournament runner, a leaderboard, and a Discord as well.

dosu0 avatar May 26 '21 22:05 dosu0

My strategy has 5 states of memory that describe the opponent and get the best out of them.

HassanWahba avatar May 26 '21 22:05 HassanWahba

carykh's video description says "But, I'll give you all a 24-hour grace-period, which moves the deadline to May 27th, 2021 at 10 PM UTC, to allow for software bugs / resubmissions etc."

So be careful not to reveal too much!

Xapakas avatar May 26 '21 23:05 Xapakas

@Xapakas cheers, guess I'll leave it another day!

JoelMahon avatar May 26 '21 23:05 JoelMahon

Wait, there was a grace period?????

redtachyon2098 avatar May 27 '21 01:05 redtachyon2098

Wait, there was a grace period?????

Yes, it was in the YouTube video's description. I honestly feel the grace period is too long: it should be there to allow final fixes for people who tried to submit right at the end and weren't able to, but instead it's a significant portion of the competition time, and I feel like you have to use it or you're at a significant disadvantage. It also wasn't very clearly announced; I believe the video description was the only place.

Oh well, too late to change it at this point.

duckboycool avatar May 27 '21 01:05 duckboycool

My brain is too fried to make any more changes anyway. I tried to refactor my code, but the cleaner version performed way worse for some reason, so I just submitted my ugly hacky one.

arielsysdev avatar May 27 '21 01:05 arielsysdev

STRATEGY NAME: revJossFGT6 - reversed Joss + forgiving grimTrigger

My strategy works like this: in the first rounds (I figured, with a lot of testing in different environments, that the best number of rounds for this was 6), it behaves like a "reversed Joss" (a titForTat that, when it is continuously being attacked, and therefore attacking, tries to cooperate). If the opponent didn't make any unprovoked attack, it continues like this forever. Otherwise, it becomes what I like to call a forgiving grimTrigger. First, it attacks the opponent as many times as I was attacked in the first 6 rounds. After that, it sends a cooperation request. If (and only if) the request is accepted, the strategy cooperates until it is betrayed (with an unprovoked attack). This cycle of attacking and cooperating then continues forever.

So, as you can see, this is an adaptive approach. If the opponent cooperates for the initial rounds, I cooperate back with a very forgiving, but responsive, strat. What I mean by that is that it works like a titForTat, so it responds to the enemy's attacks quickly (as opposed to a forgiving titForTat, for example), but it is still very forgiving, because it likes to cooperate even if the opponent is attacking a lot (because SS (+3 +3) is better than TT (+1 +1)). And one thing is sure: since the opponent always cooperated in the first rounds, it is very unlikely to attack a lot in this stage. On the other hand, if the opponent attacked (and that would have to be an unprovoked attack) at least once in the initial rounds, I respond to that attack proportionally to my opponent's aggression. Think about it this way: the more the opponent attacks, the less likely it is to cooperate. But still, since cooperating is almost always the best option, even if the opponent attacked me every time, I will send a cooperation offer. I forgot to mention that this offer shrinks the more it gets rejected, so if I face an alwaysDefect or even a 50/50 random strategy, I eventually get defensive and only attack back.

But there's a key element, which I haven't mentioned yet, that is required for this strategy to work at its best: the amount of randomness in the system. If you make a lot of copies of the random.py file in the exampleStrats folder and run the simulation, you will see that the best strategy will probably be the grimTrigger (as opposed to the standard one, titForTat). You can try it yourself, but the idea behind it is that by getting defensive (attacking a lot), the grimTrigger will always get at least 1 point each round and, on average, it will score 3 (~50% are TTs (+1 +1) and ~50% are TSs (+5 +0) -> avg = (+3 +0.5)). And since my strategy is essentially a grimTrigger against 50/50 randoms (63/64 times; the only way for it to stay a revJoss would be if the random strategy's output for the first rounds was SSSSSS, which only happens once in 2^6 (64) times), it works really well against randoms. In fact, I tested it in the environment created by @l4vr0v (a fantastic IPD lunatic, I'd say, with all respect: you're a beast) but modified with a lot of random copies (around 20% of the total amount of strategies), and, on average, my strategy got 2nd place! The first place (the priest, I think; by the way, I tried to read its code but couldn't understand anything about it lol) did a lot better in this environment (it basically always won), but, still, I'm very happy with the results. So, I'm really hoping that the YouTube comments are right and that there will be a lot of random strategies submitted.
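To make that payoff arithmetic concrete, here is a quick sanity-check sketch (a little harness of my own, not the tournament code), assuming the payoff values from the video: +3/+3 for mutual cooperation (SS), +1/+1 for mutual defection (TT), and +5/+0 when only one side defects (TS).

```python
import random

# An already-triggered grimTrigger (so, always defecting) vs. a 50/50 random opponent.
def payoff(me, opp):
    if me == 1 and opp == 1:
        return 3, 3   # both cooperate
    if me == 0 and opp == 0:
        return 1, 1   # both defect
    if me == 0 and opp == 1:
        return 5, 0   # I defect, opponent cooperates
    return 0, 5       # I cooperate, opponent defects

rounds = 100_000
grim_total = rand_total = 0
for _ in range(rounds):
    opp_move = random.randint(0, 1)
    g, r = payoff(0, opp_move)   # grim always plays 0 (defect) once triggered
    grim_total += g
    rand_total += r

print(grim_total / rounds, rand_total / rounds)   # ~3.0 vs ~0.5, matching the estimate above
```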

(I don't really mind if people take advantage of knowing this strategy, since there have always been forks, from people like @l4vr0v, with a lot of available strategies that could be straight-up copied, for example. Also, the thing that matters most isn't knowing about a single strategy, but the strategy trends: the most common ones, which you can take advantage of, and the least common ones, which you don't have to care about that much.)

Wow, that was a lot! It's so cool to see how convoluted a game with only two possible choices for each player can get haha

vlcanesin avatar May 27 '21 17:05 vlcanesin

hey guys, yeah this thread is cool!

To answer @duckboycool's concerns about the grace period being too long: I made the grace period 24 hours because lots of people were submitting re-submission requests in the final 24 hours between May 25-26th 10 PM UTC. I went through my final "deletion session" just about an hour before the deadline hit (I've been doing them roughly every day), and I didn't want to delete anybody's strategy without giving them enough warning before the deadline actually hit. Given that I don't know what timezone people live in, and whether they'll be free to do some coding at 4 PM, or midnight, or 8 AM, or whatever, I felt like I had to give them around 24 hours to make sure they had a full revolution of the clock to hear that their response had been deleted and that they could re-submit.

carykh avatar May 27 '21 19:05 carykh

I guess that because of the final submission deadline in #81 and the video FAQ, if you're really worried about others knowing your strategies, you should wait until May 28th 10 PM UTC. (Only people who have already submitted a resubmission form will be able to submit between the deadline in 2 hours and then, though.)

duckboycool avatar May 27 '21 20:05 duckboycool

Yeah, the new deadline (ONLY for those who managed to submit re-submission requests late) is roughly 26 hours from now, May 28th 10 PM UTC. There's only about 40 people in this category. Once that date hits, absolutely no more resubmission requests can be accepted.

If you haven't submitted a resubmission request as of now, you can't anymore, because we're past the original submission deadline!

carykh avatar May 27 '21 20:05 carykh

I originally did ML stuff, but it was a bit slow and the result wasn't that good. So I made this. It's only 11 lines of code.

def strategy(history, memory):
    choice = 1
    if len(history[0]) > 1:
        choice = history[1,-1]
        courage = 0
        for x in range(len(history[0])):
            courage += 2 * history[1][x] - 1 
        courage /= len(history[0])
        if courage < 0.01 and len(history[0]) > 3:
            choice = 0
    return choice, None

It's Tit for Tat, but it checks the ratio between the opponent's defects and cooperations: if the opponent defects too much, it turns grim, but if the opponent starts cooperating enough to offset the ratio, it returns to its normal behavior. This did unnaturally well for its simplicity, and placed 8th in the "prisoners dilemma enjoyers" fork of this repo.
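For anyone who wants to poke at it, here is a quick sanity check of the snippet above (my own test lines, assuming `history` is the usual 2×N numpy array the runner passes, with row 0 = your moves and row 1 = the opponent's, as in the other strategies in this thread):

```python
import numpy as np

# Opponent defected 4 of 5 rounds: the "courage" average is well below 0.01 and
# more than 3 rounds have passed, so the strategy goes grim and defects.
grim_case = np.array([[1, 1, 1, 1, 1],
                      [0, 0, 0, 1, 0]])
choice, _ = strategy(grim_case, None)
print(choice)   # 0

# Mostly cooperative opponent: the average stays positive, so it just mirrors
# the opponent's last move like plain tit-for-tat.
nice_case = np.array([[1, 1, 1, 1, 1],
                      [1, 1, 0, 1, 1]])
choice, _ = strategy(nice_case, None)
print(choice)   # 1
```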

redtachyon2098 avatar May 28 '21 00:05 redtachyon2098

(The hyphens are there for showing indents.)

can't you just do ```py code ```

ThatXliner avatar May 28 '21 00:05 ThatXliner

I'll try that. Thanks.(EDIT: It worked.)

redtachyon2098 avatar May 28 '21 00:05 redtachyon2098

This is my strategy; it's not the best, but I tried.

import random

#THIS IS OMEGA JOSS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#Joss
#+grace period where it is just tit-for-tat
#+Aggression detection
#+varying the chance of random defection based on aggression
#+goes grim after "grimThreshold" defections
#+avoid double defections
#+programmable next choice (currently not used)

def strategy(history, memory):
    #vars
    TotalAgression = 0 #How many times it defected
    TotalNiceness = 0 #How many times it cooped
    choice = 1 #The choice of Omega Joss. duh.
    nextChoice = None #This overwrites the next move of Omega Joss

    #parameters (test for best value)
    chance = 0.8 #initial chance of defecting. TEST DONE
    gracePeriod = 10 #the period before any random defection. TEST DONE
    grimThreshold = 40

    if history.shape[1] < 1: #If it's the first round
        return choice, [TotalAgression, nextChoice] #Then Cooperate
    TotalAgression = memory[0] #Reading memory
    nextChoice = memory[1] #Same
    if nextChoice is not None: #Overwriting any decision if nextChoice is not None
        choice = nextChoice #      ^
        nextChoice = None #        ^
    elif TotalAgression >= grimThreshold:
        choice = 0
    elif TotalAgression == history.shape[1]: #If it always defected then we defect
        choice = 0
    elif random.random() <= (chance - TotalAgression/(history.shape[1]) + TotalNiceness/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod: 
        #Randomization and making sure it is not the grace period and that we avoid defecting twice in a row
        choice = 0  
    elif history[1,-1] == 0:
        choice = 0  
        TotalAgression += 1
    TotalNiceness = (history.shape[1]-gracePeriod)-TotalAgression
    if history.shape[1] == gracePeriod:
        TotalAgression = 0
        TotalNiceness = 0
    return choice, [TotalAgression, nextChoice] #Return the function. duh

Barigamb738 avatar May 28 '21 03:05 Barigamb738

The stark contrast between yours and mine....

redtachyon2098 avatar May 28 '21 03:05 redtachyon2098

This is another thing I made; it uses ML to predict the opponent's choice and brute-forces the best option. But it's trash, despite being super long. Also, the code is a piece of spaghetti garbage.

thinklength = 5

howahead = 3

LearnRate= 0.05

LearnIterations = 50

LearnThreshold = 1e-3

table = [[1,5],[0,6]]

import random as r

import time as t

alert = 20

alertatall = False

def activation(x):
    if(x > 0):
        return x
    else:
        return x / 1024

def clone(thing):
    if(type(thing) == list):
        a = []
        for x in range(len(thing)):
            a.append(clone(thing[x]))
        return a
    else:
        return thing

def derivative(x):
    if x > 0:
        return 1
    else:
        return 1 / 1024

class network:
    def __init__(self,nodes):
        self.nodes = []
        self.raw = []
        self.weights = []
        self.biases = []
        a = []
        self.CostValue = 1000
        for x in range(nodes[0]):
            a.append(0)
        self.nodes.append(a)
        self.raw.append(a)
        for x in range(len(nodes) - 1):
            self.nodes.append([])
            self.raw.append([])
            self.biases.append([])
            self.weights.append([])
            for y in range(nodes[x + 1]):
                self.raw[x + 1].append(0)
                self.nodes[x + 1].append(0)
                self.biases[x].append(r.random())
                self.weights[x].append([])
                for z in range(nodes[x]):
                    self.weights[x][y].append(r.random())

    def predict(self,input_list):
        self.nodes[0] = input_list
        self.raw[0] = input_list
        for x in range(len(self.biases)):
            a = []
            c = []
            for y in range(len(self.biases[x])):
                b = self.biases[x][y]
                for z in range(len(self.weights[x][y])):
                    b += self.weights[x][y][z] * self.nodes[x][z]
                a.append(activation(b))
                c.append(b)
            self.nodes[x + 1] = a
            self.raw[x + 1] = c

    def output(self):
        return self.nodes[len(self.nodes) - 1]

    def cost(self,input_list,output_list):
        self.predict(input_list)
        a = self.output()
        b = 0
        for x in range(len(a)):
            try:
                b += ((a[x] - output_list[x]) ** 2)
            except OverflowError:
                b += 16e+256
        self.CostValue = b
        return b

    def backprop(self, input_list, output_list):
        self.predict(input_list)
        w = clone(self.weights)
        b = clone(self.biases)
        expectedoutput = output_list
        for p in range(len(self.nodes) - 1):
            x = len(self.nodes) - p - 1
            differences = []
            for y in range(len(self.nodes[x])):
                differences.append(self.nodes[x][y] - expectedoutput[y])
            for y in range(len(self.nodes[x])):
                b[x - 1][y] = 2 * differences[y] * derivative(self.raw[x][y])
                for z in range(len(self.nodes[x - 1])):
                    w[x - 1][y][z] = self.nodes[x - 1][z] * 2 * differences[y] * derivative(self.raw[x][y])
            expectedoutput = []
            for y in range(len(self.nodes[x - 1])):
                a = 0
                for z in range(len(self.nodes[x])):
                    a += self.weights[x - 1][z][y] * 2 * differences[z] * derivative(self.raw[x][z])
                expectedoutput.append(((a / len(self.nodes[x])) / (-2)) + self.nodes[x - 1][y])
        return [w,b]

    def train(self,inputs,outputs,LearnRate,iterations):
        for q in range(iterations):
            total = 0
            c = self.backprop(inputs,outputs)
            avgCost = self.cost(inputs,outputs)
            for x in range(len(self.weights)):
                for y in range(len(self.weights[x])):
                    total += c[1][x][y]
                    for z in range(len(self.weights[x][y])):
                        total += c[0][x][y][z]
            if(total < 0):
                total = -total
            if(total == 0):
                total = 1e-256
            for x in range(len(self.weights)):
                for y in range(len(self.weights[x])):
                    self.biases[x][y] -= c[1][x][y] * LearnRate * (avgCost ** 0.5) / total
                    for z in range(len(self.weights[x][y])):
                        self.weights[x][y][z] -= c[0][x][y][z] * LearnRate * (avgCost ** 0.5) / total
            self.CostValue = avgCost
            if self.CostValue < 1e-10:
                break

def biggest(lis):
    big = [0]
    for x in range(len(lis) - 1):
        y = x + 1
        if(lis[y] > lis[big[0]]):
            big = [y]
        if(lis[y] == lis[big[0]] and big[0] != y):
            big.append(y)
    return big[r.randint(0,len(big) - 1)]

def shift(thing, newnum):
    a = thing
    for x in range(len(a) - 1):
        a[x] = a[x + 1]
    a[len(a) - 1] = newnum
    return a

def testseq(feed, sequence, nn):
    james = nn
    score = 0
    future = feed
    for x in range(len(sequence)):
        james.predict(future)
        adversary = 1 * (james.output()[0] > 0.5)
        future = shift(shift(future, sequence[x]), adversary)
        score += table[sequence[x]][adversary]
    return score

def tobinary(num, digits):
    number = num
    bits = []
    for q in range(digits):
        x = digits - q - 1
        if 2 ** x <= number:
            bits.append(1)
            number -= 2 ** x
        else:
            bits.append(0)
    return bits

inittime = t.time()

def strategy(history, memory):
    choice = 1
    tim = t.time()
    if type(memory) == list:
        player = memory[0]
        previousfeed = memory[1]
        wronged = memory[2]
    else:
        player = network([2 * thinklength, 10, 6, 1])
        wronged = False
    feed = []
    for x in range(2 * (thinklength - min(10, len(history[0])))):
        feed.append(0)
    for x in range(min(thinklength, len(history[0]))):
        feed.append(2 * (history[0, x + max(0, len(history[0]) - thinklength)] - 0.5))
        feed.append(2 * (history[1, x + max(0, len(history[0]) - thinklength)] - 0.5))
    if 0 in history[1]:
        wronged = True
    if type(memory) == list and wronged:
        player.predict(previousfeed)
        #if 1 * (player.output()[0] > 0.5) != history[1][len(history[0]) - 1]:
            #print(len(history[0]), 1 * (player.output()[0] > 0.5), history[1][len(history[0]) - 1], "incorrect!")
        player.train(previousfeed, [history[1][len(history[0]) - 1]], LearnRate, LearnIterations)
        #player.train(previousfeed, [history[1][len(history[0]) - 1]], LearnRate, LearnThreshold)
        options = []
        scores = []
        for x in range(2 ** howahead):
            trythis = tobinary(x, howahead)
            options.append(trythis)
            scores.append(testseq(feed, trythis, player))
        choice = options[biggest(scores)][0]
        if len(history[0]) % alert == 1 and alertatall:
            print(player.CostValue)
    if not wronged:
        choice = 1
        #if player.CostValue > 1:
        #    print(t.time() - inittime, "Crap")
    #print(t.time() - tim)
    return choice, [player, feed, wronged]

redtachyon2098 avatar May 28 '21 03:05 redtachyon2098

Wow that is impressive. I wouldn't have been able to make that.

Barigamb738 avatar May 28 '21 04:05 Barigamb738

Thank you.

redtachyon2098 avatar May 28 '21 04:05 redtachyon2098

This is my strategy; it's not the best, but I tried.

    TotalAgression = 0 #How many times it defected
    TotalNiceness = 0 #How many times it cooped
    ...
    TotalAgression = memory[0] #Reading memory
    nextChoice = memory[1] #Same
    ...
    return choice, [TotalAgression, nextChoice]

You only save TotalAgression to the memory. TotalNiceness will always be zero (it's set near the end of the function, but then never used). Not sure if that was what you intended to do. So this line

elif random.random() <= (chance - TotalAgression/(history.shape[1]) + TotalNiceness/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod:

is exactly the same as

elif random.random() <= (chance - TotalAgression/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod:
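If you did mean to use TotalNiceness, the fix is just to carry it in memory like the other values. Here is a minimal sketch of that round-trip pattern (not the full Omega Joss; the decision rule is plain tit-for-tat and the variable names are mine):

```python
def strategy(history, memory):
    if memory is None:                        # first round: nothing to read back yet
        return 1, [0, 0]
    totalAggression, totalNiceness = memory   # read BOTH counters back from memory
    if history[1, -1] == 0:                   # opponent defected last round
        totalAggression += 1
    else:                                     # opponent cooperated last round
        totalNiceness += 1
    choice = 1 if history[1, -1] == 1 else 0  # plain tit-for-tat stands in for the real logic
    return choice, [totalAggression, totalNiceness]
```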

Anyway, here's my code:

class Memory:
    def __init__(self):
        self.cooperateCount = 0
        self.defectCount = 0

        self.defectCountInARow = 0
        self.cooperateCountInARow = 0

        self.turnNumber = 0

        self.timesCheated = 0

    def analyseMove(self, history):
        self.turnNumber = history.shape[1]

        myLastMove = history[0, -1]
        enemyLastMove = history[1, -1]

        if enemyLastMove == 0:
            self.defectCount += 1
            self.defectCountInARow += 1
            self.cooperateCountInARow = 0
            if myLastMove == 1:
                self.timesCheated += 1
        else:
            self.cooperateCount += 1
            self.cooperateCountInARow += 1
            self.defectCountInARow = 0

    def decideMove(self, history):

        # this prevents wasting resources on always defecting enemies
        if self.defectCount / (float)(self.turnNumber) >= 0.44:
            return 0, self

        # enemy defected two times in a row he is not nice I'm going to be rude back!
        if self.defectCountInARow > 2:
            if self.defectCountInARow == 3 and self.cooperateCount != 0:  # maybe we can make up after all! Let's stop fighting!(but only if you were nice to me before)
                return 1, self
            if self.defectCountInARow == 4 and self.cooperateCount != 0:
                return 1, self
            return 0, self

        return self.defaultUpgradedTitForTat(history), self

    def defaultUpgradedTitForTat(self, history):
        myLastMove = history[0, -1]
        enemyLastMove = history[1, -1]

        if enemyLastMove == 0:  # enemy was rude
            if myLastMove == 1:  # i've been nice and enemy has been rude! I will be rude now as well!
                if self.turnNumber >= 2 and history[1, -2] and self.timesCheated < 3:  # he was nice before? let's be nice! (but only 2 chances!)
                    return 1
                return 0
            else:  # i've been rude and enemy has been rude! I guess we should try to make up
                return 1
        else:
            return 1


def strategy(history, memory):
    if memory is None:  # cooperate on the first turn
        return 1, Memory()
    else:
        memory.analyseMove(history)
        return memory.decideMove(history)

It's basically a quite forgiving titForTat that gives the enemy multiple chances to make up, but it can also act like grim if the enemy defects too many times.

SebastianKorytkowski avatar May 28 '21 14:05 SebastianKorytkowski

It seems like titForTat and grimTrigger variants are very common (which makes sense). Anyway, since you are posting the source code of your strategies, I figured I might as well post mine (for testing purposes or whatever).

def strategy(history, memory):         #---STRATEGY NAME: revJossFGT6 -> reversed Joss + forgiving grimTrigger
    current_round = history.shape[1]
    choice = None

    #---MEMORY ARRAY FORMAT: [current mode, opponent T's counter, variable, betrayed_counter allower, forgiveness factor]---#
    if memory == None:
        memory = ['', -1, 0, True, 2]

    if current_round < 6: #---BUILDING TRUST + GETTING DATA (for an optimal number of rounds based on testing)
        memory[0] = 'cooperation'
    elif memory[3]:
        if list(history[1]).count(0) == 0:
            memory[0] = 'cooperation'
            memory[1] = 0
        else:
            memory[0] = 'forgiving-grimTrigger'    #---Randomness/aggressiveness identified
            memory[1] = list(history[1]).count(0)
            
        memory[3] = False

    #---COOPERATION MODE (revJoss - reversed Joss)---#
    if memory[0] == 'cooperation':
        choice = 1
        if (current_round >= 1 and history[1,-1] == 0) and (list(history[1,-3:]) != [0,0,0]):
            choice = 0

    #---ATTACKING MODE (FGT - forgiving grimTrigger)---#
    else:
        if type(memory[2]) == int:
            memory[2] = memory[2] + 1
            if memory[2] <= memory[1]:
                choice = 0 #---ATTACK! (proportional to opponent's aggression)
            elif memory[2] <= memory[1] + memory[4]:
                choice = 1 #---INVITATION (for memory[4] rounds)
            elif (history[1, -1] == 1 or history[1, -2] == 1) and memory[4] > 0: #---INVITATION ACCEPTED
                memory[2] = 'play_1_until_betrayal'
                memory[4] = min(memory[4] + 1, 2) #---COOPERATE MORE
            else:
                memory[2] = 0 #---INVITATION DENIED
                memory[4] = memory[4] - 1 #---COOPERATE LESS
                choice = 0
        
        if memory[2] == 'play_1_until_betrayal':
            choice = 1
            if history[1, -1] == 0: #---BETRAYED!
                memory[2] = 0
                choice = 0

    return choice, memory

vlcanesin avatar May 28 '21 14:05 vlcanesin

A lot of people are posting their strategies! I really hope there will be another one of these soon.

redtachyon2098 avatar May 28 '21 15:05 redtachyon2098

That's a great idea! Here's my strategy in just 9 LOC, called "weightedMonotonic" (also here):

# Weights for opponents' past moves, starting with the most recent one
weights = [1, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8]

def strategy(history, memory):
    round_index = history.shape[1]
    opponent_history = history[1]
    
    acc = 0
    for (i, w) in enumerate(weights):
        if i < round_index:
            # Weigh opponents' i-to-last move (-1 if they defected, +1 if they cooperated)
            acc += w * (opponent_history[-i - 1] * 2 - 1)
    
    # Cooperate if weighted sum is at least 0. It's important to cooperate in the first round (acc=0) as to not anger
    # the Grim Trigger
    return 1 if acc >= 0 else 0, None

I admit that it doesn't really conform to the "pure" prisoner's dilemma tournament challenge as it doesn't even try to figure out the opponent strategy and relies neither on smart conditionals nor on memory. But it works exceptionally well in my test runs against a pool of various strategies by l4vr0v, valadaptive and redtachyon2098 (huge respect and thanks to you!). It's consistently in first place :) I'm currently running it against the Prisoners-Dilemma-Enjoyers pool and I'll update you when it's done.

It's basically just a weighted sum over the opponent's past 9 moves and then a threshold for deciding whether to cooperate or defect. The idea came from FIR filters but I didn't apply any signal processing or FIR theory except for the way the calculation is done. I planned to optimize the weights using a genetic algorithm but sadly didn't have the time. The results I got using just my intuition and a bit of experimentation were already way better than I expected.
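If someone wants to pick up that idea, here is a rough hill-climbing sketch (a crude stand-in for a proper genetic algorithm). `run_pool` is a hypothetical scoring function you'd have to write yourself, e.g. one tournament run of the pool with the candidate weights plugged into weightedMonotonic, returning its total score; nothing here is from the repo.

```python
import random

def mutate(weights, step=0.1):
    # nudge every weight a little and keep it in [-1, 1], like the hand-tuned values above
    return [max(-1.0, min(1.0, w + random.uniform(-step, step))) for w in weights]

def optimize(initial_weights, score_weights, generations=200):
    """Greedy hill climbing: keep a mutated candidate only if it scores higher."""
    best = initial_weights
    best_score = score_weights(best)
    for _ in range(generations):
        candidate = mutate(best)
        candidate_score = score_weights(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

# usage sketch, starting from the hand-tuned weights above:
# best, score = optimize([1, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8], run_pool)
```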

I noticed that it gets a lot worse if there are many random strategies in the pool, so I hope not too many people were funny and entered one... It does way better against titForTat and grimTrigger variants which I expect there'll be a lot of.

maxkl avatar May 28 '21 22:05 maxkl

Wow! I never thought of that! The optimal weights might change depending on what types of strategies there are, so it won't be entirely consistent, but if you already know which strategies you are playing against, it would perform well.

redtachyon2098 avatar May 28 '21 23:05 redtachyon2098

Man, this is like ML... but not ML...

ThatXliner avatar May 28 '21 23:05 ThatXliner

Hey! I've got an idea! You should make an evolution simulator for refining the weights of your algorithm to make the ultimate strategy!

redtachyon2098 avatar May 28 '21 23:05 redtachyon2098

Wow! I never thought of that! The optimal weights might change depending on what types of strategies there are, so it won't be entirely consistent, but if you already know which strategies you are playing against, it would perform well.

You're probably right. It ends up in 20th place in the Prisoners-Dilemma-Enjoyers pool, even behind some that were in my original test pool. Oh man, this challenge isn't easy (d'oh!)

maxkl avatar May 28 '21 23:05 maxkl

Uh-oh.

redtachyon2098 avatar May 28 '21 23:05 redtachyon2098

Yay, I can finally share my strategy! This is diplomat.py, and it places 4th among the strategies currently in the "prisoner's dilemma enjoyers" repo. (But I'm sure there are quite a few more that could give it a run for its money!)

# OTFT with a bunch of tweaks
#
# memory[0]: deadlock counter
# memory[1]: exploiter counter
# memory[2]: # of backlogged Cooperations
# memory[3]: opponent non-blockade Coops
# memory[4]: non-blockade rounds
# memory[5]: opponent blockade Coops
# memory[6]: blockade rounds
# memory[7]: "isBlockade" boolean

import random

def strategy(history, memory):
    deadlockThreshold = 3
    exploiterThreshold = 8

    if memory == None:
        memory = [0,0,0,0,0,0,0,False]
        choice = "cooperate"
        return choice, memory

    # update occurrence counters
    if not memory[7]: # non-blockade round
        memory[4] += 1
        if history[1,-1] == 1:
            memory[3] += 1
    else: # blockade round
        memory[6] += 1
        if history[1,-1] == 1:
            memory[5] += 1
        if memory[6] == 10: # after 10 turns, reevaluate blockade
            if memory[5] == 0 and memory[3] > 0:
                memory[1] = 0 # end blockade
                memory[7] = False
                memory[2] = 2 # double Coop to end blockade

    if memory[2] != 0: # consider backlogged Coops
        choice = "cooperate"
        memory[2] -= 1

    # check if a deadlock is detected.
    elif memory[0] >= deadlockThreshold:
        # stay silent twice as a truce attempt
        choice = "cooperate"

        if memory[0] == deadlockThreshold:
            memory[0] += 1
        else:
            memory[0] = 0
    else:
        if history.shape[1] >= 2 and history.shape[1] <= 20:
            # modify the exploiter counter
            if history[1,-1] == 1 and history[1,-2] == 1:
                # opponent cooperates twice; decrement
                memory[1] -= 1
            if history[1,-1] != history[1,-2]:
                # opponent changes move; increment
                memory[1] += 1
            if history[1,-1] != history[0,-2]:
                # opponent moves in a non-TFT fashion; increment
                memory[1] += 1

        if memory[1] >= exploiterThreshold:
            # tell truth indefinitely
            # (exploiter counter will never fall below threshold)
            choice = "defect"
            memory[7] = True
        else:
            if history[1,-1] == 0: # tit-for-tat
                choice = "defect"
            else:
                choice = "cooperate"

            # update deadlock counter
            if history.shape[1] >= 2 and history[1,-1] != history[1,-2]:
                memory[0] += 1
            else:
                memory[0] = 0

    if history.shape[1] >= 1 and history[0,-1] == 0 and history[1,-1] == 0 \
       and choice == "defect" and memory[3] > 0:
        # attempt to break double defection with double coop
        if memory[1] < exploiterThreshold and random.random() <= 0.1:
            choice = "cooperate"
            memory[2] = 1

    return choice, memory

Diplomat builds off of an extremely strong strategy called "Omega Tit-for-Tat". OTFT makes a number of improvements to regular tit-for-tat, and with diplomat I've tried to make even more. Here's a list of what diplomat adds to tit-for-tat (henceforth TFT):

  1. (from Omega) Diplomat implements random detection (which, given carykh's shoutout to that strategy in the video, I'm guessing is very important) with a randomness (or "exploiter") counter. Rather than trying to detect statistical randomness, it detects illogical movements, which also helps to lump alwaysDefect into the "random" category. After exceeding the randomness threshold, Diplomat defects indefinitely. (Sort of...)
  2. (from me) 10 turns after lumping an opponent into the "random" category and defecting indefinitely (henceforth a "blockade"), Diplomat reevaluates its decision. While I experimented with opponent backstab and non-forgiveness ratios, at the end of the day, the most effective way to determine that an opponent is not either random or alwaysDefect is to see if it made any cooperative moves during the blockade. If it made cooperative moves before the blockade, but not during the blockade, Diplomat lifts the blockade and offers a "truce" by cooperating twice in a row.
  3. (from me) Diplomat stops attempting to detect random or alwaysDefect after turn 20. I have yet to see it fail to detect random before turn 15, and this helps it to not misidentify opponent strategies that are exploitative but nevertheless worth cooperating with.
  4. (from Omega) Diplomat implements deadlock detection. A deadlock is when both opponents alternate between Cooperating and Defecting. You may notice that if two TFT-based algorithms are facing each other, and one Defects for whatever reason, both will continue cycling between Defect and Cooperate indefinitely. Diplomat detects deadlocks with a deadlock counter, which increments every time the opponent changes their move and resets to zero every time the opponent makes the same move twice. When a deadlock is detected, Diplomat cooperates twice in a row to attempt to restore a chain of Cooperations. (A distilled sketch of just this counter logic is shown after this list.)
  5. (from me) Another fault of TFT is that if both strategies ever Defect in a single turn, both strategies will Defect until the end of the game, even though they could earn more points by restoring the peace and reestablishing a Cooperation chain. If both Diplomat and the opponent strategy defected last round, and the opponent is known not to be random or alwaysDefect, there is a 10% chance that Diplomat will attempt to break the Defection chain by Cooperating twice (this percentage was chosen by lots of tweaking in the prisoner's dilemma enjoyers repo; there's no mathematical significance to it).
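Since the deadlock counter is buried in the middle of diplomat.py, here is that piece pulled out on its own (my paraphrase; the helper name and return shape are mine, not part of the actual file):

```python
def update_deadlock_counter(counter, opponent_history, threshold=3):
    """Return (new_counter, truce): truce=True means 'cooperate twice now to break the cycle'."""
    if len(opponent_history) >= 2 and opponent_history[-1] != opponent_history[-2]:
        counter += 1       # opponent flipped their move again: one step closer to a deadlock
    else:
        counter = 0        # opponent repeated a move, so no alternating pattern
    return counter, counter >= threshold

# e.g. an opponent stuck in a Cooperate/Defect loop:
# update_deadlock_counter(2, [1, 0, 1, 0]) -> (3, True)
```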

If all of the above has been considered, and Diplomat still does not have a move to make, it will act as pure tit-for-tat. I experimented with a couple other techniques: Josh recommended starting the randomness counter much higher if the opponent starts by Defecting, since most logical opponents will start by Cooperating. I was unable to find success with this, perhaps because it gives strategies that start out mean but are able to play nice later on less of a chance to redeem themselves. Also, I considered disabling the "forgiveness" feature from step 5 during the first X turns so that no detective strategies got the wrong idea. This likely ran into the same problem as before. Additionally, it did not account for "late detectives", which are smart and likely to take up some portion of the final meta.

I had so much fun with this! I don't expect this strategy to win, but hopefully it can crack the top 10. We should do something like this again soon. Perhaps the Noisy Iterated Prisoner's Dilemma next??

Edit: github formatting, smh

Xapakas avatar May 28 '21 23:05 Xapakas