A simple learning strategy which realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma

Authors: Joe Yuichiro Wakano and Norio Yamamura

Center for Ecological Research, Kyoto University, Kamitanakami Hiranocho, Otsu, Shiga, 520-2113, Japan

Pavlov was proposed as a leading strategy for realizing cooperation because it dominates over long periods in evolutionary computer simulations of the Iterated Prisoners' Dilemma. However, our numerical calculations reveal that neither Pavlov nor any other cooperative strategy is evolutionarily stable among all stochastic strategies with memory of only one previous move. We propose a simple learning strategy based on reinforcement. The learner changes its internal state depending on an evaluation of whether the score in the previous round exceeds a critical value (the aspiration level), which is genetically fixed. The current internal state determines the learner's move, but we found that the aspiration level determines its eventual behavior. The cooperative variant, which has an intermediate aspiration level, is not an ESS when the evaluation is binary (good or bad). However, when the evaluation is quantified, some cooperative variants can invade not only All-C, TFT and Pavlov but also non-cooperative variants with different aspiration levels. Moreover, they establish robust cooperation that is evolutionarily stable against invasion by All-C, All-D, TFT, Pavlov and non-cooperative variants, and they receive a high score even when the error rate is high. Our result suggests that mutual cooperation can be maintained when players have a primitive learning ability.
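The learning mechanism described in the abstract might be sketched as follows. This is a minimal illustrative sketch, not the authors' exact model: the payoff values (T=5, R=3, P=1, S=0), the representation of the internal state as a cooperation probability, the linear update rule, and the learning-rate parameter are all assumptions for the example. The key idea retained from the abstract is the quantified evaluation: the internal state shifts in proportion to how far the previous round's payoff deviates from a fixed aspiration level.

```python
import random

# Assumed standard IPD payoffs: T=5, R=3, P=1, S=0 (from the focal player's view).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

class Learner:
    """Reinforcement learner with a quantified evaluation (illustrative).

    The internal state p = Pr(cooperate) is nudged after each round by the
    difference between the obtained payoff and a genetically fixed
    aspiration level."""

    def __init__(self, aspiration=2.0, rate=0.1, p0=0.5):
        self.aspiration = aspiration  # genetically fixed critical value
        self.rate = rate              # learning-rate parameter (assumption)
        self.p = p0                   # internal state: Pr(cooperate)

    def move(self):
        return 'C' if random.random() < self.p else 'D'

    def update(self, my_move, payoff):
        # Quantified evaluation: a payoff above aspiration reinforces the
        # move just played; a payoff below aspiration discourages it.
        delta = self.rate * (payoff - self.aspiration)
        if my_move == 'D':
            delta = -delta  # reinforcing D means lowering Pr(cooperate)
        self.p = min(1.0, max(0.0, self.p + delta))

def play(a, b, rounds=1000):
    """Play an iterated game between two learners; return total scores."""
    score_a = score_b = 0
    for _ in range(rounds):
        ma, mb = a.move(), b.move()
        pa, pb = PAYOFF[(ma, mb)], PAYOFF[(mb, ma)]
        a.update(ma, pa)
        b.update(mb, pb)
        score_a += pa
        score_b += pb
    return score_a, score_b
```

With an intermediate aspiration level between P=1 and R=3 (here 2.0), mutual defection scores below aspiration and pushes both players back toward cooperating, while mutual cooperation scores above aspiration and reinforces itself, which is the Pavlov-like self-correcting dynamic the abstract alludes to.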