game-theory-when-smarter-be-loser

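Given how much human intelligence already varies, it is curious that we are not all much smarter than we actually are. Besides the well-known costs of higher intelligence (a bigger head uses more energy and makes childbirth more difficult), which put physical limits on how smart humans can become, being smart can also be a disadvantage in some non-zero-sum games.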

Here is one example:

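Consider an infinitely repeated game with 2n players. In each round, all players are randomly matched against each other in n separate prisoner's dilemma stage games. After each round, the outcomes are recorded and published.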

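One plausible outcome of this game is for everyone to follow this strategy (call it A):

Strategy A

Initially mark all players as “good”.
If anyone defects against a player who is marked as “good”, mark him as “bad”.
Play “cooperate” against “good” players and “defect” against “bad” players.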
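
The three rules of strategy A can be sketched in a few lines of Python. This is only an illustration: the names `choose_action` and `update_labels`, the player names, and the choice to judge each round against the labels as they stood before the round are assumptions, not part of the original text.

```python
GOOD, BAD = "good", "bad"
COOPERATE, DEFECT = "cooperate", "defect"


def choose_action(opponent, labels):
    """Strategy A: cooperate with "good" opponents, defect against "bad" ones."""
    return COOPERATE if labels[opponent] == GOOD else DEFECT


def update_labels(published_outcomes, labels):
    """Mark as "bad" anyone who defected against a player marked "good".

    published_outcomes is a list of (player, opponent, action) tuples for one
    round. Labels are read from a snapshot taken before the round (an
    assumption), so every published game is judged against the same labels.
    """
    before = dict(labels)
    for player, opponent, action in published_outcomes:
        if action == DEFECT and before[opponent] == GOOD:
            labels[player] = BAD
    return labels


# Hypothetical four-player round: p1 defects against p2, everyone else cooperates.
labels = {name: GOOD for name in ["p1", "p2", "p3", "p4"]}
round_1 = [
    ("p1", "p2", DEFECT), ("p2", "p1", COOPERATE),
    ("p3", "p4", COOPERATE), ("p4", "p3", COOPERATE),
]
update_labels(round_1, labels)
print(labels["p1"])                 # bad
print(choose_action("p1", labels))  # defect
```

Defecting against a player already marked “bad” leaves the defector's own label untouched, which is what lets the punishment be carried out without spreading.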

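Now suppose that in each stage game the outcome goes unpublished with probability p. Also assume that n is large enough that we can ignore the possibility that two players meet again in the future and remember an earlier unpublished outcome. Then, depending on p, the discount factor, and the actual payoffs, it can still be an equilibrium for everyone to follow strategy A.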

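For example, suppose the payoffs are 2,2 / 3,-10 / -10,3 / 0,0 (both cooperate / defect against a cooperator / cooperate against a defector / both defect) and p = 0.5.
If a player deviates from the strategy above and plays “defect” against a “good” player, he gains 1 utility in the current round (compared with strategy A), but with probability 0.5 he loses 2 utility in every future round.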
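
To see how the discount factor enters, the comparison above can be worked through numerically. The sketch below assumes standard geometric discounting with factor d (future losses start next round and last forever) and a single one-time deviation. The function name and parameter names are illustrative, not from the original text.

```python
from fractions import Fraction


def min_discount_factor(p_unpublished, reward=2, temptation=3, punishment=0):
    """Smallest discount factor at which one defection against a "good" player does not pay.

    Assumptions: geometric discounting, and a player marked "bad" earns
    `punishment` instead of `reward` in every later round.
    """
    gain = Fraction(temptation - reward)            # 3 - 2 = 1 in the example
    loss_per_round = Fraction(reward - punishment)  # 2 - 0 = 2 in the example
    p_caught = 1 - Fraction(p_unpublished)          # the outcome is published
    # Deviating does not pay when
    #     gain <= p_caught * loss_per_round * d / (1 - d),
    # which rearranges to d >= gain / (gain + p_caught * loss_per_round).
    return gain / (gain + p_caught * loss_per_round)


threshold = min_discount_factor(Fraction(1, 2))
print(threshold)  # 1/2
```

Under these assumptions, with the example payoffs and p = 0.5, following strategy A is a best response whenever the discount factor is at least 1/2.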

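Now suppose further that the random number generator used to decide whether each outcome is published is only pseudorandom, and that some “smart” players can recognize the pattern and predict whether the outcome of a given stage game will be published.
Suppose also that it is public knowledge who these “smart” players are.
In this third game, it is no longer an equilibrium for everyone to follow strategy A, because a “smart” player should always play “defect” in any round whose outcome he predicts will not be published. The “normal” players can follow strategy A, or they can follow a modified strategy B: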

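Strategy B

Start by marking all “smart” players as “bad”, and otherwise proceed as in strategy A. In that case the “smart” players should also start by marking all “normal” players as “bad”.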

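In either case the total surplus is lower than it would be with no “smart” players. For some game parameters, however, only the latter is an equilibrium, in which case the “smart” players actually end up worse off than the “normal” players. (Note that even when the first outcome is an equilibrium, it is not coalition-proof; that is, the “normal” players have an incentive to switch to strategy B collectively.)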

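For example, consider the payoffs above again. When a “normal” player faces a “smart” player, he knows the “smart” player will defect with probability 0.5. If he deviates from strategy A and plays “defect”, then with probability 0.5 he gains 10 utility, and with probability 0.5 he gains 1 extra utility in the current round and loses no more than 2 utility in each future round. So, depending on the time discount factor, he may have an incentive to play “defect”.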
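
The same kind of arithmetic gives a rough bound for this case. The sketch below again assumes geometric discounting with factor d and uses the worst case of losing the full 2 utility in every future round, so it only identifies discount factors below which defecting certainly pays. The function and parameter names are illustrative.

```python
from fractions import Fraction


def defect_surely_pays_below(p_smart_defects, gain_if_smart_defects=10,
                             gain_if_smart_cooperates=1, max_loss_per_round=2):
    """Discount factor below which a "normal" player gains by defecting on a "smart" one.

    Assumptions: the "smart" player defects exactly when the outcome will go
    unpublished, so the "normal" player is punished only in the branch where
    the "smart" player cooperates; the future loss is taken at its upper bound.
    """
    q = Fraction(p_smart_defects)
    expected_gain = q * gain_if_smart_defects + (1 - q) * gain_if_smart_cooperates
    expected_loss_rate = (1 - q) * max_loss_per_round
    # Defecting pays when expected_gain > expected_loss_rate * d / (1 - d),
    # which rearranges to d < expected_gain / (expected_gain + expected_loss_rate).
    return expected_gain / (expected_gain + expected_loss_rate)


bound = defect_surely_pays_below(Fraction(1, 2))
print(bound)  # 11/13
```

With the example numbers the expected immediate gain is 0.5 × 10 + 0.5 × 1 = 5.5, so under these assumptions defecting pays for any discount factor below 11/13 (about 0.85). Because the true future loss may be smaller than 2 per round, the incentive can persist above that value as well.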

The game-theory story above comes from: http://www.weidai.com/smart-losers.txt
Original title: a game where the smarter players lose.


English original

Given that there is significant existing variation in human intelligence, it’s curious that we are not all much smarter than we actually are. Besides the well-known costs of higher intelligence (e.g., more energy use, bigger heads causing more difficult births), it seems that being smart can be a disadvantage when playing some non-zero-sum games. Here is one example.

Consider an infinitely repeated game with 2n players, where in each round all players are randomly matched against each other in n separate prisoner’s dilemma stage games. After each round is finished, the outcomes are recorded and published.

One plausible outcome of this game is for everyone to follow this strategy (let’s call it A): Initially mark all players as “good”. If anyone defects against a player who is marked as “good”, mark him as “bad”. Play “cooperate” against “good” players, “defect” against “bad” players.

Now suppose in each stage game, there is probability p that the outcome is not made public. Also assume that n is large enough so that we can disregard the possibility that two players might face each other again in the future and remember a previous non-published outcome. Now depending on p, the discount factor, and the actual payoffs, it can still be an equilibrium for everyone to follow strategy A.

For example, suppose the payoffs are 2,2 / 3,-10 / -10,3 / 0,0, and p=0.5. If a player deviates from the above strategy and plays “defect” against a “good” player, he gains 1 utility (compared to strategy A) for the current round, but has a probability of 0.5 of losing 2 utility in each future round.

Now further suppose that the random number generator used to decide whether each outcome is published or not is only pseudorandom, and there are some “smart” players who are able to recognize the pattern and predict whether a given stage game’s outcome will be published. And suppose it’s public knowledge who these “smart” players are. In this third game, it’s no longer an equilibrium for everyone to follow strategy A, because a “smart” player should always play “defect” in any round in which he predicts the outcome won’t be published. The “normal” players can follow strategy A, or they can follow a modified strategy (B) which starts by marking all “smart” players as “bad”, in which case the “smart” players should also start by marking all “normal” players as “bad”.

In either case the total surplus is less than if there were no “smart” players. But with some game parameters, only the latter is an equilibrium, in which case “smart” players actually end up worse off than “normal” players. (Note that even when the first outcome is an equilibrium, it is not coalition-proof. I.e., the “normal” players have an incentive to collectively switch to strategy B.)

For example, consider the above payoffs again. When a “normal” player faces a “smart” player, he knows there is 0.5 probability that the “smart” player will defect. If he deviates from strategy A to play “defect”, there is 0.5 probability that he gains 10 utility, and 0.5 probability that he gains 1 utility in the current round and loses no more than 2 utilities in each future round. Therefore depending on the time discount factor he may have an incentive to play “defect”.