Friday, August 6, 2010

The Wrath of Blotto

You may remember when I invited everyone to play my webform version of Colonel Blotto. Well, its still up and has been up for some time, but hasn't seen any action for a while so I thought it might be time to take a look at the results.

Colonel Blotto is an interesting game. It seems to me, that much of this interest derives from the fact that how well your strategy performs is very much a function of which strategies exist in the pool. There is not a clear cut winning strategy, you need to feel out the existing pool and adapt accordingly.

So to stir things up a little bit, in what follows I will share some data from the existing database, refraining myself from commenting too much. Basically, stay tuned for a bunch of pretty pictures which will hopefully get your gears turning. The game is still up, feel free to try to game it now that this information is out. Might be interesting to see what kind of effect releasing the leaderboard will have on the leaderboard.

Leader Board

347 Strategies were submitted since the game went live. Lets try and take a look at what kind of strategies were submitted. Below are the top 25 ranking strategies in the database as of yesterday, along with the actual strategy, its points, and full record.


Soldier Distribution

Next, lets try and take a look at all the strategies at once. Lets start with the soldier distributions among the different slots. I will remind you that the rules of the game are slot independent, i.e. if machines were trying to play this game against one another you would expect the soldier distribution to be more or less uniform between slots, any deviation from uniformity probably says something deep and profound about how humans think.

Above is a box and whiskers plot of all strategies, looking at the number of soldiers in each slot.

This plot is fun. It shows the full distribution of all of the strategies. I went through the database, and for every strategy, added one to the box that held true. I.e. for each slot (the slots are along the x axis), the y axis marks how many strategies in the database have that many soldiers in that slot. Should be fun to try and think about what the non-uniformities mean. Colorbar is on the side to make the colors quantitative.

Point Distribution

So, how well do all of the strategies do... lets take a look.

Above is a histogram of all the scores for all the strategies in the database. It has some interesting features. Definitely not singly peaked. What do you think is going on on the far left?

In this plot, I again went through all of the strategies in the database, and this time, every square reflects the average score for all strategies that have that many soldiers (y axis) in that slot (x axis).

Strategies Layout

Lets drill down a bit and look at how each strategy performs.
This scatter plot has a point for each strategy in the database, the x coordinate giving its number of wins, the y coordinate its number of losses. The area of the circle is scaled to its number of ties, and the color is scaled to its total score.

Is there a clear trend? Why does it fan out?

Similar plot as above but this time, x coordinate is Wins, Y is ties, size is losses and color score.

One more, this time x coordinate is losses, y is ties, color is score.

Fitness Landscapes

So, what should you do if you wanted to design a winning strategy? Lets first take a look at the fitness landscape.

Now, this is difficult to do for the whole game, with 10 slots and something like 40 reasonable choices for each, this is some huge 10 dimensional space, which is difficult to visualize, so instead lets take a look at the fitness landscape for some lower dimensional cuts.

So in the above plot, what I've done is constructed a whole bunch of strategies. First I started with 8 soldiers in all slots but the ones listed in the title, namely slots 4,5,6 (starting with 1). So with only 3 slots free, and with the constraint that we have to have 100 soldiers total, I can parametrize a whole bunch of strategies with only two numbers, in this case the number of soldiers in slot 5 (x axis) and slot 6 (y axis). The color represents the score that the resulting strategy has when run against all of the previously existing strategies in the database. This was created without adding all of these strategies to the database itself as that would have changed the results.

Similar plot, this time changing the soldiers in slots 5,6 and 8 with x axis the number of soldiers in slot 6 and the y axis the number of soldiers in slot 8.

Another one, hopefully my labels make enough sense now that I don't have to spell it out. I think this one has an interesting shape. What's going on?

One more.

Random Strategies

So, lets say you are trying to construct winning strategies. The first thing you might try to do is construct random strategies, by randomly dropping 100 soldiers each into one of the slots at random. Doing so and running these strategies against the database I got an idea of how effective that would be.

Above is a histogram of the random strategies' scores. Not so good. Looks like humans playing the game are better than randomly guessing.

Best Strategy?

So lets say you wanted to make the best strategy, what could you look at? Well, for starters you might be interested in a question like the following? 'If I put N soldiers in slot X, what percentage of the existing strategies in the database would I beat in that slot?" The next graph answers this question.

Here I have attempted to show for every X,Y coordinate the following. With Y soldiers in slot X, what percentage of the existing strategies do you beat in slot X. I changed the color scaling to make it more refined, so you can better read it quantitatively.

Go Forth

So, there you have it. You are not fueled with what is probably way more information than you were hoping for. Hopefully these graphs are more than just pretty, and get you thinking a bit. That is a lot of what science is about. Make some observations and then attempt to explain the results with your own models. You can then test your models with experiment. I've provided you with a bunch of observations and invited you to construct your own explinations. Now I invite you to perform an experiment. Think you know whats going on in the game? Then try and beat it. The link to play is the same as before.


  1. Unusually for this blog the analysis is completely off. The values are correlated in that their total is fixed and an analysis of each value in isolation gives little information (after getting the 1st, 2nd and third ranking {lohilohi_slide, etc} I tried to use the various heap maps and dropped to 138 (heat_map).

    I think the relative change in value between each box is the key information. My first attempt averaged the box values appearing in the top 25 scores and that got a 62nd place (aver_p6). Then I spotted that several distribution patterns existed, I picked what looked like the most promosing and averaged over it to get to number 1 (what is the maximum score that number 1 could achieve?).

  2. Hey Derek. I'm confused as to how the analysis could be completely off considering I didn't really do any analysis. The plots are just data taken from the existing database. I actually held my tongue from discussing the implications in the graphs and attempted to leave the analysis to the viewers.

    Theoretically, the maximum score attainable in some sense is just 2 times the total number of strategies in the database, assuming you could beat all of them, but certainly this is an impossibility. Other than that I have no idea, seeing as I didn't compute the full Energy surface.

    I can tell you however that I messed around a bit, trying to find the best strategy using simulated annealing, and population annealing with some success, but didn't submit those strategies online, because I didn't want to be a dick (I'm the only one that should be able to see the whole dataset, assuming no one has hacked it).

    If any of the graphs are unclear as to what they are showing, be sure to let me know and I can try to correct their description.

    I can do one better, if you have any other ideas for simply generated graphs that would be of use to you, I could probably do that for you. Good luck. Sorry for any confusion.

  3. Alemi, Thanks for the prompt reply.

    An analysis is implied by the measurements you make. After posting I realised we are interested in different measurements, you are looking at averages and I want information to help select a better strategy. heat_map aimed to do well against the average and is ranked 138, a slightly better than average score. The top 25 are by definition not average.

    This is really a problem in psychology; what sequences would most other people choose? Most readers of this blog likely come from a weird culture (Western, Educated, Industrialized, Rich and Democratic) which down narrows the possible ways of thinking to those I am familiar with, but there are still enough combinations to make things uninteresting. People could start with low values, go high and then back to low, or they could do the reverse or alternate low/high between bins, etc. The initial conditions (i.e., first x sequences submitted) will of course have had some impact, although I'm not sure what.

    An analysis of common low/high/same patterns could be interesting, as could an analysis of changes in the patterns appearing in answers given by the same person as they obtained information from previous answers.

    I tried one sequence (the_well) with a middle bin containing 0 (the choice based on heat map data and intended to get the most from a 'suicide') and it ranked 47. This suggests to me that winners need to do well over all bins (of course I may have chosen the wrong bin to zero but it beats lohilohi_slide in the number 1 slot).

    Thanks for the offer of generating graphs that would be of use to me. What would be of use to me? Knowing I was close to the maximum score attainable with the answers so far received would be useful to me but not you (I take it would like people to send in lots of answers and not stop). I think information low/high sequence occurrence might be useful, it depends on whether there are a few or lots of different patterns.

    How many different sequences would I have to give before it was possible to deduce that I had achieved the maximum score obtainable with a given set of sequences (assuming that none of my sequences gave the maximum points possible)?

  4. Hey Derek,

    Answering your questions is difficult. I made the graphs I did because they were relatively cheap to compute. Finding the best strategy, or figuring out even the maximal attainable strategy, I believe would require computing the entire energy landscape, i.e. what the score would be for every possible choice in strategy. Representing this score as an int, with 100 choose 10 possible strategies, 4 bytes per int, would require 70 Terabytes. Of course, this would have to be recomputed for much of the landscape for each additional strategy. Furthermore, I wholly expect that this landscape is highly non-convex. I.e. I expect that there are a lot of local minima which makes any local search ineffective, and any global search hard.

    I will think about how to calculate the appearance of different (psychological) strategies. Not sure how I would do that, maybe some kind of fancy neural network kind of categorization thingamagig. I'll think about, sounds like some work though.

    Looks like you are doing well at the game as it is though. But as more people submit strategies, I suspect the top 25 to get shuffled quite a bit.

  5. Hi Derek,

    I read through some of your analysis, and I just wanted to point out that you're over-counting the number of possible configurations by a lot, basically because you don't take into account the fact that the soldiers are indistinguishable from each other. I think about it like this: in each field you can put between 0 and 100 soldiers, subject to the constraint that all soldiers in all fields add up to 100. Ignoring the constraint for now, you get 101^10 ~ 10^20 possible configurations, the vast majority of which are invalid.

    Applying the constraint probably requires some careful thinking, but some quick MC simulations suggest that ~ 5/10^8 of the remaining strategies are valid, which you can probably reproduce more analytically by using the approximation that the sum of 10 uniformly distributed numbers is a ~ normal distribution with mean ~500 and standard deviation ~91.

    This leaves you with about 5*10^12 << 10^100 valid strategies.

    Another point: the data suggests (unsurprisingly) that some people attempted to find the worst possible strategy, using something like (100,0,0...). I believe that the strategy you dismiss as always losing, (25,25,25,25,0..0) will beat a fair number of possible strategies, including some actually in the field.

  6. Hi Forbes,

    Oops, a rather fundamental mistake on my part.

    I have updated the post to list the number of valid strategies as 4263421511271 and 635627275767 if every field contains at least two soldiers.

    I had noticed that some players were attempting to loose to everybody. The number of such attempts looked small (assuming some competence at losing) and I ignored them.

  7. With the current game I have tried various strategies with slight variations as a "measurement" to get a feeling for the existing pool. But you cannot measure without changing the state of the pool. Interesting analogy there with quantum mechanics...
    There is another way to play this game. Given all "bets" in an existing pool, what is the strategy with the minimum number of soldiers that still reaches the number 1 position? This takes the guessing out of the game, my feeling is that it should not be too hard to calculate a winning strategy. Maybe it is an idea to reveal the current state of the pool as a side experiment.