Sunday, January 20, 2013

Re-evaluating the values of the tiles in Scrabble™



Recently I have seen quite a few blog posts written about re-evaluating the points values assigned to the different letter tiles in the Scrabble™ brand Crossword Game.  The premise behind these posts is that the creator and designer of the game assigned point values to the different tiles according to their relative frequencies of occurrence in words in English text, supplemented by information gathered while playtesting the game.  The points assigned to different letters reflected how difficult it was to play those letters: common letters like E, A, and R were assigned 1 point, while rarer letters like J and Q were assigned 8 and 10 points, respectively.  These point values were based on the English lexicon of the late 1930’s.  Now, some 70 years later, that lexicon has changed considerably, having gained many new words (e.g.: EMAIL) and lost a few old ones.  So, if one were to repeat the analysis of the game designer in the present day, would one come to different conclusions regarding how points should be assigned to various letters?

I’ve decided to add my own analysis to the recent development because I have found most of the other blog posts to be unsatisfactory for a variety of reasons.*  One article calculated letters’ relative frequencies by counting the number of times each letter appeared in each word in the Scrabble™ dictionary.  But this analysis is faulty, since it ignores the probability with which different words actually appear in the game.  One is far less likely to draw QI than AE during a Scrabble™ game (since there’s only one Q in the bag, but many A's and E's).  Similarly, very long words like ZOOGEOGRAPHICAL have a vanishingly small probability of appearing in the game: the A’s in the long words and the A’s in the short words cannot be treated equally.  A second article I saw calculated letter frequencies based on their occurrence in the Scrabble™ dictionary and did attempt to weight frequencies based on word length.  The author of this second article also claimed to have quantified the extent to which a letter could “fit well” with the other tiles given to a player.  Unfortunately, some of the steps in the analysis of this second article were only vaguely explained, so it isn’t clear how one could replicate the article’s conclusions.  In addition, as far as I can tell, neither of these articles explicitly included the distribution of letters (how many A’s, how many B’s, etc) included in a Scrabble™ game.  Also, neither of these articles accounted for the fact that there are blank tiles (that act as wild cards and can stand in for any letter) that appear in the game.

So, what does one need to do to improve upon the analyses already performed?  We’re given the Scrabble™ dictionary and bag of 100 tiles with a set distribution, and we’re going to try to determine what a good pointing system would be for each letter in the alphabet.  We’re also armed with the knowledge that each player is given 7 letters at a time in the game, making words longer than 8 letters very rare indeed.  Let’s say for the sake of simplicity that words 9 letters long or shorter account for the vast majority of words that are possible to play in a normal game. 

Based on these constraints, how can one best decide what points to assign the different tiles?  As stated above, the game is designed to reward players for playing words that include letters that are more difficult to use.  So, what makes an easy letter easy, and what makes a difficult letter difficult?  Sure, the number of times the letter appears in the dictionary is important, but this does not account for whether or not, on a given rack of tiles (a rack of tiles is to Scrabble™ as a hand of cards is to poker), that letter actually can be used.  The letter needs to combine with other tiles available either on the rack or on the board in order to form words.  The letter Q is difficult to play not only because it is used relatively few times in the dictionary, but also because the majority of Q-words require the player to use the letter U in conjunction with it.

So, what criterion can one use to say how useful a particular tile is?  Let’s say that letters that are useful have more potential to be used in the game: they provide more options for the players who draw them.  Given a rack of tiles, one can generate a list of all of the words that are possible for the player to play.  Then, one can count the number of times that each letter appears in that list.  Useful letters, by this criterion, will combine more readily with other letters to form words and so appear more often in the list than un-useful letters.

(I would also like to take a moment to preempt criticism from the competitive Scrabble™ community by saying that strategic decisions made by the players need not be brought into consideration here.  The point values of tiles are an engineering constraint of the game.  Strategic decisions are made by the players, given the engineering constraints of the game.  Words that are “available to be played” are different from “words that actually do get played.”  The potential usefulness of individual letter tiles should reflect whether or not it is even possible to play them, not whether or not a player decides that using a particular group of tiles constitutes an optimal move.)

To give an example, suppose I draw the rack BEHIWXY.  I can generate** the full list of words available to be played given this rack: BE, BEY, BI, BY, BYE, EH, EX, HE, HEW, HEX, HEY, HI, HIE, IBEX, WE, WEB, WHEY, WHY, WYE, XI, YE, YEH, YEW.  Counting the number of occurrences of each letter, I see that the letter E appears 18 times, while the letter W only appears 7 times.  This example tells me that the letter E is probably much more potentially useful than the letter W.

The example above is only one of the many, many possible racks that one can see in a game of Scrabble™.  I can use a Monte Carlo-type simulation to estimate the average usefulness of the different letters by drawing many example racks.  Monte Carlo is a technique used to estimate numerical properties of complicated things without explicit calculation.  For example, suppose I want to know the probability of drawing a straight flush in poker.***  I can calculate that probability explicitly by using combinatorics, or I can use a Monte Carlo method to deal a large number of hypothetical possible poker hands and count the number of straight flushes that appear.  If I deal a large enough number of hands, the fraction of hands that are straight flushes will converge upon the correct analytic value.  Similarly here, instead of explicitly calculating the usefulness of each letter, I use Monte Carlo to draw a large number of hypothetical racks and use them to count the number of times each letter can be used.  Comparing the number of times that each tile is used over many, many possible racks will give a good approximation of how relatively useful each tile is on average.  Note that this process accounts for the words acceptable in the Scrabble™ dictionary, the number of available tiles in the bag, as well as the probability of any given word appearing.

In my simulation, I draw 10,000,000 racks, each with 9 tiles (representing the 7 letters the player actually draws plus two tiles available to be played through to form longer words).  I perform the calculation two different ways: once with a 98-tile pool with no blanks, and once with a 100-tile pool that does include blanks.  In the latter case, I make sure to not count the blanks used to stand in for different letters as instances of those letters appearing in the game.  The results are summarized in the table below.
There are two key observations to be made here.  First, it does not seem to matter whether or not there are blanks in the bag!  The results are very similar in both cases.  Second, it would be completely reasonable to keep the tile point values as they are.  Only the Z, H, and U appear out of order.  It’s only if one looks very carefully at the differences between the usefulness of these different tiles that one might reasonably justify re-pointing the different letters. 

For fun, I have included in the table my own suggestions for what these tiles’ values might be changed to based on the simulation results.  (Note: here's where any pretensions of scientific rigor go out the window.)  I have kept the scale of points between 1 and 10, as in the current pointing system.  I have assigned groups of letters the same number of points based on whether they have a similar usefulness score.

Here are the significant changes:  L and U, which are significantly less useful than the other 1-point tiles may be bumped up to 2 points, comparable to the D and G.  The letter V is clearly less useful than any of the other three 4-point tiles (W, Y, and F, all of which may be used to form 2-letter words while the V forms no 2-letter words), and so is undervalued.  The H is comparable to the 3-point tiles, and so is currently overvalued.  Similarly, the Z is overvalued when one considers how close to the J it is.  Unlike in the previous two articles that I mentioned, I don't find any strong reason to change the value of the letter X compared to the other 8 point tiles.  I suppose one could lower its value from 8 points to 7, but I have (somewhat arbitrarily) chosen not to do so.

One may also ask the question whether or not the fact that a letter forms 2- or 3-letter words is unfairly biasing that letter.  In particular, is the low usefulness of the C and V compared to comparably-pointed tiles due to the fact that they form no 2-letter words?  Performing the simulation again without 2-letter words, I found no changes in the results in any of the letters except for C, which increased in usefulness above the B and the H.  The letter V's ranking, however, did not change at all, indicating that unlike the C the V is difficult to use even when combining with letters to make longer words.  Repeating the simulation yet again without 2- or 3-letter words yielded the same results.

As a final note, I would like to respond directly to to Stefan Fatsis's excellent article about the so-called controversy surrounding re-calculating tile values and say that I am fully aware that this is indeed a "statistical exercise," motivated mostly by my desire to do the calculation made by others in a way that made sense in the context of the game of Scrabble.  Similarly, I realize that these recommendations are unlikely to actually change anything.  Given that the original points values of the tiles are still justifiably appropriate by my analysis, it's not like anybody at Hasbro is going to jump to "fix" the game.  Lastly, my calculations have nothing to do with the strategy of the game whatsoever, and cannot be used to learn how to play the game any better.  (If anything, I've only confirmed some things that many experienced Scrabble players already know about the game, such as that the V is a tricky tile, or that the H, X, and Z tiles, in spite of their high point values, are quite flexible.)



* To state my own credentials, I have played Scrabble™competitively for 4 years, and am quite familiar with the mechanics of the game, as well as contemporary strategy.

** Credit where credit is due: Alemi provided the code used to generate the list of available words given any set of tiles.  Thanks Alemi!

*** Monte Carlo has a long history of being used to estimate the properties of games.  As recounted by George Dyson in Turing’s Cathedral, in 1948 while at Los Alamos the mathematician Stanislaw Ulam suffered a severe bout of encephalitis that resulted in an emergency trepanation.  While recovering in the hospital, he played many games of solitaire and was intrigued by the question of how to calculate the probability that a given deal could result in a winnable game.  The combinatorics required to answer this question proved staggeringly complex, so Ulam proposed the idea of generating many possible solitaire deals and merely counting how many of them resulted in victory.  This proved to be much simpler than an explicit calculation, and the rest is history: Monte Carlo is used today in a wide variety of applications.



Additional References:
The photo at top of a Scrabble™ board was taken during the 2012 National Scrabble™ Championship.  Check out the 9-letter double-blank BINOCULAR.

For anyone interested in learning more about the fascinating world of competitive Scrabble™, check out Word Freak, also by Stefan Fatsis.  This book has become more or less the definitive documentation upon this subculture.  If you don't have enough time to read, check out Word Wars, a documentary that follows many of the same people as Fatsis's book. (It still may be available streaming on Netflix if you hurry.)



18 comments:

  1. A question from a statistics layman: You discuss how you can generate a particular (finite) set of legal words from a given rack of seven tiles. But in the game, of course, you are also using the tiles already on the board, as well as the ones in your rack, to form words. I'm wondering how you account for those "wild cards" (or wild tiles) in your analysis. Is that why you use 9-tile racks, rather than 7-tile racks, in your simulation? If so, how does that replicate the actual effects of the use of the tiles on the board?

    Just curious. Thanks.

    ReplyDelete
    Replies
    1. Good question!

      In the context of the game, the tiles on the board are heavily situation dependent. The board position depends directly upon the complete history of the game (what tiles were drawn together, and when) as well as the strategic choices of the players playing the game (ie, if they make the board more open or keep it closed). Since I'm trying to keep strategic considerations separate from the analysis that I've done (and since it would make any calculation very, very complicated) I've decided to ignore this and say that the player has 7 tiles on the rack as well as up to two tiles on the board that he/she can play through to make a word. The only constraint on the two tiles on the board is that they are necessarily not any of the tiles on the rack.

      Of course, in a real game the tiles on the board will also be heavily dependent upon game history and the players' strategic decisions. I'm choosing to ignore this and say that as a first approximation all we have to do is pick out two additional tiles from the bag and allow the player to also use them to form words.

      Delete
    2. I'd also ask: Wouldn't it be more realistic to run the simulation so that the two tiles on the board are treated separately from the seven in the rack? The two tiles on the board aren't a random draw. If you're positing that a player can add to those tiles to make a 9-letter word, then you would be better off with a constraint that the two on the board make up a playable 2-letter word.

      Yes, you could say that two tiles on the board could be split, and a player could still build a 9-letter bingo off of them, but that's such a rare occurrence that I think your simulation would be closer to reality if you eliminated it. Not that it would change your end results that much, and it surely would be harder to program, but that would be my thought.

      Actually, ANY 9-letter bingo is pretty rare, so it might even be reasonable to just recalculate using only eights and smaller. Which would be easier, rather than harder, to figure.

      Delete
    3. The 2 letters open on the board are not a random draw in the context of a real game. But explicitly accounting for the non-randomness of those letters would be impossible without bringing strategic considerations into these calculations. As a first approximation, I am saying that those two extra tiles can be anything that the player does not already have on the rack.

      As for the rarity of 9-letter words, it is also very, very rare to pull 9 tiles out of the bag and use them to spell a 9 letter word. While the in-game rarity and the simulation rarity are slightly different, I claim that they are closely related to one another so that the rarity of 9-letter words in the simulation approximates the in-game rarity of 9-letter words.

      In any case, based on preliminary simulations with 8-letter racks I am fairly confident that the results do not change.

      Delete
  2. Thanks for the sharing. Excellent your Tiles related goods Home Improvements vast tiles products here tfo in australa.

    ReplyDelete
  3. If anything should be changed from this analysis, shouldn't the number of tiles be looked at rather? S and R seem to be underrepresented compared to their usefullness?

    ps. I have no proper scrabble experience

    ReplyDelete
  4. I grew up around Scrabble – my mom would play it when she got around friends and family, and they let me play every so often, but CAT and THE don’t make for very good scores or good playing fields, so I mostly just spectated. During my college years, I spent five summers traveling on various groups and before I left one summer I bought a Travel Scrabble to take with me.

    ReplyDelete
  5. The most basic way to make your own website is truly to find on the net with an online site development application and apply one of those. You should use an online site development utility for you to easily put content like links, images, videos and text to all your webpages. The interfaces of these tools commonly cause whatever your content looks prefer to you in order to look the same to all your website visitors. This interface design is termed WYSIWYG, or What You Notice Is What You Obtain.

    ReplyDelete
  6. The author of this second article also claimed to have quantified the extent to which a letter could “fit well” with the other tiles given to a player. Unfortunately, some of the steps in the analysis of this second article were only vaguely explained, so it isn’t clear how one could replicate the article’s conclusions.travertine pavers

    ReplyDelete
  7. Kids play scrabble to improve their vocabulary, build their skills by forming words and to learn new words. http://bolsboardgames.com

    ReplyDelete
  8. Italian Design Tiles Hottest Styles A person that really wants to decorate a building or room with tiles has numerous options. Using Italian tiles is the one which rises up. Each latest Italian design tiles is a lot more attractive Italian design ideas

    ReplyDelete
  9. The author of this second article also claimed to have quantified the extent to which a letter could “fit well” with the other tiles given to a player. Unfortunately, some of the steps in the analysis of this second article were only vaguely explained, so it isn’t clear how one could replicate the article’s conclusions. Italian design ideas

    ReplyDelete
  10. Bathrooms

    Removing Shower Wall Tiles The fascinating thing about interior decorating is that it permits you to flex creative muscles whenever you get a new idea.

    ReplyDelete
  11. Ttalian Tiles SuppliersThe latest Italian tiles are nothing short of Inspirational, the way in which they mimic natural stone and wood is incredible. When laid, it is difficult to see the difference between the real material against the porcelain tile. You get the real look and feel without the maintenance negatives.

    ReplyDelete
  12. thank you for sharing

    http://home-improvementips.blogspot.com/2013/10/bathroom-tiles-fittings.html

    ReplyDelete
  13. Your blog has been very informative and the way you have presented is very impressive. The efforts made by you are noteworthy. Thanks a lot for this post, please keep posting as it is useful for all the readers like me.

    ReplyDelete
  14. Thanks a lot for sharing this with all folks you really recognise what you are talking about! In this complex environment business need to present there company data in meaningful way.Sqiar (http://www.sqiar.com/solutions/technology/tableau/) which is in UK,provide services like Tableau and Data Warehousing etc .In these services sqiar experts convert company data into meaningful way.

    ReplyDelete