Debunking the Evil Shuffler | Who Tracks the Trackers?

Editorial Disclaimer

This post was originally published by Spencatro on 08/06/2018. Spencatro has since gone on to work at Wizards of the Coast (Dec 2018). MTGATracker remains a 3rd-party project that is not affiliated with Wizards of the Coast, and is published pursuant to the Wizards of the Coast Fan-Content Policy. The views and opinions expressed in this post are strictly those of the author, and do not reflect the official position, policy, views, or opinions of Wizards of the Coast. No authors were compensated by any parties for the authorship of this post.

If you’ve ever been to any MTGA community–like the MagicArena subreddit, the MTG Arena Players Facebook group, or even RiptideProLab’s Discord–you’ve probably heard many a myth of the evil MTGA shuffler. As the horror stories go, the totally busted shuffling implementation in MTGA seemingly flips a coin, then stacks the next 10+ draws with either all lands on heads, or none at all on tails. Sometimes the story is accompanied by a screenshot, sometimes by a manually recorded spreadsheet, sometimes just by a rant.

Mana screw isn’t unique to the MTGA format, of course, though complaints about the shuffler do sometimes seem more frequent within the MTGA community. Is it all too easy to blame a machine for a bad game, or is there perhaps an actual flaw? Fortunately, the digital nature of MTGA offers a unique chance to draw conclusions about this question from machine-recorded information, rather than conjuring up potentially flawed speculation based on anecdotal evidence. Furthermore, MTGATracker data comes from a huge variety of users, recorded by an unfeeling machine that has no incentive to fudge a number here or there (except maybe to fuel the imminent AI revolution by causing petty squabbles over–you know what? I’ll digress there).

Today we aim to use MTGATracker data to seek out the truth about the so-called “rigged” MTGA shuffler. Our goal is to draw a histogram (ideally shaped like a bell curve) that represents mana screw on one end, mana flood on the other, and healthy draws in the center.

The Method

We looked at all 26,208 MTGATracker game records collected between the 3.5.5 release (in which MTGATracker started tracking “cards drawn”) and the initial writing of this piece (7/26/18 at approximately 11AM, PST).

Since MTGATracker can subtract the set of cards left in the deck from the set of cards in the original decklist, it can take note of which cards a player actually sees (or which cards are drawn from the deck in any fashion) over the course of a game. With this data, we can do some very simple math to find out how often the share of lands drawn is mismatched with the share of lands in the deck, and by how much. This data will end up fitting very nicely into a bell curve!
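Because a deck usually holds several copies of the same card (basic lands especially), that “set” subtraction really has to be a multiset subtraction. Here is a minimal Python sketch, assuming a decklist and the remaining library are each available as a plain list of card names. That representation and the function name are hypothetical, not MTGATracker’s actual data model. It leans on `collections.Counter`:

```python
from collections import Counter


def cards_drawn(decklist, cards_left_in_deck):
    """Return the cards that have left the library, as a Counter (multiset).

    Both arguments are hypothetical lists of card names. Counter
    subtraction keeps duplicates and drops non-positive counts, so four
    "Forest" in the decklist minus two still in the library leaves two drawn.
    """
    return Counter(decklist) - Counter(cards_left_in_deck)


# Usage with a tiny hypothetical 6-card "deck"
deck = ["Forest"] * 4 + ["Llanowar Elves"] * 2
library = ["Forest", "Forest", "Llanowar Elves"]
drawn = cards_drawn(deck, library)
print(sorted(drawn.elements()))  # ['Forest', 'Forest', 'Llanowar Elves']
```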

To find these numbers, for each game we will:

Count the number of lands used in the player’s deck
Find the percent of the deck comprised of lands (pct_lands_in_deck = game_lands_used / len(deck_cards))
Count the number of lands drawn by the player
Find the percent of the cards drawn that are comprised of lands (pct_lands_drawn = game_lands_drawn / len(drawn_cards))
Subtract the percent of lands in the deck from the percent of lands drawn (pct_land_diff = pct_lands_drawn - pct_lands_in_deck)

For example, if a deck runs 40% lands (24 / 60), but in a game the player only draws 1 land out of 10 cards drawn, their percent land differential will be -30%. However, if a player only runs 6 lands in a 60-card deck, we would expect them to draw fewer lands; if that player draws 1 land out of 10 cards drawn, the land differential ends up being 0%, as it hit the target exactly. On the other end of the curve, if a player runs 40% lands but draws 9 lands out of 10 cards drawn, their differential will be +50%. In this manner, the area of the curve nearest 0% represents healthy draws, while the far left and right ends represent screw and flood, respectively.
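The steps above translate almost line for line into Python. This sketch reuses the variable names from the list and takes a hypothetical `is_land` predicate in place of real card data. It follows the sign convention of the examples: lands drawn minus lands in deck, so screw comes out negative and flood positive. It is an illustration, not the tracker’s actual code:

```python
def land_differential(deck_cards, drawn_cards, is_land):
    """Return pct_lands_drawn - pct_lands_in_deck, in percentage points.

    deck_cards / drawn_cards: sequences of cards (any representation).
    is_land: hypothetical predicate returning True for land cards.
    """
    if not deck_cards or not drawn_cards:
        raise ValueError("need at least one card in the deck and one drawn")
    game_lands_used = sum(1 for card in deck_cards if is_land(card))
    game_lands_drawn = sum(1 for card in drawn_cards if is_land(card))
    pct_lands_in_deck = 100 * game_lands_used / len(deck_cards)
    pct_lands_drawn = 100 * game_lands_drawn / len(drawn_cards)
    return pct_lands_drawn - pct_lands_in_deck


def is_land(card):
    # Stand-in predicate: cards here are just the strings "land" / "spell"
    return card == "land"


# The three worked examples from the text
deck_40 = ["land"] * 24 + ["spell"] * 36   # 40% lands
deck_10 = ["land"] * 6 + ["spell"] * 54    # 10% lands
one_land = ["land"] + ["spell"] * 9        # 1 land in 10 draws
nine_lands = ["land"] * 9 + ["spell"]      # 9 lands in 10 draws

print(land_differential(deck_40, one_land, is_land))    # -30.0
print(land_differential(deck_10, one_land, is_land))    # 0.0
print(land_differential(deck_40, nine_lands, is_land))  # 50.0
```

Working in percentage points (rather than fractions) keeps the numbers directly comparable to the -30%, 0%, and +50% figures quoted above.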

Once we have a list of land differentials per game, we can sort them into “buckets” and then measure the size of each bucket. A healthy curve will have the majority of land differentials near zero, with fewer tallies towards the ends of the curve.
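Sorting differentials into buckets comes down to a floor division by the bucket width. The sketch below assumes 5-point buckets labelled by their left edge, so each bucket covers [x, x + 5), which matches how the bars in the graph are described. The helper names are ours, chosen for illustration:

```python
import math
from collections import Counter

BUCKET_WIDTH = 5  # percentage points per bar


def bucket_of(diff, width=BUCKET_WIDTH):
    """Left edge of the half-open bucket [x, x + width) containing diff."""
    # math.floor rounds toward negative infinity, so -2.5 lands in [-5, 0)
    return math.floor(diff / width) * width


def histogram(diffs, width=BUCKET_WIDTH):
    """Count games per bucket, keyed by each bucket's left edge."""
    return Counter(bucket_of(d, width) for d in diffs)


# Usage with a handful of made-up differentials
counts = histogram([-30.0, 0.0, 2.5, 4.9, -2.5, 50.0])
for edge in sorted(counts):
    print(f"{edge:+d}%: {counts[edge]}")
```

Using `math.floor` rather than `int()` matters for the negative side: `int()` truncates toward zero and would wrongly put a -2.5% game in the 0% bar.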

Enough words, let’s see the pictures!

The Results

Each bar in this graph represents the number of games where the land differential was between x and x + 5. For instance, the bar at 0% represents all games where the land differential was between 0% and +5%.

At first glance, this graph looks very healthy. The data creates a smooth curve, and the sections in the center of the graph make up the bulk of the data, which means that the majority of games played have a healthy land split. The tails are long, but this is probably a good thing; it means that WotC doesn’t seem to be taking extraordinary measures to completely disallow the bad scenarios that come with true variance.

But can we break these numbers?

Let’s try to determine what a healthy draw looks like. If you’re playing a deck with 40% lands, a game with an ideal land draw would be right at 40%, but we’ll say that a game is “healthy” if you draw 20% to 60% lands, a differential of -20% to +20%. Let’s consider everything outside of this range “unhealthy,” and redraw the histogram with the tails all smashed together.
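Smashing the tails together only changes how out-of-range games are labelled. This sketch keeps 5-point buckets between -20% and +20% and pools everything else into two tail bins. The “screw” / “flood” labels and the boundary handling are assumptions for illustration: bars still cover [x, x + 5), so a game at exactly -20% stays in the -20% bar while one at exactly +20% falls into the flood bin.

```python
import math
from collections import Counter


def collapsed_histogram(diffs, limit=20, width=5):
    """Histogram whose out-of-range tails are merged into two bins.

    Every bar whose left edge is below -limit is merged into "screw";
    every bar starting at +limit or above is merged into "flood".
    """
    counts = Counter()
    for d in diffs:
        edge = math.floor(d / width) * width  # left edge of [x, x + width)
        if edge < -limit:
            counts["screw"] += 1
        elif edge >= limit:
            counts["flood"] += 1
        else:
            counts[edge] += 1
    return counts


# Usage with made-up differentials
print(collapsed_histogram([-30, -20, 0, 19.9, 20, 50]))
```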

This graph still looks quite good. While the ends have gotten a bit beefier, the curve still isn’t disrupted! We could even go 5% tighter without disrupting the curve shape: 1,146 + 751 (1,897) is still less than 2,076 (the next bar), and 948 + 1,108 (2,056) is just under 2,104. Hitting the tail ends of this curve will certainly always feel bad, but it definitely doesn’t happen the majority of the time, or even all that often.

So exactly how often do players get flooded / screwed? Let’s arrange this data a little differently and find out:

In this graph, anything marked “Healthy” falls within -10% to +10% (for a normal 24-land, 60-card deck, this means drawing 3-5 lands in the first 10 cards). The region marked “Questionable” covers differentials of 10% to 20% in either direction (2-3 or 5-6 lands in 10 cards), and the “player is pissed” region covers anything more than 20% away from the expected share of lands drawn (0-2 or 6-10 lands in 10 cards).
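The three categories can be written as a small classifier. The land-count ranges quoted above share their endpoints (3 and 5 lands each appear in two categories, as do 2 and 6), so a sketch has to pick a side. This one assumes each exact boundary belongs to the less extreme category. That choice, along with the function name and labels, is ours and not necessarily what was used for the graph:

```python
def classify(diff):
    """Label a land differential given in percentage points.

    Assumed boundary handling: exactly +/-10 counts as Healthy and
    exactly +/-20 as Questionable.
    """
    magnitude = abs(diff)
    if magnitude <= 10:
        return "Healthy"
    if magnitude <= 20:
        return "Questionable"
    return "Player is pissed"


# Lands drawn in the first 10 cards of a 24-land, 60-card deck (40% lands)
expected_pct = 100 * 24 / 60
for lands in range(11):
    diff = 100 * lands / 10 - expected_pct
    print(lands, classify(diff))
```

Under this assumption, 3-5 lands come out Healthy, 2 or 6 Questionable, and 0-1 or 7-10 land in the last category.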

The Verdict

While these extreme cases definitely hurt when they happen–and they do happen!–they’re far from the norm. It would be interesting to compare this data to hand-shuffled decks (if we had more time–and maybe a little more dedication–we could count land draw distributions from a video-recorded GP or PT†) to get a better idea of how the MTGA shuffler compares to real-life shuffles. In any case, this myth seems pretty busted.

† Interested in doing this research? Come let us know in Discord, and be our next guest author!

Don’t get me wrong, mana screw / flood does hurt when you’re unlucky enough to find yourself on the far edges of the curve. But, comfortingly, if you’re currently in a super un-fun game without mana, or maybe with too much, our data shows that 69% of all tracked games fall within a healthy range, so your next game will probably be better.

Hopefully this post can help put the “evil shuffler” conspiracy to bed once and for all.
