Medals and math — batting averages at the beer awards

Mural detail (Garage Project, 10 October 2017)
Friendly competition? (Detail from the mural at Garage Project, the new Champion Brewery)

The latest round of the Brewers Guild of New Zealand Awards1 was announced this weekend and this year they’ve given us more data than usual to play with. For the first time, the Guild has released information on what was entered, as opposed to just telling us who won, and I couldn’t be happier. I’m the kind of nerd who watches the Olympics and wants a per-capita column on the medal tally. Raw results are one thing, but I’m curious how well you did relative to how hard you tried. And now, after an hour or so of strangely enjoyable data entry and spreadsheeting,2 I know.

Brace yourselves for a deep dive into the numbers. With damn near a thousand beers from a hundred contestants, there’s obviously a lot going on, but I think the patterns are easy enough to get a handle on. I came up with two slightly-different ways of working out what a brewery’s ‘batting average’3 at the awards might be:

  • Medal Percentage (MPC) is the proportion of all beers entered that earned any kind of medal at all. The overall medalling rate for the competition was 52% (up slightly on the previous year) and it turned out fairly well distributed; breweries in the middle of the rankings scored around this level. It’s worth noting that a beer that earns nothing is either significantly faulted or disqualifyingly “out of style”4 or both — though we don’t know which without access to the judges’ notes.
  • Points Per Entry (PPE), on the other hand, assigns gold medals 3 points, 2 for silver, and 1 for bronze and divides the result by how many beers the brewery entered all up. This scoring reflects the fact that the medals aren’t unique, like at the Olympics; a category might have a half-dozen golds, for example ― and it’s the scoring the Guild itself uses to calculate the overall Champion Breweries. This year, a PPE over 1.0 put a brewery in the top third of the field as a whole.

So if you entered 4 beers, and won 1 gold and 2 bronzes, you’ve got an MPC of 75% and a PPE of 1.25. For all the breweries who entered 10 or more beers ― which is a bit over a third of all who contested the awards this year, and it chops out the noisier end of the data from smaller players ― this is what that looks like, in a handily-sortable table:5
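For anyone who wants to check my sums, the two measures are simple enough to express as a couple of lines of Python (a minimal sketch of the arithmetic described above; the function names are mine, not anything from my actual spreadsheet):

```python
def mpc(gold, silver, bronze, entries):
    """Medal Percentage: the share of entries that won any medal at all."""
    return 100 * (gold + silver + bronze) / entries

def ppe(gold, silver, bronze, entries):
    """Points Per Entry: gold = 3, silver = 2, bronze = 1, divided by entries."""
    return (3 * gold + 2 * silver + bronze) / entries

# The worked example from above: 4 beers entered, 1 gold and 2 bronzes.
print(mpc(1, 0, 2, 4))  # 75.0
print(ppe(1, 0, 2, 4))  # 1.25
```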

Performance at the 2017 BGONZAs, for breweries with 10 or more beers entered

n = number of entries, MPC = medal percentage, PPE = points per entry, G/S/B = individual medals

Brewery                    |  n | MPC (%) |  PPE |  G |  S |  B
Garage Project             | 42 |    61.9 | 1.02 |  4 |  9 | 13
Boston Beer                | 39 |    18   | 0.36 |  3 |  1 |  3
Lion (NZ)                  | 30 |    73.3 | 1.2  |  4 |  6 | 12
DB (combined)              | 28 |    60.7 | 0.89 |  1 |  6 | 10
Black Dog (NZ)             | 27 |    37   | 0.52 |  0 |  4 |  6
8 Wired                    | 26 |    73.1 | 1.19 |  2 |  8 |  9
Boundary Road              | 25 |    44   | 0.48 |  0 |  1 | 10
Fork Brewing               | 25 |    48   | 0.72 |  0 |  6 |  6
Sprig & Fern               | 20 |    35   | 0.45 |  1 |  0 |  6
Lion (Australia)           | 16 |    43.8 | 0.56 |  0 |  2 |  5
Three Boys                 | 15 |    40   | 0.6  |  0 |  3 |  3
Deep Creek                 | 14 |    64.3 | 1.14 |  2 |  3 |  4
North End                  | 14 |    85.7 | 1.29 |  1 |  4 |  7
Good George                | 14 |    64   | 1.07 |  1 |  3 |  4
Te Aro                     | 13 |    30.8 | 0.46 |  0 |  2 |  2
Cassels and Sons           | 12 |    25   | 0.33 |  0 |  1 |  2
Aotearoa Breweries (Mata)  | 10 |    40   | 0.6  |  1 |  0 |  3
Hawkes Bay Brewing         | 10 |    30   | 0.3  |  0 |  0 |  3
Boneface Brewing           | 10 |    50   | 0.9  |  1 |  2 |  2

Doing well by doing badly: Sam Adams

Sam Adams 'Boston Lager' (Malthouse, 10 November 2009)
First place in my original Diary, but you don’t get a medal for that (in fact, in 2017, you don’t get a medal at all)

The most striking upshot of this perspective is that the Champion International Brewery ― The Boston Beer Company, more usually known as Sam Adams ― did really, really badly even as they picked up one of the headline awards of the night. At the BGONZAs, “Champion” is determined by adding the scores (in medal terms) of your best four beers, with a countback to further beers in the event of a tie. Boston Beer got three golds, handily beating any other international entrants, but they entered thirty-nine beers to achieve that. Their MPC was 18% ― dead last among breweries with 10 or more beers in play, and second to last among breweries (of any size) who scored more than zero. On PPE terms, they’re also in the tail end of the rankings, with 80% of all entrants outperforming them. It’s hardly a Champion-level performance, on these numbers.

And yet, once again, they take home the trophy. It borders on farce, because they’ve done the same for 7 of the 9 years it’s been awarded (losing only to Castlemaine in 2015 and Deschutes in 2009). Repeatedly spamming a competition on the other side of the world, in a market to which you don’t regularly export, must mostly be about being able to call yourself the “world’s most awarded”, or possibly about the feedback you get from judges tasting your beer blind and after a long shipment.6 The awkward fact is that, other than Boston Beer and Lion Australia (with fewer than half as many entries as Sam Adams), very few international entries were received. The Guild should consider giving up on this side of the competition, or asking Boston Beer to sit it out ― but, if this year is representative (and I have no reason to suspect it isn’t), a reliable $7,000 in entry fees would be hard for them to turn down.

Doing well and doing a lot: Garage Project

Awards storage (Garage Project, 10 October 2017)
They’re going to need a bigger bin

Compare that to Garage Project,7 the overall Champion and the only brewery to field more beers than Sam Adams. Putting in forty-two entries and maintaining a Medal Percentage of 62% and a Points Per Entry of 1.02 sure as hell isn’t spamming, whatever it is. But if you sort the above table by either of those measures, the first thing you’ll probably notice is that Garage Project drops out of the top ten ― indeed, a dozen breweries8 outperformed them on both metrics at once. But that’s not how the competition works. Here, there were four or five breweries on four golds,9 so we look to the silver medals for the countback, and Garage Project got more of those than anyone else. On the rules, and speaking generally, Garage Project absolutely deserved to get the gong. Their interview yesterday on Beertown notes that their beers aren’t always ‘at home’ in competitions, but this year they went all out, and it worked — and they strike a nice balance in acknowledging the weirdness of awards while also being chuffed to take out the title.
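The “best four beers, then countback” rule can be made concrete with a small sketch. This is my reading of the mechanism as described above, not the Guild’s actual tie-break code (which I haven’t seen), and the function and variable names are hypothetical:

```python
# Sketch of the 'Champion' scoring described above: sum the medal points of a
# brewery's best four beers, then break ties by counting back through the
# rest of the haul, best remaining medals first.

MEDAL_POINTS = {"gold": 3, "silver": 2, "bronze": 1}

def champion_key(medals):
    """medals: list of medal strings for one brewery's entered beers."""
    points = sorted((MEDAL_POINTS.get(m, 0) for m in medals), reverse=True)
    top_four = sum(points[:4])          # headline score: best four beers
    countback = points[4:]              # tie-break: everything after the top four
    return (top_four, countback)        # tuples compare element-wise

# Two breweries tied on four golds (12 points); the bigger silver haul
# wins the countback, as it did this year.
a = ["gold"] * 4 + ["silver"] * 9
b = ["gold"] * 4 + ["silver"] * 5 + ["bronze"] * 3
assert champion_key(a) > champion_key(b)
```

Python compares the returned tuples lexicographically, so the top-four score decides first and the sorted remainder only matters on a tie, which matches the countback as I understand it.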

Their MPC and PPE are too high for any allegation that they just brute-forced the awards to have much credence. If it feels strange they could be beaten so roundly on measures like those — which hopefully do capture a sense of ‘success’ — then your issue is with the competition criteria, not with Garage Project. I’ve been pointing out for years that the system allows for weird situations like Boston Beer’s result this year, or worse. A brewery could be crowned “Champion” after entering a hundred beers, even if 95 were so bad they were disqualified and one was poisonous to the extent that it killed a judge ― so long as their other four beers all got gold medals and the next-best entrant scored three golds and a silver. Never mind possible toxic beer, you don’t need to brew beer at all to take the Brewers’ Guild’s top honours: four gold-medal-winning ciders could do it, provided one of them is “flavoured”.10 There’s no argument between “Garage Project deserved to win” and “these criteria are a bit whack” ― both are true. If the Guild want to borrow my devious brain to work out new rules to exclude strange edge cases, my rates are reasonable.

Unsung winners and losers

North End 'Super Alpha' (My house, 6 September 2015)
One of the 86%; a silver medal beer on a gold medal day

The awards night, and a lot of the subsequent coverage, are centred on the trophies ― unique and easy-to-explain “best in class” titles for each category that unfortunately obscure some impressive results that might’ve narrowly missed such banner recognition. North End, for example, had the best Medal Percentage (86%) by a considerable margin among breweries who entered 10 or more beers. Altitude and Epic had marginally-better MPC off seven entries each, and a smattering of companies scored 100% MPC but with only one, two or three beers in play. Most notable among those was Bootleg, about whom I’ve heard basically nothing,11 but who got the trophy in the hotly-contested Strong Pale Ale category with their one solitary entry. Lion nearly won the top title again, and Sawmill and McLeod’s were very close to being first-time Champions;12 if any of their silvers tipped over into gold territory, they’d have taken it. Sawmill, while they were at it, also convincingly earned the best Points Per Entry (1.7) among breweries fielding 10 or more entries. Epic, again, did fractionally better with their seven beers and another handful managed PPEs of 2 or even 3, but only from a similar number of beers.

At the other end, the Guild’s decision to release more data also reveals just how badly some companies did. Previously, we couldn’t infer much from silence: now we know, for example, that Yeastie Boys earned zero medals because they entered zero beers ― but the same is definitively not true of everyone else with empty luggage after this year’s awards. A dozen entrants won nothing for their efforts, though most of them were fairly small operators. Hot Water Brewing in the Coromandel, and Funk Estate and Mount Brewing (who now share facilities) are notable exceptions; the latter pair put in 13 beers, combined, and that they won no medals must be concerning.13 And several brewers earned some medals, but at strikingly low percentages: Black Dog (a subsidiary of DB / Heineken) just can’t be happy with 37% MPC and 0.5 PPE off 27 entries, for example. Cassels & Sons, Hawkes Bay, Te Aro and the Sprig & Fern all did even worse. Wanaka Beerworks and Kaiser Brothers entered fewer beers than any of those, but scored lower still; only 12.5% MPC for the latter.

The data that comes out of something like the beer awards is inherently limited and hard to contextualise, but it’s nice to have some math in the mix when merit is being debated. For example, Medal Percentage and Points Per Entry also give us a nice way to compare our local “big” breweries ― and the results are precisely as I would’ve predicted last week: Lion (Kirin) did considerably better than DB (Heineken) who in turn clearly outdid Boundary Road (Asahi). There are obviously more stories lurking in all these numbers, but these were the most striking to me. I’ll continue mucking around with the table and see what other patterns bubble to the surface. You’re welcome to do the same. Let me know what you find. And keep this stuff in mind when breweries are crowing about their results ― or keeping conspicuously quiet.

  1. The Bagonzas (#BGONZAs), as they are increasingly and affectionately known, thanks to Stu McKinlay for (probably) coining, and to Jono Galuszka for popularising through his tireless live-Tweeting of the night.
  2. Is that the verb? In any case, my working is here (I did the pivoting — is that a verb? — offline in another program because Google Docs was being confusing) and the official catalogue of results is here. I didn’t tote up everyone, but have about 95% — basically I skipped the cider-only producers and might’ve missed one or two very small operators as well.
  3. Broadly analogous ― if you want to get all Moneyball about it ― to the difference between the on-base average and slugging percentage in baseball, perhaps ― and perhaps a bit like the difference between a cricketer’s raw average and their strike rate. You see what I mean; there’s no one stable, statistical notion of “best”.
  4. Which is, of course, the perennial complaint about beer awards from many corners (including from me, here). Judging beers against some Platonic Ideal of what, say, a New Zealand pilsner should be is fundamentally weird and not what the public necessarily has in mind when they think of “best pilsner”.
  5. Thanks to the just-like-magic TablePress plugin for WordPress.
  6. Either way, having handled awards entries for Garage Project for a few years, I pity the poor fool(s) who must have this as damn near their fulltime job in Boston, if the brewery spams other competitions in the same way.
  7. Perennial disclaimer: Garage Project are my former employers, a lot of their staff are friends of mine, and I drink their beer more than anyone else’s. Make of that what you will.
  8. A shortcut for the curious: 8 Wired, Behemoth, Brave, Deep Creek, Emerson’s, Good George, Liberty, Lion (NZ), McLeod’s, North End, Sawmill and Tuatara.
  9. I wasn’t sure how to credit a Sawmill / Good George collaboration, so I split it between them for my calculations.
  10. Among the few restrictions on what counts for the “Champion” title ― set out in the Guild’s Entry & Style Guide ― entrants are required to contest two trophy categories (so you couldn’t win on the strength of four Strong Pale Ales, for example), but there are two cider trophies. There’s an argument to be had that cider medals should be excluded for determining the Champion, just as packaging awards are. Two years ago, Lion narrowly beat Liberty thanks to a cider medal, if I recall correctly.
  11. They only warrant a sentence or two in each edition of local guidebook Brewed ― the second edition of which is out now and for which I feel I owe you all a review. The short version: probably avoid. The first edition was problematic enough, and the second manages to make things worse…
  12. I originally included Deep Creek in this list, but evidently counted their beers twice while patching together two separate data-entry shifts. Massive thanks to Hamish from the brewery for spotting that and setting me straight in a comment, here.
  13. Forever keeping in mind, though, the above note that there are two ways to fail, here: a beer could just be “out of style” or it could be seriously faulted. We don’t know which, but 0/13 is a worry.

6 thoughts on “Medals and math — batting averages at the beer awards”

  1. Hi Phil

    Hamish from Deep Creek here. You have made a mistake with our entry and medal tally, it appears to have doubled (14 entries 2G, 3S, 4B). So we weren’t really in the running for champion brewer but your other metrics hold true, our PPE and medal % slightly improve.

    We also managed to nearly replicate our medal tally for our contract customers which is something I’m very proud of.

    I also love the guild’s decision to release entries vs medals, AIBA has been better in this regard over the years.

    We also spreadsheeted the results for analysis so it’s great to see someone else delving into the numbers and publishing analysis. I agree that Garage Project deserved their gong, I’m envious that they could enter 42 beers! What leapt out at me was how well Auckland breweries did this year. I’m really happy for Sawmill and McLeod’s, I feel that their beers have been excellent for a while now. Hats off to the medal factory at Steam as well, they continue to be the benchmark for contract breweries.


    PS Lion pipped Panhead for champion brewer with a cider gold. I’m sure Mike is over it by now…

    1. Well, that’s embarrassing.

      Massive thanks for spotting that and alerting me. I went back to my raw data and it looks like I patched together two separate stints of data-entry and accidentally doubled yours up! (I’ve had a quick look over the rest of the spreadsheet and it looks like this mistake didn’t happen to anyone else’s beers.)

      Those are still very creditable MPC and PPE scores, but yes, you weren’t in the trophy run-off. Nice work with your contract customers, too. And you’re absolutely right about Steam; their results are stellar. If I had more information on who brews what for who, it’d be great to compare contract producers… Next time, maybe.

  2. Nerd!

    There’s a few assumptions in this type of exercise that tend towards circular logic.

    First, and you’ve raised this, it’s impossible to distinguish between poor category choice and genuinely bad beer. This competition skews towards poor category choice because it encourages brewers to enter beers as labelled, rather than in the optimum category. (And that’s another story…). If you win gold, but the judges don’t accept the labelled style matches the entered style (and that can be a bit of a subjective call, according to the judges I’ve spoken to), then you cannot go forward for the trophy. So some brewers will be slipping into the labelled category and crossing fingers, which could well give a ‘good beer, wrong category’ result.

    Two – Your measure gives an indication of the brewers’ ability to select their best beers. Some brewers spammed, some dutifully entered their core range, some carefully cherry picked their range. A lot of seasonals etc won’t be entered, either because they’re all gone or past their best. So a good rating indicates a skill in curating a Best Of list, but it doesn’t measure the likelihood that every beer you find from that brewery will fit the range you’ve measured.

    And three, are you testing brewers against an objective assessment, or measuring the effectiveness of blind judging? Put it this way – you could equally design an experiment where a panel of judges blind taste their way through a range of breweries from the budget to premium, notorious to adored. If you compared their assessment to a ranking of expected quality, you would have a measure of the effectiveness of those judges. Either you assume the judges are perfect and the variations are down to the beer, or you assume you know the beers already and the variations are down to the judges. In the real world you are doing both at the same time, and even though you have nice clean data, there’s a big element of subjectivity involved in any blind tasting.

    So yes, good work on the number crunching (I was hoping someone else would), but there’s some subjectivity and noise in the results.

    Cheers mate!

    1. Oh, absolutely there’s a bunch of subjectivity in there. That’s my complaint about beer awards and why I wouldn’t bother entering them, if I actually had a brewery.

      To me, they’re basically worthless until we do some science on the judging process itself: nevermind worrying about a beer being in the wrong category, we should be putting beers into the process at multiple times during the day and testing whether the judges actually rate it consistently, first. I’ve seen a few write-ups of that in wine competitions and the results have not been promising. Add to that the fact that “blind” judging in a small country is almost impossible for anything flavoured / experimental (oh, it’s an American IPA, but with Vietnamese mint, lime, mango and chili — whatever could it be..?) and the grain of salt you need to take the awards with would scarcely fit into that airplane hangar in Christchurch.

      And I agree completely that an underappreciated part of the process here is “know thyself”; only put forward beers that actually are on form. I was going to mention Renaissance, putting forward a single beer from their wide range and getting a gold. That’s good self-awareness.

      But it’s still the case that everyone’s in the same race, even as it’s a flawed competition. So measures like MPC and PPE still give a sense of success relative to effort, which is what they’re all about. I’m definitely not saying “best MPC / PPE = best brewery” — I don’t believe in “best” at all!

  3. All I can say is that you have great taste in beer Phil, the Rocky Knob “Code” American Brown won gold. Sorry for the Name……… Also just to back up Hamish, the 3 Deep Creek contract brews we entered all got medals!!

  4. Nice spotting on us not entering beers. We didn’t enter last year, either, though did very well from the UK Awards we entered in 2016 (a trophy, two gold, two silver and one bronze from 6 beers entered – plus one gold for design from one entry. I believe design doesn’t count for champion brewery in NZ and I’m hoping ciders don’t either).

    In general I find it hard to justify the entry fee and, frankly, I’m shocked at just how many beers many of these breweries are entering. We feel we’ve rarely got any kind of return from medals or trophies, other than confirmation that judges are not picking up anything in the beers that we didn’t note ourselves. Perhaps we don’t use them as well as we could.

    Congratulations to all the winners. It’s amazing how the scene has changed in the two and a half years I’ve been gone!
