UZR, Scouting, and the Fans

After reading some discussions over at The Book blog about UZR and regression to scouting reports, I thought it would be a good idea to use the Fans’ Scouting Report as the regression target for UZR.

My methodology was as follows: I binned players into groups based on their positional ranking within the scouting reports, and then calculated the weighted average of the UZR/150s of the players within each bin. The following table shows the results using data from 2007, 2008, and 2009. (Quick edit: the table below is for SS only; sorry for any confusion.)

Fan rank   Avg UZR/150
1-10       5.6
11-20      2.5
21-30      -1.5
31-40      -3.3
41-50      -3.9
51+        -9.1
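For anyone who wants to reproduce the binning step, here is a rough sketch in Python. The post does not say what the weights in the weighted average were, so weighting by defensive games is an assumption here, and the function names and input layout are made up for illustration.

```python
from collections import defaultdict

# Bin edges follow the table above; the last bin is open-ended (51+).
BINS = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, None)]


def bin_label(rank):
    """Return the label of the rank bin (e.g. '11-20') for a positional rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or higher")
    for lo, hi in BINS:
        # Bins are contiguous and ordered, so the first upper edge that
        # covers the rank identifies the bin.
        if hi is None or rank <= hi:
            return f"{lo}+" if hi is None else f"{lo}-{hi}"


def bin_means(player_seasons):
    """Weighted average UZR/150 per rank bin.

    player_seasons: iterable of (fan_rank, uzr_per_150, defensive_games).
    ASSUMPTION: each player-season is weighted by defensive games.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # label -> [sum(uzr*games), sum(games)]
    for rank, uzr150, games in player_seasons:
        entry = totals[bin_label(rank)]
        entry[0] += uzr150 * games
        entry[1] += games
    return {label: s / g for label, (s, g) in totals.items() if g > 0}
```

Feeding it every shortstop player-season from 2007 through 2009 should give one average per bin, comparable to the table above if the weighting assumption is close to what was actually used.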

At this point my methodology diverges, as I wasn’t sure which method I liked better. Method 1 is to regress each individual season’s UZR based on the player’s rank that season to get a new seasonal UZR, and then weight across the three years of data. Method 2 is to weight across the three years of data first and then regress using the most recent Fans’ Scouting Report ranking (in this case the interim 2009 results).
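The difference between the two methods is only the order of operations, which a small sketch can make concrete. Big caveats: the post does not give the regression formula or the year weights, so the games / (games + k) shrinkage with k = 150 and the games-based year weights below are assumptions for illustration only, and all function names are hypothetical.

```python
def regress(uzr150, games, target, k=150.0):
    """Shrink a UZR/150 toward a target (the rank-bin average).

    ASSUMPTION: the weight on the observed value is games / (games + k);
    k = 150 is an arbitrary illustrative constant, not a fitted value.
    """
    w = games / (games + k)
    return w * uzr150 + (1.0 - w) * target


def weighted_mean(values, weights):
    """Plain weighted average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def method_1(seasons, k=150.0):
    """Regress each season toward that season's bin average, then combine.

    seasons: list of (uzr150, games, bin_average_for_that_season),
    oldest season first. ASSUMPTION: seasons are combined by games.
    """
    regressed = [regress(u, g, t, k) for u, g, t in seasons]
    return weighted_mean(regressed, [g for _, g, _ in seasons])


def method_2(seasons, k=150.0):
    """Combine the seasons first, then regress once toward the bin
    average implied by the most recent ranking (the last season's target)."""
    games = [g for _, g, _ in seasons]
    combined = weighted_mean([u for u, _, _ in seasons], games)
    latest_target = seasons[-1][2]
    return regress(combined, sum(games), latest_target, k)
```

Under these assumptions, Method 2 regresses a larger pool of games in one step, so it generally moves less toward the fans’ number than Method 1 does when the targets are the same in every season.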

Method 1 is clearly sensitive to the ebb and flow of the fans’ opinions, and is also a little more dependent on those rankings, since the UZRs being regressed have fewer defensive games behind them. Method 2 does not create “single season” stats, as some people would probably like, and it only uses the most recent fans’ ranking. Overall I think I prefer Method 2, but could be swayed either way. The following table lists the top 10 shortstops ranked by Method 2 (I really need a better name).

Rank Name 3-year UZR Method 1 Method 2
1 Omar Vizquel 18.4 10.1 11.8
2 Jack Wilson 11.1 8.0 9.3
3 Brendan Ryan 11.4 7.0 8.4
4 Cesar Izturis 9.0 6.1 7.7
5 J.J. Hardy 9.2 5.8 7.2
6 Elvis Andrus 8.3 7.1 6.8
7 Adam Everett 13.2 4.9 6.6
8 Erick Aybar 7.3 6.3 6.5
9 Jimmy Rollins 6.6 5.9 6.3
10 Paul Janish 11.9 6.9 5.8

And the bottom 10:

Rank Name 3-year UZR Method 1 Method 2
43 Hanley Ramirez -4.9 -3.3 -3.9
44 Stephen Drew -5.2 -2.6 -4.1
45 Ramon Vazquez -7.8 -4.5 -4.3
46 Alex Cora -5.3 -4.3 -4.4
47 Luis Rodriguez -7.9 -4.8 -5.0
48 Juan Castro -16.6 -2.6 -5.2
49 Julio Lugo -9.3 -5.3 -6.8
50 Khalil Greene -10.4 -1.5 -8.0
51 Brendan Harris -8.3 -7.9 -8.7
52 Yuniesky Betancourt -12.3 -9.6 -11.4

A couple of quick caveats. If you read the comments on the thread linked above, I noted that the defensive games data at FanGraphs looks a little messed up; once that goes back to normal, these results would likely change. Also, I didn’t do a great job of searching the blogosphere, so if this has been done before, I apologize for presenting it as a new methodology.

As far as data sources: UZR via FanGraphs and the Fans’ Scouting Report via Tangotiger. As always, comments or suggestions are appreciated.

  1. #1 by Michael on October 29, 2009 - 3:39 pm


    I’ve sort of done this before myself, using a different method. I took the scouting turned into runs for each year, weighted at a certain amount for each season, weighted each season between defensive metrics and scouting runs total, then regressed to 0 mean. After reading MGL today, I might try regressing the weighted three years, still including each season’s scouting reports, to the current season’s reports.

    I might give this a shot for my team before trying anything greater than that, but that seems fair. Truthfully, I never had any issues with the methods I used, but with MGL saying the regression needs to go towards report instead of the mean, I’d like to see those numbers and if they pass the sniff test.

    • #2 by stevesommer05 on October 29, 2009 - 3:52 pm

      Yeah I’ll probably march down the path of doing the other positions and see what shakes out.

  2. #3 by Sean Smith on November 3, 2009 - 1:26 pm

    How about Derek Jeter? Must be in the middle of the pack, but if you want exposure, never present top/worst SS without showing where the captain ranks! :-)

    • #4 by stevesommer05 on November 3, 2009 - 2:04 pm

      Lol. He was 32/52 using either method.

