Sunday, November 16, 2014

The Randomness of Course Ratings

Course and Slope Ratings are determined through a process that measures the yardage and obstacle values of a course.   The yardage of courses can be measured with some precision.  There are some judgments (e.g., roll, elevation) in measuring effective yardage, but any errors will be relatively small.  It is in the measurement of obstacle values where the greatest chance of random errors occurs. 

Errors in measuring obstacle values arise from the lack of precision in defining obstacles, confusing standards for rating each obstacle, model errors, and differences in subjective judgment among Rating Committees:

Lack of Precision in Defining Obstacles - The size, firmness, and shape of a green in relation to the length of the approach shot is one obstacle.  The three characteristics of the obstacle (size, firmness, and shape) are not defined with any specificity.[1]  If it is not clear what is to be measured, a lack of precision in the estimate is guaranteed.
Confusing Standards - The Rating Committee must assign a value between 0 and 10 to each obstacle.  The rating criteria are confusing and lack specificity.  Obstacle values are increased, for example, if a green is in poor condition or a player’s stance is moderately awkward.  Much of the confusion arises because the obstacles are not independent.  “Trees,” for example, present their own obstacle but also affect the “fairway” and “green target” obstacle ratings.  It is not clear in the Course Rating Model how the independent effect of “trees” should be evaluated.[2]
Model Errors - The Psychological Obstacle Value is determined by the values of the other nine obstacles.  This covariance among variables (obstacle values) leads to errors in the estimates of the Scratch and Bogey Obstacle Values.
Differences Among Rating Committees - It is likely that Rating Committee members will weigh obstacles differently.  Given the subjective nature of the ratings process, this is a foregone conclusion.

To examine the “randomness” hypothesis, the Course and Bogey Ratings of a course have been tracked over the past rating cycle.  The course has had no change in its yardage, and there has been no significant change in the rated obstacles between ratings.[3]  The changes in the men’s ratings are presented in Table 1.

Table 1
Change in Men’s Ratings

Tees      CR Old   CR New   Difference   BR Old   BR New   Difference
Gold       65.0     65.1      +0.1        86.4     85.7      -0.7
Silver     68.4     68.2      -0.2        90.9     91.1      +0.2
Green      70.7     71.0      +0.3        95.4     95.9      +0.5
Black      73.8     73.7      -0.1        99.6    100.1      +0.5

The differences in Course Ratings can be described as random.  That is, the course did not get uniformly tougher or easier for the scratch player or the bogey player.  Instead, the course was judged to be more difficult from two sets of tees and easier from two sets of tees for the scratch player.  For the bogey player, the first set of tees (gold) is rated easier while the remaining tees are rated more difficult.
A similar random pattern is shown in the change of ratings for women as shown in Table 2.  The new Course Ratings are higher from the gold and silver tees, but lower from the green tees.  For the bogey player, the course is now rated more difficult from the gold and green tees, but easier from the silver tees.     

Table 2
Change in Women’s Ratings

Tees      CR Old   CR New   Difference   BR Old   BR New   Difference
Gold       70.1     70.4      +0.3       100.6    101.1      +0.5
Silver     73.3     73.4      +0.1       107.3    107.1      -0.2
Green      77.6     77.1      -0.5       112.5    113.2      +0.7

The random variation probably stems from the subjective nature of the ratings procedure.  For example, assume the new obstacle ratings for topography are higher than the old ratings by one point on each hole.  Further assume the old and new ratings are identical for the nine other obstacles.  The increase in the obstacle value for the scratch player, and hence in the Course Rating, would be 0.2 strokes (1 x 0.1 x 18 x 0.11 = 0.198, rounded to 0.2).[4]  The same error would increase the Bogey Rating by 0.6 strokes (1 x 0.12 x 18 x 0.26 = 0.56, rounded to 0.6).  The Slope Rating would increase by 2 points ((0.6 - 0.2) x 5.381 = 2.15, rounded to 2).  In essence, it takes only a small difference in the subjective ratings, rather than real changes in the obstacles, to produce the small Course Rating changes shown in Tables 1 and 2.
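The arithmetic above can be checked with a short script.  The figures used (topography weights of 0.1 for scratch and 0.12 for bogey, course obstacle multipliers of 0.11 and 0.26, and the men's slope multiplier of 5.381) are the ones quoted in the text and footnote [4]:

```python
# Worked example: a systematic +1 error in the topography obstacle
# rating on every hole, with the other nine obstacles unchanged.
HOLES = 18
error_per_hole = 1  # topography rated one point too high on each hole

# Scratch: weight 0.1, course obstacle multiplier 0.11
scratch_change = error_per_hole * 0.1 * HOLES * 0.11   # 0.198

# Bogey: weight 0.12, course obstacle multiplier 0.26
bogey_change = error_per_hole * 0.12 * HOLES * 0.26    # 0.5616

# Slope change uses the rounded rating changes and the men's multiplier
slope_change = (round(bogey_change, 1) - round(scratch_change, 1)) * 5.381

print(round(scratch_change, 1))  # 0.2 stroke Course Rating change
print(round(bogey_change, 1))    # 0.6 stroke Bogey Rating change
print(round(slope_change))       # 2 point Slope Rating change
```

A one-point subjective shift on a single obstacle is thus enough to move the Slope Rating by two points, on the order of the changes seen in Tables 1 and 2.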

The USGA could argue that the systematic error in the measurement of topography described above is unlikely.  The error is more likely to be random, with one hole rated too high and another too low.  The net result would be a much smaller change in the obstacle value.  There are two problems with this defense.  First, reliance on random errors discredits the measurement process—i.e., “errors will cancel out” is not a rigorous defense of the Course Rating System.  Second, random errors do not always cancel out.  The 18-hole total of the weighted obstacle values only has to differ by 2.0 to produce a 0.2 stroke change in the Course Rating.  A difference of 2.0 is not unlikely given the variance in the rating of individual obstacles.  For example, if the rating were 3 for each obstacle, the weighted obstacle value of the course would be 54.  A difference of 2.0 would be an error of only about 4 percent.  Such a small difference should fall well within the 95 percent confidence interval of the estimate of the total weighted obstacle value.  Therefore, small changes in the ratings are more likely due to the “randomness” of the rating process than to any physical changes in the course.
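The point that cancellation cannot be relied on can be illustrated with a simple combinatorial sketch.  Assume, purely for illustration (this is my simplification, not the USGA's model), that each hole's weighted obstacle total errs by +1 or -1 with equal probability, independently across the 18 holes:

```python
# Toy model of the "errors cancel out" defense. Assumption (for
# illustration only): each hole's weighted obstacle total is off by
# +1 or -1 with equal probability, independently across 18 holes.
from math import comb

HOLES = 18

# The 18 errors sum to zero only when exactly 9 holes err high and
# 9 err low: C(18, 9) of the 2^18 equally likely sign patterns.
prob_cancel = comb(HOLES, HOLES // 2) / 2**HOLES

# Otherwise the total is off by at least 2 (a sum of 18 odd terms is
# even), enough for a 0.2-stroke shift in the Course Rating.
prob_shift = 1 - prob_cancel

print(round(prob_cancel, 2))  # 0.19
print(round(prob_shift, 2))   # 0.81
```

Even under this toy model, the errors fully cancel less than a fifth of the time; in roughly four cases out of five the 18-hole total would move by at least the 2.0 needed for a 0.2-stroke rating change.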

The importance of random errors in the measurement process raises two questions for the USGA and local golf associations to consider.  First, if a re-rating produces small and apparently random differences from the old ratings, should the ratings be changed?  Unless the Rating Committee can point to physical changes that caused the differences, the prudent course would be to leave the ratings unchanged.[5]  After all, there are costs (new scorecards, player confusion) in changing the ratings.  Second, are the required periodic re-ratings the best use of a Rating Committee’s time?  It would be more efficient and effective to re-rate a course only if 1) its ratings seem out of line (e.g., visitors score higher or lower than expected, or team performance is exceptionally good or bad) or 2) the club professional believes there have been significant alterations to the course since the last rating.  Rating for rating’s sake serves the bureaucratic interests of golf associations, but it is not the most effective way to ensure the equity of the handicap system.



[1] USGA Course Rating System: 2012-2015, United States Golf Association, Far Hills, NJ, 2012.
[2] Op. cit., p. 27.
[3] A tree was removed from one fairway.  The tree was not an obstacle for the scratch player, and only affected the bogey player when he played from the green or black tees.  
[4] USGA Course Rating System: 2012-2015, p. 72.  The weight for the scratch topography obstacle is 0.1.  The sum of the weighted obstacle values is multiplied by .11 in the formula for the course obstacle value.
[5] Golf associations rarely explain why ratings have changed.  This is due in part to a lack of understanding of the USGA’s Course Rating Model.  Numbers from the field are entered into the model, and then the model produces Course and Bogey Ratings. This makes it difficult for the association to make a defense of the ratings based on physical changes in the course.  Instead, associations will defend the “process,” but not identify physical changes that led to the new ratings.