In this post, we discuss the question of how to benchmark the performance of a risk matrix, so that different risk matrix designs may be compared. In so doing, we will develop some tips for the more accurate design of risk matrices.
This post assumes that the reader has read our earlier blog posts and is familiar with the design methodology used in Quick Risk Matrix.
In writing this article, we found that we could not avoid discussing two other methods of risk matrix design (those of Cox and of Li et al.) in addition to the method used in Quick Risk Matrix. We will demonstrate that these two methods are not defensible.
We shall create our examples using Quick Risk Matrix Premium, which includes a performance benchmarking tool.
There is very little in the literature on the subject of benchmarking risk matrix performance. As far as we are aware, the only papers of note are those by Cox (Ref. 1), Xing Hong (Ref. 2), and Li, Bao and Wu (Ref. 3).
Cox investigated risk matrices up to size 5 x 5 with three colors and axis scales running from 0 to 1, each divided into equal intervals. He concluded that risk matrices typically have poor resolution, stating: "Typical risk matrices can correctly and unambiguously compare only a small fraction (e.g., less than 10%) of randomly selected pairs of hazards." That is a startling conclusion. However, his assumptions are pessimistic. For example, he assumes that when two risks fall in the same risk priority level (i.e. they are located in cells of the same color), there is only a 50% probability of ranking them in the correct order. This assumption is correct when both risk points are located in the same cell, but not when the two points lie in different cells (unless the decisionmaker is using blind guesswork!). We shall recommend a method in this post for breaking ties when two risks have equal priority levels. In addition, Cox derives his figure of a 90% error rate from a specific risk matrix, assuming that the plotted risks lie in the worst possible positions in that matrix.
Cox also proposes a design methodology for risk matrices based on axioms that he calls weak consistency, betweenness and color consistency. He illustrates the methodology for matrices with three colors, ranging in size from 3 x 3 to 5 x 5. He reaches the surprising conclusion that there is only one possible coloring pattern for a 3 x 3 or 4 x 4 matrix and only two possible colorings for a 5 x 5 matrix. We won't go into detail here, but we will point out that the limited number of colorings stems from axioms that are overly restrictive. The rational risk matrix designer will want to specify the risk values that define the thresholds between the different risk priority levels. The Cox methodology provides no means of inputting these risk thresholds. Instead, the coloring that flows from the axioms implies the thresholds, which is putting the cart before the horse!
Li et al. propose a design methodology, which they call the Sequential Updating Approach (SUA), that relaxes Cox's axioms. They impose a condition that, for a cell A to have a higher risk priority level than a cell B, the probability that a random point in A has higher quantitative risk than a random point in B must be greater than a certain value, alpha (0.5 <= alpha <= 1). The color of the bottom left cell must be assumed; the colors of all the other cells are then determined by an iterative process. The authors show that for a given value of alpha and given subdivisions of the axes, there is a unique risk matrix coloring. The approach maximizes the number of colors (risk priority levels) in the matrix for a given value of alpha; the higher the value of alpha, the fewer colors appear in the matrix. The SUA methodology, like the Cox methodology, pays no regard to the risk threshold values that the risk matrix designer might want to use. For this reason, the Sequential Updating Approach and the Cox methodology are both indefensible.
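The SUA condition compares cells through a probability that can be estimated by simulation. The sketch below is our own illustration, not code from Li et al.: it assumes linear axes on [0, 1], points drawn uniformly within each cell, and risk defined as probability times consequence. The function names are hypothetical.

```python
import random

def prob_a_beats_b(cell_a, cell_b, n=100_000, seed=0):
    """Monte Carlo estimate of P(risk of a random point in A > risk of a random point in B).

    Each cell is ((p_lo, p_hi), (c_lo, c_hi)) on linear axes. Points are drawn
    uniformly within each cell and risk is taken as probability * consequence.
    """
    rng = random.Random(seed)
    (pa, ca), (pb, cb) = cell_a, cell_b
    wins = 0
    for _ in range(n):
        risk_a = rng.uniform(*pa) * rng.uniform(*ca)
        risk_b = rng.uniform(*pb) * rng.uniform(*cb)
        if risk_a > risk_b:
            wins += 1
    return wins / n

def sua_higher(cell_a, cell_b, alpha):
    """SUA-style test: A may outrank B only if the estimated probability exceeds alpha."""
    return prob_a_beats_b(cell_a, cell_b) > alpha
```

For a hypothetical 5 x 5 matrix, `prob_a_beats_b(((0.8, 1.0), (0.8, 1.0)), ((0.0, 0.2), (0.0, 0.2)))` returns 1.0, because every point in the top right cell has a larger risk than any point in the bottom left cell. Two copies of the same cell give a value close to 0.5, so no alpha above 0.5 can separate them.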
Li et al. use two measures to assess the performance of risk matrices, which they call Probability of Elimination Error (PEE) and Probability of Wrong Pairs (PWP). They calculate these two measures by generating random pairs of points, assuming that probability and consequence are uniformly distributed and independent. If the two points within a pair lie in different risk priority levels, and the ranking based on quantitative risk differs from the ranking based on risk priority level, they count that pair as both a PWP error and a PEE. If the two points lie in the same risk priority level, the risk matrix ranks them equally, and some tie-breaking procedure (applied by the end user of the risk matrix) must be assumed. It is not clear what tie-breaking procedure was assumed. A PEE is counted when the tie-breaking procedure fails to give the correct risk ranking.
The authors describe PEE as a measure of resolution and PWP as a measure of accuracy. In worked examples, the authors found that as the value of alpha increased, the number of colors in the matrix decreased, the PEE increased and the PWP decreased.
Note. The term PEE appears to have been coined by Xing Hong (Ref. 2).
Benchmarking Measures in Quick Risk Matrix
The performance benchmarking tool in Quick Risk Matrix Premium uses three measures of performance, compared with the two used by Li et al. and the one used by Cox.
The measures used in Quick Risk Matrix are:
- Accuracy of Mapping
- Probability of Elimination Error (PEE)
- Probability of Rank Reversal (PRR)
We will explain each of these measures below.
Accuracy of Mapping
As explained in more detail in other posts in this blog, Quick Risk Matrix treats a risk matrix as an approximation to a risk graph. The risk graph shows without error how probability and consequence map to risk priority level. The risk matrix will inevitably make mapping errors and one of the purposes of risk matrix design is to make these errors as few as possible, i.e. the risk matrix should be a good approximation to the risk graph.
The performance benchmarking tool in Quick Risk Matrix Premium assesses the mapping accuracy of a risk matrix by generating a large number of random risk points and counting how often the risk points are mapped correctly to risk priority level. The risk graph determines what constitutes correct mapping. In generating the random points, probability and consequence are assumed uniformly distributed when the matrix axes are linear and log-uniformly distributed when the matrix axes are logarithmic. The probability and consequence are taken to be correlated with a Spearman correlation coefficient input by the program user in the range -1 to +1. A Spearman coefficient of 0 corresponds to independence between the variables.
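The post does not say how the correlated points are generated. One common way to obtain a prescribed Spearman correlation, which we assume here purely for illustration, is a Gaussian copula: for a bivariate normal distribution, Spearman's rho and the Pearson correlation r are related by r = 2 sin(pi * rho / 6). The function names below are hypothetical.

```python
import math
import random
from statistics import NormalDist

def correlated_uniforms(n, spearman_rho, seed=0):
    """Draw n (u, v) pairs on the unit square with Spearman correlation near spearman_rho.

    Gaussian copula (an assumption): correlated standard normals are mapped to
    uniforms with the normal CDF, which preserves rank correlation. For normals,
    Pearson r and Spearman rho satisfy r = 2 * sin(pi * rho / 6).
    """
    r = 2.0 * math.sin(math.pi * spearman_rho / 6.0)
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + math.sqrt(max(0.0, 1.0 - r * r)) * rng.gauss(0.0, 1.0)
        pairs.append((cdf(z1), cdf(z2)))
    return pairs

def to_axis(u, lo, hi, log_axis):
    """Map u in [0, 1] to an axis value: uniform if linear, log-uniform if logarithmic."""
    if log_axis:
        return lo * (hi / lo) ** u
    return lo + (hi - lo) * u
```

Each uniform pair is then passed through `to_axis` for the probability and consequence ranges of the matrix, with `log_axis=True` for logarithmic axes. A coefficient of 0 gives r = 0, i.e. independent draws.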
The results are presented as:
- Percentage correctly mapped
- Percentage mapped to a risk priority level that is too high
- Percentage mapped to a risk priority level that is too low
Accurate mapping is arguably the most important characteristic of a well-designed risk matrix.
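A minimal sketch of the mapping accuracy calculation follows. The 3 x 3 matrix, its cell colors and the two iso-risk contours are all hypothetical; the axes are linear on [0, 1], and, for brevity, probability and consequence are drawn independently and uniformly. Correlated draws could be substituted without changing the counting logic.

```python
import bisect
import random

EDGES = [0.0, 1 / 3, 2 / 3, 1.0]   # hypothetical: same band edges on both linear axes
THRESHOLDS = [0.1, 0.4]            # hypothetical iso-risk contours (risk = p * c)
COLORS = [[0, 0, 1],               # COLORS[row][col]: row = probability band,
          [0, 1, 1],               # col = consequence band (both low to high),
          [1, 1, 2]]               # value = risk priority level (hand-chosen)

def graph_level(p, c):
    """Risk priority level according to the risk graph (exact iso-risk contours)."""
    return bisect.bisect_right(THRESHOLDS, p * c)

def matrix_level(p, c):
    """Risk priority level according to the risk matrix cell containing (p, c)."""
    row = min(bisect.bisect_right(EDGES, p) - 1, len(COLORS) - 1)
    col = min(bisect.bisect_right(EDGES, c) - 1, len(COLORS[0]) - 1)
    return COLORS[row][col]

def mapping_accuracy(n=100_000, seed=0):
    """Return (% correctly mapped, % too high, % too low) for random points."""
    rng = random.Random(seed)
    correct = too_high = too_low = 0
    for _ in range(n):
        p, c = rng.random(), rng.random()
        graph, matrix = graph_level(p, c), matrix_level(p, c)
        if matrix == graph:
            correct += 1
        elif matrix > graph:
            too_high += 1
        else:
            too_low += 1
    return tuple(100.0 * k / n for k in (correct, too_high, too_low))
```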
Probability of Elimination Error (PEE)
The concept of PEE arises in the scenario that a decisionmaker may have two risks but can only afford to eliminate one of them. Which one should be eliminated? While factors unrelated to the risk matrix will go into such a decision, it would certainly be helpful to be able to rank risks on the basis of the risk matrix, even for risks located in the same risk priority level.
Our definition of PEE is the same as employed by Xing Hong and Li et al. However, it is important to state what tie-breaking procedure is assumed when the two points in a pair lie in the same risk priority level. So we will now discuss some possible tie-breaking methods.
Possible Tie-Breaking Methods
A possible tie-breaking method is the Borda count (developed by Borda in 1770). Borda count, as applied to risk matrices, ranks risks according to their row and column positions in the matrix. The method is quite complicated to implement. Also, every time a new risk is added to the risk register, the Borda counts need to be recalculated. It is unlikely that many risk matrix users will use Borda counts.
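To show why Borda counts need recalculating, here is a generic Borda-style count in which each axis acts as a "voter" that awards a risk one point for every other risk ranked strictly below it on that axis. Published risk-matrix variants may differ in detail, so treat this as an assumed variant rather than the definitive procedure; the function name is hypothetical.

```python
def borda_scores(risks):
    """Generic Borda-style scores for risks given as (row, col) cell indices.

    Higher row = higher probability band, higher col = higher consequence band.
    Each axis awards one point per other risk ranked strictly below on that axis;
    a higher total means a higher priority.
    """
    scores = []
    for r_row, r_col in risks:
        below_on_probability = sum(1 for row, _ in risks if row < r_row)
        below_on_consequence = sum(1 for _, col in risks if col < r_col)
        scores.append(below_on_probability + below_on_consequence)
    return scores
```

With risks at cells (0, 2), (1, 1), (2, 0) and (2, 2), the scores are [2, 2, 2, 4]. Adding a fifth risk at (0, 0) changes them to [3, 4, 3, 6, 0], so the risk at (1, 1) now outranks the one at (0, 2) even though neither has moved.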
Xing Hong assumed that users might break a tie by looking at the relative positions of the cells containing the two points, taking the risk to be eliminated as the one in the upper or right-hand cell. This does not appear to address all possible relative positions. For example, which risk should be eliminated if one cell lies above, but to the left of, the other?
Another possible tie-breaking method is guesswork, i.e. arbitrarily assume one of the two risks is the larger. This is what Cox assumed users would do when deriving his very pessimistic predictions of risk matrix performance. Since its accuracy will be 50%, this is a very poor method.
An easy and practical tie-breaking method is to compare cells of the same color on the basis of the risk value at the geometric center of each cell. Suppose a cell is bounded by x = X1, x = X2, y = Y1 and y = Y2. The geometric mean of X is the square root of X1*X2 and the geometric mean of Y is the square root of Y1*Y2. The risk at the geometric center may then be calculated by combining the two geometric means (i.e. multiplying them when risk is defined in the usual way as the product of probability and consequence). We call this the Geometric Center Method of tie-breaking. For decisionmakers who want to compare risks falling within the same risk priority level, this method is easy to apply (e.g. in a spreadsheet). Importantly, the method requires no information other than that already contained in a properly constructed risk matrix.
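The Geometric Center Method is simple enough to express in a few lines. This sketch assumes that risk is the product of the two axis values and that all cell bounds are strictly positive; the function names are our own.

```python
import math

def center_risk(x1, x2, y1, y2):
    """Risk at the geometric center of the cell bounded by x = x1..x2 and y = y1..y2.

    Assumes risk = x * y and strictly positive cell bounds.
    """
    if min(x1, x2, y1, y2) <= 0:
        raise ValueError("geometric center needs strictly positive bounds")
    return math.sqrt(x1 * x2) * math.sqrt(y1 * y2)

def break_tie(cell_a, cell_b):
    """Rank two risks whose cells share a color; each cell is ((x1, x2), (y1, y2)).

    Returns "A" or "B" for the risk judged larger, or None when the cells have
    equal center risk (for example, when both risks lie in the same cell).
    """
    risk_a = center_risk(*cell_a[0], *cell_a[1])
    risk_b = center_risk(*cell_b[0], *cell_b[1])
    if math.isclose(risk_a, risk_b):
        return None
    return "A" if risk_a > risk_b else "B"
```

In a spreadsheet, the same center risk is `=SQRT(x1*x2)*SQRT(y1*y2)`, where x1, x2, y1 and y2 stand for the cells holding the bounds. Note that `break_tie` also returns None when two different cells happen to have equal center risks, which can occur on a logarithmic matrix whose bands all span the same ratio.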
In calculating PEE, Quick Risk Matrix assumes that decisionmakers interested in comparing pairs of risks would employ the Geometric Center Method. The method only fails when both risks of a pair fall within the same cell, in which case Quick Risk Matrix assumes that the decisionmaker has only a 50% chance of ranking the pair correctly. There is simply no way to differentiate between two risks that fall in the same cell on the basis of the risk matrix alone.
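Note that the geometric mean is zero if X1 = 0 or Y1 = 0, so all cells bordering a zero axis would have a center risk of zero and could not be distinguished from one another. There are ad hoc methods of overcoming this issue.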
The significance of the geometric center can be explained as follows. Suppose the risk at the bottom left of a cell is R1, the risk at the geometric center is R2, and the risk at the top right of the cell is R3. Then R2/R1 = R3/R2. In this sense, the geometric center is the "mid" point of the cell.
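Writing out the corner and center risks, assuming risk is the product of probability and consequence, confirms the equal ratios and shows that R2 is the midpoint of R1 and R3 on a logarithmic scale:

```latex
\begin{aligned}
R_1 &= X_1 Y_1, \qquad R_2 = \sqrt{X_1 X_2}\,\sqrt{Y_1 Y_2}, \qquad R_3 = X_2 Y_2,\\
\frac{R_2}{R_1} &= \frac{\sqrt{X_1 X_2 Y_1 Y_2}}{X_1 Y_1}
  = \sqrt{\frac{X_2 Y_2}{X_1 Y_1}}
  = \frac{X_2 Y_2}{\sqrt{X_1 X_2 Y_1 Y_2}} = \frac{R_3}{R_2},\\
\log R_2 &= \tfrac{1}{2}\left(\log R_1 + \log R_3\right).
\end{aligned}
```

On logarithmic axes, the geometric center is therefore also the visual center of the cell.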
To calculate PEE, Quick Risk Matrix generates a large number of random pairs of points (correlated if required via a user-input value for Spearman's correlation coefficient). It evaluates the ranking of each pair according to the risk matrix against the ranking given by the quantitative risk values of the two points. When both points of a pair lie in the same risk priority level, the calculation assumes that the decisionmaker is using the Geometric Center Method of breaking ties.
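A minimal sketch of this PEE calculation follows, assuming a hypothetical 3 x 3 matrix on logarithmic axes, independent log-uniform points and risk equal to probability times consequence. Same-cell pairs are counted as half an error, the expected error of a coin flip. This is our illustration, not Quick Risk Matrix code, and all names are hypothetical.

```python
import bisect
import math
import random

# Hypothetical 3 x 3 matrix on logarithmic axes; risk = probability * consequence.
P_EDGES = [1e-3, 1e-2, 1e-1, 1.0]       # probability band boundaries
C_EDGES = [1.0, 4.0, 60.0, 1000.0]      # consequence band boundaries
COLORS = [[0, 0, 1],                    # COLORS[row][col]: row = probability band,
          [0, 1, 1],                    # col = consequence band (both low to high),
          [1, 1, 2]]                    # value = risk priority level (illustrative)

def band(edges, x):
    """Index of the band of `edges` that contains x (clamped to the outer bands)."""
    return min(max(bisect.bisect_right(edges, x) - 1, 0), len(edges) - 2)

def sample_point(rng):
    """Independent log-uniform point; returns (risk, row, col)."""
    p = 10.0 ** rng.uniform(math.log10(P_EDGES[0]), math.log10(P_EDGES[-1]))
    c = 10.0 ** rng.uniform(math.log10(C_EDGES[0]), math.log10(C_EDGES[-1]))
    return p * c, band(P_EDGES, p), band(C_EDGES, c)

def center_risk(row, col):
    """Risk at the geometric center of a cell (Geometric Center Method)."""
    return (math.sqrt(P_EDGES[row] * P_EDGES[row + 1])
            * math.sqrt(C_EDGES[col] * C_EDGES[col + 1]))

def pee(n_pairs=100_000, seed=0):
    """Estimate PEE in percent, breaking same-level ties by center risk."""
    rng = random.Random(seed)
    errors = 0.0
    for _ in range(n_pairs):
        risk_a, row_a, col_a = sample_point(rng)
        risk_b, row_b, col_b = sample_point(rng)
        level_a, level_b = COLORS[row_a][col_a], COLORS[row_b][col_b]
        if level_a != level_b:
            a_ranked_higher = level_a > level_b          # matrix ranks by priority level
        else:
            center_a, center_b = center_risk(row_a, col_a), center_risk(row_b, col_b)
            if math.isclose(center_a, center_b):         # same cell (or equal centers):
                errors += 0.5                            # a guess is wrong half the time
                continue
            a_ranked_higher = center_a > center_b        # Geometric Center tie-break
        if a_ranked_higher != (risk_a > risk_b):
            errors += 1.0
    return 100.0 * errors / n_pairs
```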
Probability of Rank Reversal (PRR)
What we term PRR is exactly the same measure as the Probability of Wrong Pairs used by Li et al. PRR is calculated only for pairs of points lying in different risk priority levels. PRR errors are, therefore, a subset of Probability of Elimination Errors.
To calculate PRR, we generate many random pairs of points and, for all pairs where the points fall in different risk priority levels, we use the quantitative risk values of the two points to determine what percentage of pairs are incorrectly ranked.
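PRR can be estimated with a similar simulation that simply skips pairs lying in the same risk priority level. The matrix below is hypothetical (three bands on each logarithmic axis), points are independent and log-uniform, risk is probability times consequence, and the names are our own.

```python
import bisect
import math
import random

# Hypothetical 3 x 3 matrix on logarithmic axes; risk = probability * consequence.
P_EDGES = [1e-3, 1e-2, 1e-1, 1.0]
C_EDGES = [1.0, 4.0, 60.0, 1000.0]
COLORS = [[0, 0, 1], [0, 1, 1], [1, 1, 2]]   # COLORS[row][col] = priority level

def band(edges, x):
    """Index of the band of `edges` that contains x (clamped to the outer bands)."""
    return min(max(bisect.bisect_right(edges, x) - 1, 0), len(edges) - 2)

def sample_point(rng):
    """Independent log-uniform point; returns (risk, priority level from the matrix)."""
    p = 10.0 ** rng.uniform(math.log10(P_EDGES[0]), math.log10(P_EDGES[-1]))
    c = 10.0 ** rng.uniform(math.log10(C_EDGES[0]), math.log10(C_EDGES[-1]))
    return p * c, COLORS[band(P_EDGES, p)][band(C_EDGES, c)]

def prr(n_pairs=100_000, seed=0):
    """Estimate PRR in percent over pairs that fall in different priority levels."""
    rng = random.Random(seed)
    compared = reversed_pairs = 0
    for _ in range(n_pairs):
        risk_a, level_a = sample_point(rng)
        risk_b, level_b = sample_point(rng)
        if level_a == level_b:
            continue                      # PRR ignores same-level pairs
        compared += 1
        if (level_a > level_b) != (risk_a > risk_b):
            reversed_pairs += 1           # matrix order contradicts quantitative risk
    return 100.0 * reversed_pairs / compared
```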
The figures below were produced in Quick Risk Matrix Premium. The risk matrices shown are very simple for the purpose of illustration and are not intended to represent practical designs.
We start by creating an example risk graph with six risk priority levels defined by five iso-risk contours.
Figure 1: Risk graph
We convert the risk graph to a risk matrix using the Predominant Color algorithm (one of several algorithms in Quick Risk Matrix). This algorithm colors each cell that is split by iso-risk contours according to the color that predominates in the cell.
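One way to read "the color that predominates in the cell" is the risk priority level covering the largest share of the cell's plotted area. The sketch below estimates that share on a regular grid of sample points; the grid approach, the log-axis default and the function names are our assumptions, not a description of Quick Risk Matrix's internals.

```python
import bisect
from collections import Counter

def predominant_level(p_lo, p_hi, c_lo, c_hi, thresholds, log_axes=True, grid=50):
    """Color a cell by the risk priority level that covers most of its plotted area.

    Area is sampled at grid x grid cell midpoints, evenly spaced on the plotted
    (log or linear) axes; risk = probability * consequence. Ties are broken arbitrarily.
    """
    def axis_points(lo, hi):
        fractions = [(k + 0.5) / grid for k in range(grid)]
        if log_axes:
            return [lo * (hi / lo) ** f for f in fractions]
        return [lo + (hi - lo) * f for f in fractions]

    counts = Counter(
        bisect.bisect_right(thresholds, p * c)
        for p in axis_points(p_lo, p_hi)
        for c in axis_points(c_lo, c_hi)
    )
    return counts.most_common(1)[0][0]

def color_matrix(p_edges, c_edges, thresholds, log_axes=True):
    """Return COLORS[row][col] for every cell (rows = probability bands)."""
    return [[predominant_level(p_edges[i], p_edges[i + 1], c_edges[j], c_edges[j + 1],
                               thresholds, log_axes)
             for j in range(len(c_edges) - 1)]
            for i in range(len(p_edges) - 1)]
```

For decade bands on both axes (probability 0.001 to 1, consequence 1 to 1,000) and iso-risk contours at 0.3 and 30, `color_matrix` returns [[0, 0, 1], [0, 1, 1], [1, 1, 2]].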
Figure 2: Risk matrix
The above risk matrix has been contrived to be identical to one developed by Li et al. using their Sequential Updating Approach (SUA) with an alpha value of 0.8.
The performance benchmarks for the above matrix were calculated in Quick Risk Matrix based on 100,000 points generated assuming probability and consequence to be uniformly distributed and independent (as per Li et al.):
- Points mapped accurately to risk priority level: 65%
- Points with overestimated risk priority level: 15%
- Points with underestimated risk priority level: 20%
- PEE: 11%
- PRR: 1.6%
The mapping accuracy is poor but the ability to rank pairs of risks as indicated by PEE and PRR is quite good. Cox's claim that typical risk matrices have an error rate in ranking pairs of risks in excess of 90% is not borne out.
Cox stated in his paper that "For risks with negatively correlated frequencies and severities, they [risk matrices] can be 'worse than useless,' leading to worse-than-random decisions." To test this claim, we will run our performance simulation again, but this time assuming that probability and consequence are negatively correlated with a Spearman correlation coefficient of -0.8. A small subset of the generated points is shown below overlaid on the risk matrix:
Figure 3: Risk matrix overlaid with a sample of negatively correlated points
The performance benchmarks for the matrix with the negatively correlated risks are:
- Points mapped accurately to risk priority level: 64%
- Points with overestimated risk priority level: 16%
- Points with underestimated risk priority level: 20%
- PEE: 20.2%
- PRR: 2.3%
Our results are almost the same as before for mapping accuracy but somewhat worse for PEE and PRR. However, Cox's claim that the risk matrix should lead to "worse than random decisions" with negatively correlated risks is clearly disproved.
We shall now investigate the effect of using fewer colors. Eliminating the smallest iso-risk contour produces the risk matrix shown below. This is identical to the risk matrix produced by Li et al. using SUA with an alpha value of 0.83.
Figure 4: Risk matrix with the number of colors reduced to five
The performance benchmarks for the matrix with the number of colors reduced to five (risks treated as independent, as per Li et al.) are shown below, with the values from the initial analysis in parentheses:
- Points mapped accurately to risk priority level: 77% (65%)
- Points with overestimated risk priority level: 4% (15%)
- Points with underestimated risk priority level: 19% (20%)
- PEE: 11% (11%)
- PRR: 1.1% (1.6%)
Eliminating one color has substantially increased the mapping accuracy, raising it from 65% to 77%, with the PEE the same as before and the PRR slightly better.
At this point, we shall stop emulating the designs obtained with SUA by Li et al. This is because, as the SUA alpha value increases, the lowest risk priority level occupies more and more of the chart area. For example, for an alpha value of 0.95, the SUA-based design given by Li et al. places every cell in the lowest risk priority level except the top right cell. Such a matrix would not be at all useful in practice. Since the alpha value dictates not only the number of colors but also the coloring pattern, it appears that the method of Li et al. cannot take into account specific iso-risk contour values for the purpose of defining thresholds between risk priority levels. Instead, the coloring pattern implies the iso-risk contour values! This makes SUA unsuitable as a basis for risk matrix design, because it does not allow the designer to specify key parameters. Cox's design approach has a similar deficiency: his axioms are so restrictive that they leave little choice over the coloring pattern, which therefore cannot reflect the designer's choice of risk thresholds.
So now we reduce the number of colors to three by eliminating two more iso-risk contours. We chose the contours to retain so that the domain was divided into three very roughly equal areas.
Figure 5: Risk matrix with the number of colors reduced to three.
With the number of colors reduced to three, the performance benchmarks are as follows (with the values from the 6-color design in parentheses):
- Points mapped accurately to risk priority level: 88% (65%)
- Points with overestimated risk priority level: 1% (15%)
- Points with underestimated risk priority level: 11% (20%)
- PEE: 11% (11%)
- PRR: 0.5% (1.6%)
The improvement in mapping accuracy due to reducing the number of colors is once again substantial. PRR has also improved. PEE is about the same.
Summary of Numerical Results
We performed a few more calculations in addition to those described above and summarize the results below.
Spearman correlation coefficient = 0
| Benchmark | 6 colors | 5 colors | 3 colors |
Spearman correlation coefficient = -0.8
| Benchmark | 6 colors | 5 colors | 3 colors |
Quick Risk Matrix (Premium) includes a risk matrix performance benchmarking tool to calculate several statistics:
- Mapping accuracy
- Probability of Rank Reversal (PRR)
- Probability of Elimination Error (PEE)
The main purpose of a risk matrix is to map probability and consequence categories to risk priority levels. The mapping accuracy benchmark is an indicator of how well a matrix can do this. It is useful for comparing alternative risk matrix designs.
The Probability of Rank Reversal (PRR) is the probability that a pair of risks, with the two risk points located in different risk priority levels, is ranked incorrectly by the risk matrix. For our worked examples, the PRR was small (0.5% to 2.3%). For well-designed risk matrices, it is our experience that PRR is typically small.
The Probability of Elimination Error (PEE) is the probability that a pair of risks located anywhere in the risk matrix will be ranked incorrectly. Now, the risk matrix on its own is incapable of ranking a pair of risks when both points lie in the same risk priority level. When the two risks have equal risk priority level, the decisionmaker must break the tie by means of supplementary calculations. When calculating PEE, Quick Risk Matrix assumes that the decisionmaker would use the Geometric Center Method (explained above) for breaking ties. Thus, PEE is an indicator of how well a decisionmaker might do in ranking pairs of risks with the aid of the risk matrix and some supplementary calculations.
We consider PEE to be a statistic of lesser importance since, in the real world, risks are not chosen for elimination or reduction solely on the basis of magnitude. The cost of elimination or reduction plays an important role. If there are many small risks that can be inexpensively treated, the cumulative risk reduction may be greater than what could be achieved by treating one or two larger but more intractable risks.
In our worked examples, we found a reduction in risk matrix performance when probability and consequence are negatively correlated. The reductions were modest and not sufficient to detract from the usefulness of the risk matrix.
Cells split by iso-risk contours have ambiguous risk priority levels and create mapping errors. Our worked examples illustrate that the mapping accuracy benchmark typically improves as the number of colors in the matrix is reduced. This is to be expected, since fewer cells are split by iso-risk contours when the number of colors is reduced. We recommend that the number of colors be no more than the organization's risk management philosophy requires. For many organizations, three colors may be sufficient, representing risks too high to be tolerated, risks so low that they are broadly acceptable, and risks that are acceptable provided that they have been reduced to as low as is reasonably practicable (ALARP).
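The reasoning about split cells can be checked directly. Because risk (probability times consequence) increases toward the top right of every cell, an iso-risk contour at level k crosses the cell bounded by p0..p1 and c0..c1 exactly when p0*c0 < k < p1*c1. The 5 x 5 decade matrix and the contour values below are hypothetical.

```python
def split_cells(p_edges, c_edges, thresholds):
    """Count cells crossed by at least one iso-risk contour (risk = p * c).

    Risk is smallest at a cell's bottom left corner and largest at its top right,
    so contour k crosses the cell exactly when p0 * c0 < k < p1 * c1.
    """
    count = 0
    for p0, p1 in zip(p_edges, p_edges[1:]):
        for c0, c1 in zip(c_edges, c_edges[1:]):
            if any(p0 * c0 < k < p1 * c1 for k in thresholds):
                count += 1
    return count

# Hypothetical 5 x 5 matrix with decade bands on both logarithmic axes.
P_EDGES = [10.0 ** e for e in range(-5, 1)]   # 1e-5 .. 1
C_EDGES = [10.0 ** e for e in range(0, 6)]    # 1 .. 1e5
```

For this matrix, five contours at 0.03, 0.3, 3, 30 and 300 (six colors) split 21 of the 25 cells, whereas keeping only the contours at 0.3 and 30 (three colors) splits 16.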
Ref. 1. Cox, L.A., What’s Wrong with Risk Matrices? Risk Analysis, Vol. 28, No. 2, 2008.
Ref. 2. Xing Hong, Risk Matrix Analysis Using Copulas. Dissertation, The George Washington University.
Ref. 3. Li, J., Bao, C., Wu, D., How to Design Rating Schemes of Risk Matrices: A Sequential Updating Approach. Risk Analysis, Vol. 38, No. 1, 2018.