Throughout this assessment framework, ecosystem services are measured using benefit-relevant indicators. These indicators play an explicit role in conceptual diagrams and means-ends diagrams, multicriteria decision models, valuation techniques such as benefits transfer models, and various versions of matrices displaying the performance of alternative management options. Developing indicators that are clearly defined and appropriate to the task at hand raises several issues, including (1) the use of expert judgment, (2) the choice between narrative and quantitative measures, and (3) the incorporation of uncertainty into estimates.
The large body of practical guidance on developing environmental indicators suggests that good indicators should be clearly defined, measurable or predictable for each alternative under consideration, and appropriate to the decision context at hand.
To evaluate management alternatives, each objective must be represented by a measurable quantity or quality that can be observed or predicted for each alternative—for example, the dollar value of agricultural products to represent the financial consequences of a wetlands management alternative. Agencies must take care to define measures clearly: Over what area? What kinds of agricultural products? What period of time? What price-reporting service?
Similarly, to express flood frequency clearly, the agency must answer these questions: What water level measured where and when constitutes a flood event? Over what time period will the agency express frequency (monthly, annually)? Over what period might it average the number of flood events (one year, 10 years)?
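These definitional choices determine how the indicator is actually computed. As a minimal sketch, assuming a hypothetical gauge record and an illustrative flood stage of 2.5 meters (every specific here stands in for an answer the agency must supply), the following Python fragment counts flood events in a year of daily readings, treating an event as a run of consecutive days above the threshold.

```python
# Minimal sketch: turning definitional choices into a computable
# flood-frequency indicator. The gauge, threshold, daily time step,
# and annual reporting period are all hypothetical placeholders.

def annual_flood_frequency(daily_stage_m, threshold_m=2.5):
    """Count flood events in one year of daily stage readings, where an
    event is one or more consecutive days above threshold_m."""
    events = 0
    in_event = False
    for stage in daily_stage_m:
        if stage > threshold_m and not in_event:
            events += 1       # a new event begins
            in_event = True
        elif stage <= threshold_m:
            in_event = False  # the event (if any) has ended
    return events

# A hypothetical year containing two distinct exceedance episodes.
year = [1.8, 2.0, 2.7, 2.9, 2.1, 1.9, 2.6, 2.4, 1.7]
print(annual_flood_frequency(year))  # -> 2
```

Changing any of the definitional choices (the threshold, the gauge, a monthly rather than annual reporting period) changes the indicator, which is why those choices must be stated explicitly.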
Answers to questions like these depend on the decision context: Whose concerns will be included? What measures are meaningful to them?
When alternatives’ performance in achieving a particular objective might be difficult or expensive to measure—for example, in assessing the size of populations of at-risk species—proxy measures can be used instead. These measures should be relatively easy to observe and should correlate well with the measure that is really of interest. For example, if bird species diversity is a desired condition (ecological outcome), managers might index this diversity directly with a tally of species known to actually occur at a site. Alternatively, they might index the site in terms of (modeled) habitat suitability of the species of interest—whether those species occur on-site or not. The former is a direct measure but logistically expensive to collect over large regions, whereas the latter is a proxy measure but easier to estimate because the models can be implemented in a geographic information system.
The choice between direct and proxy measures is often dictated by logistics, but using proxy measures can have substantive consequences. For example, in the case of rare species, sites of known occurrences might be much more compelling than sites with potential habitat suitability, but the known occurrences might not be on sites with high predicted suitability. Choosing species occurrence as a direct measure of the existence of biodiversity while choosing suitable, but possibly unoccupied, habitat as a proxy measure could permit substitution of an unoccupied, but otherwise highly suitable, site for one that is currently occupied.
Acres of high-quality wetland habitat might be a proxy for populations of wetland-dependent species as well as a direct measure of wetland habitat. In this case, agencies should ensure that the weight put on acres of wetland relative to other measures reflects the fact that this proxy is doing double (or perhaps triple or quadruple) duty by standing in for other measures. Ways and reasons to weight measures are discussed below and in Maguire 2014.1
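A small sketch can make this concrete. Assuming a simple additive multicriteria score with purely illustrative weights and performance values, the weight carried by the wetland-acres measure is the sum of the weights of every objective it stands in for, not the weight of wetland habitat alone.

```python
# Hypothetical additive multicriteria score in which "wetland acres"
# is both a direct measure of habitat and a proxy for populations of
# wetland-dependent species. All numbers are illustrative.

weights = {
    "wetland_habitat": 0.20,            # direct role of the measure
    "wetland_dependent_species": 0.15,  # objective the proxy stands in for
    "flood_mitigation": 0.30,
    "agricultural_value": 0.35,
}

# Normalized (0-1) performance of one alternative on each measure.
performance = {
    "wetland_acres": 0.8,  # does double duty for the first two objectives
    "flood_mitigation": 0.6,
    "agricultural_value": 0.4,
}

# Because wetland acres does double duty, it carries the combined
# weight of both objectives it represents.
proxy_weight = weights["wetland_habitat"] + weights["wetland_dependent_species"]
score = (proxy_weight * performance["wetland_acres"]
         + weights["flood_mitigation"] * performance["flood_mitigation"]
         + weights["agricultural_value"] * performance["agricultural_value"])
print(round(score, 3))  # -> 0.6
```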
Synthetic indicators are increasingly popular as measures of ecological conditions that are naturally multivariate. Indices of biotic integrity (IBIs) based on multiple species have been developed for aquatic macroinvertebrates, birds, wetland plants, and many other taxa. These indices are appealing because they can be weighted to emphasize particular ecological conditions (e.g., water quality) or species traits (e.g., rarity). In many cases, synthetic indices are indirect or proxy measures (see above); for example, an aquatic macroinvertebrate IBI based on taxa sensitive to water quality is used because direct measures of water quality require laboratory analyses, making them difficult and expensive to collect.
Although appealing for many applications, synthetic indices can be problematic for management applications because it can be difficult to know how a specific management action might affect the index. For a synthetic index to respond simply to management, all of the species used to compute it would have to respond similarly to the management action. But changes in forest structure might improve habitat for some species while degrading habitat suitability for others—and this information would be lost in a synthetic index. It is best to use synthetic indices as measures only if they can be interpreted unambiguously.
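A brief numeric sketch, assuming a hypothetical index computed as the mean of per-species habitat-suitability scores (real IBIs are more elaborate), shows how divergent species responses can cancel out and leave the index unchanged.

```python
# Hypothetical synthetic index: the mean of per-species habitat
# suitability scores (0-1). Real IBIs are more elaborate; this only
# sketches how opposite species responses can cancel out.

def synthetic_index(suitability):
    return sum(suitability.values()) / len(suitability)

baseline = {"species_a": 0.5, "species_b": 0.5, "species_c": 0.5}

# Thinning improves habitat for species_a but degrades it for species_b.
after_thinning = {"species_a": 0.8, "species_b": 0.2, "species_c": 0.5}

print(synthetic_index(baseline))        # -> 0.5
print(synthetic_index(after_thinning))  # -> 0.5; the trade-off is invisible
```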
An increasing array of formal models is being developed to predict production of ecosystem services in a variety of landscapes and to capture stakeholder preferences for varying levels of service provision. In this framework, the aim is to develop quantitative estimates for benefit-relevant indicators that capture both ecological and social information. Where formal models are not available, judicious use of expert opinion can help fill in gaps, thus avoiding omission of what is important but hard to observe or predict.
When quantitative models are not available, it can be tempting to simply omit an objective and its measure, but leaving out an important but troublesome measure is tantamount to declaring it to have no importance. A better tactic is to make use of expert opinion to fill in blanks in the matrix of alternatives and performance ratings. There are established procedures for choosing experts and eliciting their opinions (see references in Gregory et al. 2012).2 Implementing these procedures is exacting, and use of a consultant experienced in them is advised. Clearly defined measures and alternatives are necessary precursors to reliable use of expert opinion.
Imprecise estimates of performance for one or more measures are typical. A common but undesirable way of dealing with uncertainty about performance is to create measurement scales that lump quantitative results into “bins,” such as 0–10 breeding pairs of a particular bird species, 11–20 breeding pairs, and so on. The problem with this tactic is that to assign a particular result unambiguously to the correct bin (i.e., to know that it belongs to the 0–10 bin and not to the 11–20 bin), the evaluator must know whether the number of breeding pairs is 10 or 11.
A better way to handle uncertainty about the number of breeding pairs is to express performance as a range of values in cells of the alternatives/attribute matrix. Instead of describing performance as falling into a predefined “bin,” such as 0–10, express performance as a range considered likely to encompass the true performance (e.g., 5–8 breeding pairs) or as a probability distribution (e.g., a mean of 6.5 with a standard deviation of 2). Then carry out the rest of the analysis by using the extremes of the range (or by sampling from the probability distribution) to see if that uncertainty affects the overall rating of alternatives.
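As a minimal sketch using the numbers above, the uncertain cell can be held either as a pair of extremes to be carried through the rest of the analysis or as a distribution to be sampled; both forms are shown below.

```python
import random

# Range form: bounds believed to bracket the true number of breeding pairs.
breeding_pairs_range = (5, 8)

# Distribution form: mean of 6.5 pairs with a standard deviation of 2.
def sample_breeding_pairs():
    return random.gauss(6.5, 2.0)

# Carry the uncertainty forward by rerunning the analysis at each extreme...
for extreme in breeding_pairs_range:
    print("rerun the rating of alternatives with breeding pairs =", extreme)

# ...or by sampling repeatedly from the distribution (Monte Carlo).
samples = [sample_breeding_pairs() for _ in range(1000)]
print("sampled mean:", round(sum(samples) / len(samples), 2))
```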
A formal uncertainty analysis might be conducted by using a model in which a parameter (here, an indicator) is varied according to its error of estimation (e.g., plus and minus 1 standard error of the estimate). If the output of this model varies enough to alter the rank order of management alternatives, effort should be invested in refining the estimate to reduce uncertainty in the parameter estimate. Conducted for a set of indicators, this analysis would focus further effort on those parameters whose imprecision most affects the agency’s ability to discriminate reliably among management alternatives.
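Under stated assumptions (a simple additive score; hypothetical estimates, standard errors, and weights), the sketch below varies each indicator by plus and minus one standard error and flags any indicator whose imprecision alone can change which alternative ranks first.

```python
# One-at-a-time sensitivity analysis: vary each indicator by +/- 1 SE
# and check whether the top-ranked alternative changes. All estimates,
# standard errors, and weights are hypothetical.

weights = {"breeding_pairs": 0.6, "flood_events": 0.4}

# Per-alternative (estimate, standard error), normalized to 0-1.
alternatives = {
    "restore_wetland": {"breeding_pairs": (0.70, 0.10),
                        "flood_events": (0.60, 0.05)},
    "build_levee":     {"breeding_pairs": (0.60, 0.15),
                        "flood_events": (0.80, 0.05)},
}

def top_alternative(shifted=None, direction=0):
    """Top-ranked alternative when one indicator is shifted by
    direction * (1 SE) in every alternative."""
    scores = {}
    for name, indicators in alternatives.items():
        total = 0.0
        for ind, (est, se) in indicators.items():
            value = est + direction * se if ind == shifted else est
            total += weights[ind] * value
        scores[name] = total
    return max(scores, key=scores.get)

baseline = top_alternative()
print("baseline best:", baseline)  # -> build_levee
for indicator in weights:
    if {top_alternative(indicator, +1), top_alternative(indicator, -1)} != {baseline}:
        print(f"refine '{indicator}': its imprecision can reverse the ranking")
```

In this illustrative case only the breeding-pairs estimate can reverse the rank order, so refinement effort would go there rather than toward flood events.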
Some important features of an ecosystem are hard to express quantitatively. One solution is to use proxy measures, such as numbers of waterfowl seen in a day, to express the quality of a wildlife viewing experience. Another solution is to verbally define categories of performance, such as characterizing the viewing experience in terms of the observed species’ rarity or iconic stature for a particular region.
Verbal categories must be defined clearly enough that they can be used consistently by different evaluators. Terms such as “high, medium, or low” or “good, fair, or poor” are commonly used to define performance but, without direction, different evaluators are likely to use different terms to describe the same conditions. To be fully transparent and unambiguous, measures need to have (a) a category for every condition likely to result from the management alternatives (so that evaluators will always be able to find a category for any observed or predicted result), (b) nonoverlapping categories (so that evaluators will have only one category for each possible result), and (c) sufficiently clear category definitions (so that any evaluator with access to the same information as other evaluators would use the same category to describe a given result). For quality of wildlife viewing in a region where there are two iconic species of waterfowl, a categorical measure could include these categories:
- Neither iconic species seen
- One iconic species seen, in numbers of five or fewer
- One iconic species seen, in numbers greater than five
- Both iconic species seen, each in numbers of five or fewer
- Both iconic species seen, one in numbers greater than five and the other in numbers of five or fewer
- Both iconic species seen, each in numbers greater than five
Any number of either iconic species seen in a day will fall into one and only one of these six categories, and any evaluator knowing the number of each species seen will assign a category consistently.
Note that these categories treat both species equally—they do not distinguish between numbers of species A seen and numbers of species B seen. Additional verbal categories would be needed to treat the species differently.
The verbal categories used to define categorical measures should not embed expressions of relative satisfaction such as “worst,” “medium,” and “best.” These categories are ambiguous: where does “worst” stop and “medium” begin? They also assume an order of preference that may not suit all users.
In the iconic species measure above, the six verbal categories are purposely not numbered because later in the analysis such numbers are likely to be misused as expressions of relative satisfaction (e.g., the category labeled “six” is assumed to be six times as desirable as the category labeled “one”). Without scrutiny, the agency cannot know whether that expression of satisfaction is warranted and whether it suits all users. Without further investigation it does not even know how to order the six viewing categories: is it better to see both species in numbers of five or fewer, or only one species but in numbers greater than five? Different users might answer that question differently. Agencies must be wary of categorical scales represented by numbers or ranks. It is better to create a shorthand code for lengthy verbal descriptions of categories by using a word or a letter, rather than a number.
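As an illustration, the sketch below assigns the six viewing categories mechanically from daily counts of the two species, using word codes rather than numbers as shorthand; the function and the codes are illustrative, not prescribed.

```python
# Assign the six viewing categories from daily counts of the two iconic
# species. Word codes serve as shorthand for the verbal descriptions,
# avoiding numeric labels that invite misuse as satisfaction scores.

def viewing_category(count_species_a, count_species_b):
    seen = [c for c in (count_species_a, count_species_b) if c > 0]
    if len(seen) == 0:
        return "NONE"        # neither iconic species seen
    if len(seen) == 1:       # one species seen: few or many?
        return "ONE-FEW" if seen[0] <= 5 else "ONE-MANY"
    if max(seen) <= 5:
        return "BOTH-FEW"    # both seen, five or fewer of each
    if min(seen) <= 5:
        return "BOTH-MIXED"  # both seen, many of one and few of the other
    return "BOTH-MANY"       # both seen, more than five of each

# Every pair of non-negative counts falls into exactly one category,
# and the categories treat the two species symmetrically.
assert viewing_category(0, 0) == "NONE"
assert viewing_category(3, 0) == "ONE-FEW"
assert viewing_category(0, 12) == "ONE-MANY"
assert viewing_category(2, 4) == "BOTH-FEW"
assert viewing_category(9, 1) == "BOTH-MIXED"
assert viewing_category(7, 6) == "BOTH-MANY"
```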
All of these principles for developing good measurement scales to evaluate performance of alternative management actions apply equally to measures of ecosystem services (benefit-relevant indicators). Properly developed categorical measurement scales may be especially important when evaluating the production of intangible ecosystem goods and services, such as the existence value of unique species or landscapes.
Keeney, R.L., and R.S. Gregory. 2005. “Selecting Attributes to Measure the Achievement of Objectives.” Operations Research 53(1): 1–11.
This paper outlines theory and guidelines for selecting appropriate attributes to clarify objectives in a decision process.