Measuring Impact - Duignan's Impact Evaluation Feasibility Check*

It's often really hard to prove your impact on high-level outcomes

For most real-world programs and organizations, it's really hard to prove that you're having an impact on high-level outcomes. Many people make the mistake of thinking that if indicators of high-level outcomes are improving (e.g. less violence, more tourism, better social outcomes), that in itself establishes that it was their program that improved them. 

For indicators that are controllable by your organization or program (i.e. you are the only thing influencing them), simply measuring them does establish that you've improved them. Controllable indicators tend to be measures of lower-level steps rather than higher-level outcomes. Higher-level outcomes are usually influenced by a number of factors in addition to the work of your organization or program, so simply measuring that such high-level indicators have improved is not, in itself, enough to establish that it was your work that improved them; other factors could have done so. It is usually the case that many of the most interesting high-level indicators your work is focused on will be non-controllable indicators. 

You need to turn to technical impact evaluation designs

The only way you can prove that you've made a difference to high-level indicators where they are non-controllable is to complement your indicator measurement work (which is called monitoring) with specific impact evaluation work. Impact evaluation consists of specific studies aimed at teasing out what's actually causing high-level outcomes to occur. It's the way that you can prove that it's your program or organization, rather than other factors, which is causing high-level outcomes to improve. 

The problem for program and organizational staff attempting to prove that their program is actually improving high-level indicators is that, when they turn to impact evaluation, they quickly realize that in many cases it is technically complicated.

Avoiding pseudo-impact evaluation studies

There are two traditional approaches to the problem of impact evaluation being technically hard. The first is to just ignore the technicalities and insist that anyone can do impact evaluation. You then suggest that program staff just draw together whatever information they can find to show that their particular program 'works'. In effect this is watering down the definition of the term impact evaluation. The danger of this is that the term can be so watered down that it becomes almost meaningless. It produces what can be called pseudo-impact evaluations. These are impact evaluations which you could drive a bulldozer through because of their methodological weaknesses. 

The other traditional approach is at the other extreme. It is where you insist that everyone has to do technically 'robust' impact evaluations (e.g. randomized experiments) so that they can robustly establish that their program has made a difference to high-level outcomes. Demanding this of all programs is unrealistic and unaffordable.

The Duignan approach deals with the impact evaluation problem differently from these two traditional solutions. First, it never assumes that impact evaluation is appropriate, feasible or affordable to do on any particular program, organization or intervention. Second, even when impact evaluation is appropriate, feasible and affordable, the Duignan approach does not insist that it always should be done. This is because even if it is feasible, it might not be a good use of scarce evaluation resources. For instance, the same intervention might have been evaluated before in similar circumstances. Evaluation resources should not be allocated on a program by program basis but rather as part of a sector-wide assessment of what the knowledge needs are for a particular sector. 

Assessing the feasibility of the seven major types of impact evaluation design

The Duignan approach works by going through seven major impact evaluation design types and assessing each of them for their appropriateness, feasibility, affordability and credibility with stakeholders when applied to the particular program, organization or intervention being subject to impact evaluation. If none of them are appropriate, feasible, affordable and credible to key stakeholders, then in the Duignan approach that is simply the reality of the situation you are dealing with. There's no reason why program staff should feel that they are failures if they cannot come up with any appropriate, feasible, affordable and credible way of doing impact evaluation. The ease of impact evaluation varies widely amongst different types of program in different settings with different levels of resources available for doing impact evaluation. 

The Duignan approach does not dumb-down the impact evaluation choices available to you but rather lists the seven major design types and suggests that you assess which of them are appropriate, feasible and affordable. Some people reading this may need some assistance to do the impact evaluation feasibility analysis; others will be able to work their way through it at a high level and call in technical expertise for just parts of it. Best practice in using the Duignan approach is for funders to provide expert advice to providers as to which of the seven major impact evaluation design types are likely to be appropriate, feasible, affordable and credible in the case of particular types of program, organizations or interventions. 

Some designs are relatively easy to set up and do. Such designs could be undertaken by a wide range of generic market research companies or those with similar experience. An example of one of these designs is Impact Evaluation Design Type 6: Key Informant Judgment Design. In this design you just ask people who are likely to know whether they think the program or organization made a difference to high-level outcomes. Others, like Time Series Designs or Regression Discontinuity Designs, are much more technical and will usually require that you ask someone with technical evaluation expertise whether or not they may be applicable to your program or organization's work. 

The seven major impact evaluation design types are listed below with an explanation of each of them. The idea is that for any program or organization, you go through these design types and see if any of them are applicable in the case of your particular program or organization.

1. True Experiments

In randomized experimental designs, people, organizations, regions or other units are assigned randomly to an intervention and control group. They are sometimes called Randomized Controlled Trials (RCTs). The intervention group receives the intervention and its outcomes are compared to those for the control group which does not. Because of the random assignment, all other confounding factors which may have produced the outcomes can be ruled out as unlikely to have created the observed improvement in high-level outcomes. 
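To make the logic concrete, here is a minimal sketch (not from the source; all numbers and names are illustrative assumptions) of how random assignment lets a simple difference in group means stand in for the intervention's effect:

```python
import numpy as np

rng = np.random.default_rng(0)

n_units = 200
baseline = rng.normal(50, 10, n_units)             # outcome each unit would show anyway
treated = rng.permutation(n_units) < n_units // 2  # random assignment to the intervention group

true_effect = 5.0                                  # assumed effect of the intervention
outcome = baseline + np.where(treated, true_effect, 0.0)

# Random assignment balances confounding factors across the two groups (in
# expectation), so the simple difference in mean outcomes estimates the effect.
estimated_effect = outcome[treated].mean() - outcome[~treated].mean()
print(f"Estimated intervention effect: {estimated_effect:.2f}")
```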

2. Regression Discontinuity Design

In regression discontinuity designs, people, organizations, regions or units are quantitatively ranked or distinguished. For example they might be ranked from those with the lowest pre-intervention outcome scores to those with the highest. Then an intervention is implemented above or below a cut-off point in that ranking. For our purposes here we can think of them being graphed in order of that ranking. If the intervention is effective, an improvement should appear on the graph as a jump at the cut-off point between those units which received the intervention and those which did not. One advantage of this design over a true experiment is that it is often seen as more ethical because the treatment is given to those most in need (i.e. those below the cut-off point).
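As a rough illustration (not from the source; the score, cut-off and effect size are assumptions), the sketch below ranks units on a score, treats those below a cut-off, and estimates the jump at the cut-off by fitting a line on each side:

```python
import numpy as np

rng = np.random.default_rng(1)

score = rng.uniform(0, 100, 500)   # pre-intervention ranking variable
cutoff = 40.0
treated = score < cutoff           # intervention goes to those most in need (below the cut-off)

# Outcome rises smoothly with the score, plus an assumed jump of 6 for treated units.
outcome = 20 + 0.5 * score + np.where(treated, 6.0, 0.0) + rng.normal(0, 2, 500)

# Fit a straight line on each side of the cut-off and compare their predictions
# at the cut-off itself; the gap estimates the intervention's effect.
below = np.polyfit(score[treated], outcome[treated], 1)
above = np.polyfit(score[~treated], outcome[~treated], 1)
jump = np.polyval(below, cutoff) - np.polyval(above, cutoff)
print(f"Estimated discontinuity at the cut-off: {jump:.2f}")
```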

3. Time Series Designs

In time series designs, a number of measurements of an outcome are taken over a period of time. Then, an intervention is introduced at a specific point in time. If the intervention is successful, an improvement is expected at exactly the point when it was introduced. Because there has been a series of measurements over time, it is possible to look at the point when the intervention was introduced and ask whether an improvement is shown there. 
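The following sketch (not from the source; the data are simulated and the shift size is an assumption) shows the basic interrupted time series idea: regress the outcome on time plus a post-intervention indicator and read the indicator's coefficient as the shift at the point of introduction:

```python
import numpy as np

rng = np.random.default_rng(2)

n_periods = 40
intervention_at = 25
t = np.arange(n_periods)
after = (t >= intervention_at).astype(float)   # 1.0 once the intervention is in place

# Underlying trend plus an assumed level shift of 4 when the intervention starts.
outcome = 30 + 0.3 * t + 4.0 * after + rng.normal(0, 1.5, n_periods)

# Regress the outcome on time and the post-intervention indicator; the
# indicator's coefficient estimates the shift at the point of introduction.
X = np.column_stack([np.ones(n_periods), t, after])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"Estimated level shift at the intervention point: {coef[2]:.2f}")
```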

4. Constructed Comparison Group Designs

In constructed matched comparison group designs, an attempt is made to identify or create a comparison group which is not receiving the intervention. This group is then used to compare outcomes with the group which has received the intervention. For instance, one might find a community similar to the intervention community. In some cases, called propensity matching, statistical methods are used to work out what is likely to have happened to a particular type of case if it had not received the intervention (on the basis of statistics from many cases which did not receive the intervention).
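As a hedged illustration (not from the source): a full propensity-matching study would model each unit's probability of receiving the intervention, but the simpler nearest-neighbour matching below shows the core idea of constructing a comparison group from similar untreated units; the characteristics and effect size are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed characteristics (e.g. size, deprivation score) for each unit.
x_treated = rng.normal([5.0, 3.0], 1.0, size=(30, 2))
x_control = rng.normal([4.0, 3.0], 1.0, size=(200, 2))

# Outcomes depend on the characteristics; treated units get an assumed effect of 2.
y_treated = 10 + x_treated.sum(axis=1) + 2.0 + rng.normal(0, 1, 30)
y_control = 10 + x_control.sum(axis=1) + rng.normal(0, 1, 200)

# Match each treated unit to the most similar untreated unit in characteristic space.
distances = np.linalg.norm(x_treated[:, None, :] - x_control[None, :, :], axis=2)
matches = distances.argmin(axis=1)

estimated_effect = (y_treated - y_control[matches]).mean()
print(f"Estimated effect from the matched comparison group: {estimated_effect:.2f}")
```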

5. Exhaustive Causal Identification and Elimination Designs

In exhaustive alternative causal identification and elimination designs there needs to be a good way of measuring whether or not outcomes have occurred. All of the alternative explanations as to why outcomes might have occurred then need to be detailed. Alternative explanations are eliminated by logical analysis and by using any empirical data available: each is assessed to see whether it is credible that it might have caused the observed improvements in outcomes. If this can be done successfully and all alternative explanations are eliminated, it leaves the intervention as the only credible explanation as to why outcomes have improved. These designs differ from Time Series Designs in that they do not require a large number of observations over time.
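A minimal sketch (not from the source; the explanations and evidence shown are hypothetical placeholders) of the bookkeeping such a design implies, recording each alternative explanation and whether it has been eliminated:

```python
# Each alternative explanation for the observed improvement is listed along with
# the (hypothetical) evidence used to eliminate it. If everything except the
# intervention can be eliminated, the intervention remains the credible cause.
alternative_explanations = {
    "economic upturn":         {"eliminated": True,  "basis": "improvement seen in both strong and weak years"},
    "unrelated policy change": {"eliminated": True,  "basis": "no relevant policy changed over the period"},
    "change in measurement":   {"eliminated": True,  "basis": "same measurement instrument used throughout"},
    "the intervention":        {"eliminated": False, "basis": "timing and scale consistent with the improvement"},
}

remaining = [name for name, check in alternative_explanations.items() if not check["eliminated"]]
if remaining == ["the intervention"]:
    print("All alternative explanations eliminated; the intervention is the only credible cause left.")
else:
    print(f"Explanations not yet eliminated: {remaining}")
```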

6. Expert and Key Informant Judgment Designs

In expert judgment designs, an expert, or an expert panel, is simply asked to make a judgment as to whether, in their opinion (using whatever method they usually use in making such judgments), they believe that the program has had an effect on improving high-level outcomes. This type of evaluation design is sometimes called a 'connoisseur' evaluation design, drawing on an analogy with connoisseur judges such as wine tasters. In key informant judgment designs, key informants (people who are likely to be in a position to be knowledgeable about what is happening in a program and whether it impacted on high-level outcomes) are asked to make a judgment regarding whether they think that the program affected high-level outcomes (using whatever method they want in making such judgments). This type of impact evaluation design is often seen by some stakeholders as less robust and credible than the designs listed above. 

7. Intervention Logic (Program Theory/Theory of Change) Based Designs

In intervention logic designs, an attempt is first made to establish a credible 'intervention logic' (program theory/theory of change) for the program or organization. This logic sets out the way in which it is believed that lower-level program activities will logically lead on to cause higher-level outcomes (this can be done in the form of a DoView Outcomes Model). The logic is then validated either by showing that previous evidence supports it working in cases similar to the one being evaluated, or by experts in the topic endorsing it as credible. It is then established that the lower-level activities have actually occurred (relatively easy to do because they tend to be measurable with controllable indicators), and it is then assumed (but not proven) that in this particular instance they did, in fact, cause the higher-level outcomes to occur.

Using Duignan's Framework

In practice, Duignan's Check is used by going through each of these seven major impact evaluation design types and assessing the appropriateness, feasibility, affordability and credibility of each of them.

Below is an example of part of the analysis of the seven major impact evaluation design types for impact evaluation of a new national building regulation regime designed to improve the performance of new buildings in a country.

For a simple humorous article which goes through each of the impact evaluation designs and illustrates them, see here.

Duignan's Impact Evaluation Feasibility Check for a New National Building Regulatory Regime

An evaluation plan for a national new building regulatory regime was developed (it is available here). The new building regulatory regime was introduced as a consequence of the failure (due to leaking) of a number of buildings under the previous national building regulatory regime. The analysis of the possible impact evaluation designs is given below:

True experimental design

NOT CONSIDERED FEASIBLE. This design would set up a comparison between a group which receives the intervention and a group (ideally randomly selected from the same pool) which does not. For ethical, political, legal and design compromise reasons it is not possible to implement the intervention in one or more localities while other localities (serving as a control group) do not have it. Apart from anything else, statutory regulation could not be imposed on only part of the country. In addition, there is a major impact evaluation design compromise problem given the practical and political importance of having a high standard of new building work in every locality, not just some. It is likely that compensatory rivalry would reduce any difference in outcomes between the intervention and control groups. Compensatory rivalry is where a control locality implements its own version of the intervention being evaluated because the outcomes matter as much to it as they do to the intervention locality receiving the intervention.

Regression-discontinuity design

NOT CONSIDERED FEASIBLE. One instance of this design could graph those localities which could potentially receive the intervention on a measurable continuum (e.g. the quality of buildings in the locality). The intervention would then only be applied to those localities below a certain cut-off level. Any effect should show as an upward shift in the graph at the cut-off point. In theory it would be possible to rank local regions in order of the quality of their new building work and, if resources for the intervention were limited, it would be ethical to intervene only in those with the worst new building work, hence establishing a regression discontinuity design. However, the political, legal and design compromise problems (as in the experimental design above) mean that a regression-discontinuity design does not seem to be feasible in this instance.

Time-series design

NOT CONSIDERED FEASIBLE. This design tracks a measure of an outcome a large number of times (say 30) and then looks to see if there is a clear change at the point in time when the intervention was introduced. This design would be possible if multiple measures of new building quality were available over a lengthy (say 20-year) time series which could then continue to be tracked over the course of the intervention. However, this design has the design compromise problem that another major factor – which can be termed the ‘crystallization of liability’ – is occurring at the same time as the introduction of the new building regulatory regime. The crystallization of liability is a consequence of all the stakeholders now becoming aware of the liability they can be exposed to due to the failure of many buildings and the attendant liability claims which have arisen. It should be noted that this crystallization does not, of course, prevent any available time series data from being used to track the non-controllable indicator of new building quality over time. It is just that any such time series analysis would be silent on the question of attributing change to the new building regulatory regime.

Constructed matched comparison group design

NOT CONSIDERED FEASIBLE. This design would attempt to locate a group which is matched to the intervention group on all important variables apart from receiving the intervention. This would require the construction (identification) of a comparison group not subject to a change in its regulatory regime, ideally over the same time period as the intervention. Since the new building regulatory regime is a national intervention such a comparison group will not be able to be located within the country in question. It is theoretically possible that one or more comparison groups could be constructed from other countries or regions within other countries. However discussions so far with experts in the area have concluded that it is virtually impossible for a country or region to be identified which could be used in a way that meets the assumptions of this design. These assumptions are: that the initial regulatory regime in the other country was the same; that the conditions new buildings are exposed to in the other country are similar; that the authorities in the other country do not respond to new building quality issues by changing the regulatory regime themselves; and that there are sufficient valid and reliable ways of measuring new building quality in both countries before and after the intervention. It should be noted that while some of these assumptions may be met in regard to some overseas countries, all of them would need to be met for a particular country to provide an appropriate comparison group.

Causal identification and elimination design

CONSIDERED LOW FEASIBILITY. This design works by first identifying that there has been a change in observed outcomes, then undertaking a detailed analysis of all of the possible causes of that change and eliminating all causes other than the intervention. In some cases it is possible to develop a detailed list of possible causes of observed outcomes and then to use a ‘forensic’ type process (just as a detective does) to identify what is most likely to have created the observed effect. This goes far beyond just accumulating evidence as to why the observed outcome may be explained by the intervention: it requires that the alternative explanations be eliminated as causes of the outcome. This may not be possible in this case due to the concurrent crystallization of liability, discussed above, which occurred in the same timeframe as the intervention. It is likely that this cause is significantly intertwined with the intervention in being responsible for any change that occurs in new building practice and that it will be impossible to disaggregate the effect of the intervention from the effect of the crystallization of liability. A feasibility study should be undertaken to confirm that this design is indeed not feasible.

Expert judgment and key informant judgment design

CONSIDERED HIGH APPROPRIATENESS, HIGH FEASIBILITY, HIGH AFFORDABILITY, LOWER CREDIBILITY. This design consists of asking a subject expert (or experts) to analyze the situation in a way that makes sense to them and to assess whether, on balance, they accept the hypothesis that the intervention may have caused the outcome. One or more well-regarded and appropriate independent experts in building regulation (presumably from overseas in order to ensure independence) could be asked to visit the country and to assess whether they believe that any change in new building outcomes is a result of the new building regulatory regime. This would be based on their professional judgment and they would take into account whatever data they believe they require in order to make that judgment. Their report would spell out the basis on which the judgment was made. This approach is highly feasible but provides a significantly lower level of certainty than the impact evaluation designs above. If this design is used then the evaluation question being answered should always be clearly identified as: in the opinion of an independent expert (or experts), has the new building regulatory regime led to an improvement in building outcomes? There are obvious linkages between this design and the causal identification and elimination design above, and the feasibility study for that design should also look in detail at the possibilities for the expert judgment design.

A key informant judgment design is also highly feasible. This design consists of asking key informants (people who, by virtue of their position, have knowledge of what has occurred regarding the intervention) to analyze the situation in a way that makes sense to them and to assess whether, on balance, they accept the hypothesis that the intervention may have caused the outcome. A selection of stakeholder key informants could be interviewed face to face and their opinions regarding what outcomes can be attributed to the new building regime summarized and analyzed in order to draw general conclusions about the effect of the intervention. This could be linked with the expert judgment and causal elimination designs described above.

Intervention logic designs

CONSIDERED HIGH APPROPRIATENESS, HIGH FEASIBILITY, HIGH AFFORDABILITY, LOWER CREDIBILITY. In this design the logic of how it is thought the intervention will work is spelt out (for instance in a DoView outcomes model). This is then validated against existing evidence and its credibility assessed by experts in the field. If it is deemed to be a credible logic, it is simply established that the lower levels of the outcomes model have occurred, and it is then assumed that their occurrence in this instance has led to the higher-level outcomes occurring.

Additional examples of this type of analysis are available here.

The Duignan Impact Evaluation Feasibility Check can be used within DoView Visual Monitoring and Evaluation Plans. These are more quickly built and accessible evaluation plans which are built around a visual outcomes model built in DoView Outcomes and Evaluation software. Information on how to build DoViews is here. Information on how to build DoView Visual Monitoring and Evaluation plans is here. Download the DoView free trial now.

Anyone can use the above material, with acknowledgment, when doing evaluation planning for their own organization or in for-profit or not-for-profit consulting work. However you can't embed the approach into software or web-based systems without our permission. If you want to embed it in software or web-based systems please contact general@doview.com.

*Reference to cite in regard to this work: Duignan, P. (2009). A concise framework for thinking about the types of evidence provided by monitoring and evaluation. Australasian Evaluation Society International Conference, Canberra, Australia, 31 August – 4 September 2009.