Time for a Gold Standard of Use

By Fay Twersky | February 27th, 2012

Ask most people what “the gold standard” is and they’re likely to tell you it is a metaphor for “the best.” Indeed, when a New York Times opinion piece recently took the Obama administration to task for a certain environmental policy and the author called it “the bronze standard,” it was clear it was no compliment.

In the social sector, we talk about the gold standard as a specific kind of evaluation that has been designed to determine the impact of a program or set of activities. A gold standard impact evaluation uses randomization to determine cause and effect. It is based on the science of clinical trials and it is used to attribute change to a particular intervention. For instance, rather than just observing that students improved their reading skills, an impact evaluation allows you to know what caused that improvement: if the improvement was the result of a teacher, a new curriculum or perhaps a longer school day—or whether the change would have occurred anyway because of exogenous factors.

These evaluations are enormously important because they can provide evidence about what works to improve people’s lives; and, at least as important, what does not work. Many foundations are funding impact evaluations these days to inform national and international policy, funding decisions, and more effective practice. Seems like a sound approach, doesn’t it?

Unfortunately, it does not always work the way we plan.

Influencing policy?

First, relatively few policymakers rely on high-quality randomized studies to shape their perspectives and approaches. I recently spoke to a senior executive at a preeminent research organization dedicated to generating knowledge, largely through randomized studies, and communicating the findings to influence social policy. He explained that while evidence has been gaining ground in national policy debates, there are essentially four factors at play that inform and shape national policy: 1) ideology; 2) politics; 3) evidence; and 4) political horse trading. Evidence is only one of four factors — and it ranks third in the sequence.

When studies validate ideology and political orientations on both sides of the aisle, as they did on welfare reform – where the results highlighted the importance of work as well as the need for extra supplements to low wage work – policymakers used the evidence. But when it doesn’t validate ideology and reinforce strongly held beliefs, the evidence is often ignored, even when well communicated and well substantiated. Nationally, this is improving with new Office of Management and Budget standards that promote programs with rigorous studies of effectiveness, but still we see policymakers choosing the evidence that suits their beliefs rather than the other way around.

Influencing practice?

It can be even harder to make the evidence central to decision making in practice. A seasoned evaluator who has worked with many foundations and their grantees recently complained to me that even in fields where there is strong evidence of what the best practice is, only about ten percent of nonprofit practitioners follow those practices. Perhaps the proportion is higher than this evaluator estimates, but it is surely true that many nonprofits don’t operate in ways that are consistent with what the evidence shows. Why would this be? Why not use evidence to shape practice?

There are many possible explanations, but the two I think are most important are:

  1. A misalignment or a lack of incentives. It may be expensive to revise practice to align with the evidence – including investing in developing skills, hiring new personnel, or even changing the basic activities an organization performs. There are often no explicit incentives or funding to make the necessary adjustments.
  2. Not knowing the evidence. Many – especially small – nonprofits (and their funders) do not stay current with the literature in their fields and, as a result, may not be aware of the latest studies showing what works and what does not. There has not been an easy way to stay current on the literature, and reading through reams of academic papers is not at the top of most practitioners’ or funders’ lists of priorities.

A further complication concerns an evaluator’s own belief system, which influences the interpretation of results for even highly rigorous experiments. When I asked an esteemed university-based evaluator who studies social service programs, “How often are your evaluation findings used?,” he said most of the time they were used for “tweaking.” In most of his impact evaluation studies, he further explained, there is “no effect,” meaning there is no evidence that the service being provided is helping people in any significant way. He exercises great care in the way he interprets and communicates such findings so that the evaluation results do not unduly harm what he views as essential social service programs for poor people. The primary value in this calculation is that any program serving the needs of poor people is better than no program, even if the program itself is not proven effective.

Influencing philanthropy?

In philanthropy, the evaluation picture is a bit mixed.

On the one hand, a 2011 Patrizi Associates study found that in recent years, foundations have placed less emphasis on evaluation – spending less money on it, with evaluation reportedly having diminished influence on foundations’ decision making. On the other hand, a recent CEP study found that 90 percent of foundation CEOs report that their foundations conduct formal evaluations of their work, often using third party evaluators to do so. Still, 65 percent of these same foundation CEOs report that it is challenging to have evaluations result in meaningful insights for their foundation. Thus, even those foundations historically oriented to evaluation find that effectively using evaluation results remains elusive.

While the challenge of using evaluation findings has perhaps led some foundations to eliminate dedicated evaluation positions, it appears to me that in the ebb and flow of foundation trends, more foundations are now creating evaluation functions as part of their operations; they are increasingly likely to link those functions to strategy and organizational learning. It is essential to seize this opportunity and plan for evaluations that are designed to purposefully support learning and decision making, as well as make better use of findings, in order to drive better actual results.

Many types of evaluations matter

The issue of “gold standard” evaluations in the social sector is controversial for other reasons. Many believe that the crowning of them as the preferred method has produced a kind of gold rush towards randomized controlled trials (RCTs), which are often narrowly focused, limited to variables that are more easily measured, and not always focused on what is most important. The fact is that there are many important forms of evaluation – performance evaluations, formative evaluations, developmental evaluations, cost effectiveness studies, case studies, and more. While well-designed RCTs can offer strong evidence as to what programmatic models work on the ground, and can inform government decisions about resource allocation, they need to be blended with other types of studies to answer the how and why questions about effectiveness. RCTs also are not applicable to evaluating things like advocacy or field building – areas of growing interest and importance in the sector.

Different forms of measurement are valuable to answer different questions at different times. If we are trying to maximize the use of evaluation results, it is essential to match evaluation methods to questions that are important to answer – that if answered, might make a difference to how one thinks about a given approach or set of activities.

It is time for a gold standard of data use.¹

More important than pursuing a single standard for measurement is the need to inculcate the expectation that all nonprofits and funders will use data to inform how they work and the decisions they make. Sometimes those data will come from randomized studies; sometimes, they will come from performance evaluations and performance measurement to shape continual improvement and adaptation; sometimes they will come from case studies, a great vehicle for learning and improvement, especially in circumstances that require a good deal of professional judgment.

Part of increasing use of data and evaluation involves right-sizing evaluation efforts, focusing on what is important to know when, and anticipating how the results might be applied in practice. Paul Brest at the Hewlett Foundation has a saying, “Don’t kill what you cannot eat.” It is time to put our energy into consuming the information we take the time to gather, the reports we commission, and the studies we support. We should avoid asking for more data, more evaluation, and more analysis than we can actually make sense of in the time frames required for meaningful action. Rigorous methods are important, but the real gold standard is in use.

Hopeful signs ahead?

I invite readers to share what they believe are promising signs of more and better use of evaluation to inform decisions and adapt practice – in nonprofits, foundations, and even policy.

It is up to us…

Data don’t make decisions, people do. It is up to us to anticipate and meet our information needs, deploy ways to measure performance, commission evaluations with a clear purpose, and take time to use the data. Only then will we meet new vaunted expectations for a gold standard of data use.

 

¹In the interests of attribution, Jodi Nelson, my esteemed colleague from the Bill & Melinda Gates Foundation, was the first to introduce me to the phrase “gold standard of use.” It was so resonant at the time that I immediately told her I would use it again.

 

Fay Twersky is a senior fellow at The William and Flora Hewlett Foundation and a member of CEP’s Advisory Board.

 


10 Comments

  1. This is eminently sensible. Thank you. I would love more information on program evaluation techniques and their applications for nonprofits.

    Thank you, again.

  2. Fay,

    I’m fully behind the idea of using data in decision making. But I think your attempt to separate data generation through rigorous evaluation and use of data in decision making simply doesn’t work.

    That’s because using bad data in decision making is likely worse than using no data at all. Bad data can instill a false sense of certainty in decision makers, allowing them to ignore the limited feedback loops that exist. Examples of this are plentiful ranging from the Monty Python witch sketch in The Holy Grail to the economic history of the Soviet Union.

    I think this is one reason that using rigorous evaluation among policy makers is slow to catch on. There is a great deal of “data” out there and those using it do not have a substantially better track record than those who do not. The reason, of course, is that most of the data is bad or badly interpreted.

    So as slow and painful as it is, I think we have to start with incremental steps to improving the quality of data generation before we leap ahead to encouraging data-driven decision making.

    If there is another place to start, it is in educating leaders in philanthropy and policy alike about data science and statistics. Such knowledge, I would hope, will help them avoid the use of bad or suspect data while instilling in them the requisite skepticism of even the data generated by rigorous evaluation.

  3. I can’t agree more with Fay—it is time for a gold standard of use of evaluative information. I have a few comments to offer in support of Fay’s argument.

    1) The emphasis on RCTs may actually undermine efforts by nonprofits to know more in basic ways about their performance. The “RCT or bust” mentality allows some to throw out the baby with the bathwater. Many in the sector forget that the road to rigorous evaluation requires an accrual of knowledge that forms a platform for program growth and development, over time. As a senior fellow at Public/Private Ventures (P/PV), I see few funders committed to improving the sector’s capacity to take on this developmental challenge of building knowledge diligently toward better performance. Where then are nonprofits to get the resources to invest in the kind of information and evaluation they need to improve their work?

    2) A related point that applies principally to funders: Do you know how to learn? The Evaluation Roundtable study Fay cited revealed just how little foundations seem to know about the implementation of their programs. If they invest in evaluation at all, less and less seems to go toward improving their understanding of how their strategies and those of their grantees play out on the ground. It seems that without this information, funders consign themselves to operating in the dark.

    3) Finally, management matters. We found in our (Evaluation Roundtable) survey that if evaluation reports to the CEO of a foundation, that foundation then uses evaluative information more in every way and with every audience. While I can point to stellar exceptions in the foundation world (individual program officers who use evaluation and other information well), the key to more widespread good use seems to be in the hands of the foundation executive.

    Glad that you are on this topic. The Evaluation Roundtable’s next meeting is on “The Use, Misuse and Non Use of Information by Foundations.”

  4. Many thanks, Fay, for a thought-provoking piece. I fully agree with your call for more thoughtful use of evaluations to inform decision-making and enable best practices. However, the first challenge for foundations and nonprofits to overcome is to “commission evaluations with a CLEAR purpose”. I have deliberately capitalized the word clear because that is of course not as easy as one would expect in this business.

    The second challenge is related to what Tim Ogden referred to: what type of data to use and the ability to discern “good” from “bad”, or even the most relevant from the least.

    Finally, simply asking the right questions is…well, not so simple.

    But I agree, of course we can’t throw in the towel. We need to use evaluations for decision-making, make mistakes, and learn…in other words, make sure we evaluate how and what we accomplished or failed to do with evaluations.

  5. Hi,
    Thanks for the comments.

    Tim and Jackie, I believe that high quality is important. I would never argue otherwise. Who wants to make decisions on “bad data”? What is true is that no method is perfect or complete on its own.
    And unfortunately, even high quality triangulated studies, studies that we would all agree meet a very high standard of quality, are not routinely used in decision-making when the results challenge strongly held beliefs.
    Jackie, I couldn’t agree more that a crucial starting point in reaching a gold standard of use is clarity of purpose when commissioning an evaluation. Actually, clarity of purpose is a pretty good starting point for many pursuits, don’t you think?
    Patti, just want to underscore your extremely important point that management matters — management engagement with performance measurement and evaluation is crucial to data use. If managers take the time to ask about results, to reflect on their meaning and implications for practice, staff are more likely to follow suit.

  6. Thank you for your thoughtful treatment of this issue. RCTs undoubtedly have a (well-deserved) place of prominence in a funder’s suite of tools, but as you correctly point out, they aren’t appropriate for all charities or in all situations: they can be expensive and time-consuming; there can be ethical issues associated with their use; and in a non-laboratory setting, it is often exceedingly difficult to isolate the impact of a single variable. Getting to grips with the impact that an organisation makes is too important to let “perfect” be the enemy of “very good,” and as funders, we have a responsibility to encourage better practice as well as best practice.

    At Impetus Trust, we work with each of our portfolio charities to design an evaluation methodology that is appropriate for the individual organisation, and in more cases than not, this is not an RCT. We help our charities to develop and articulate clearly their theory of change and then help them determine measures that will enable them to track their progress against this. We commit to our investees that we will actually use each piece of data on which they report to us as part of our investment management, and we stick to this. We believe that this approach engenders trust, which enables us to work together to develop a better understanding of the impact our charities are making and where we can help them do more.

  7. Pingback: News and Events: Links to what we’re reading this week « High Impact Philanthropy

  8. Fay,

    Thanks for stimulating attention on these issues; they couldn’t be more important for the long run improvement of our sector.

    I’d like to build on Tim’s point about the inadequacies of useful information for foundations, and suggest there are at least two kinds of information that need significant improvement. One is about the nonprofit sector organizations who do the vast majority of the work that foundations invest in. Most of us in the foundation community have the rather quaint practice of gathering most of that information one organization at a time directly from the applicant each time an application is submitted. This would be rather like a potential investor in a business gathering information by asking each company to send an a la carte prospectus each time she plans to invest. The new effort Charting Impact, spearheaded by BBB Giving Alliance, Guidestar USA, and Independent Sector, is a refreshing step towards addressing this inefficient model of current practice.

    A second type of underdeveloped (or perhaps underused) information is collective evaluative data that tracks the consequences of the work of many entities, one of which is a foundation. For example, at the Missouri Foundation for Health we track the smoking prevalence of people in the state, understanding that changes are due to the work of many actors and factors beyond the Foundation, such as public health departments, cessation efforts, tobacco industry behavior, government regulation, the economy, etc. Much foundation work aims at this type of “collective impact”.

    Last, it is gratifying to see learning included as an important activity in this conversation.

    Thanks again for a thoughtful post, one that prompted additional insightful comments and obviously struck a responsive chord with those of us who keep an eye on CEP’s blog.

    Bob

  9. Great conversation. Lots to think about, and encouraging on many fronts. At least from the perspective of those who are able to get evaluation into their budgets. So much funding is based on asking for reasonable amounts, getting cut back to smaller grants and then being encouraged to verify the dream anyway. This is why I’m working on trying to find new ways to let the nonprofits who do so much of the work of implementing change get access to better flows of money. New technologies may make that possible if there is not too much intellectualizing about how good or bad the evaluation is in the early growth stages. I like how the Global Impact Investment Network is trying to form a new language of evidence and effectiveness for investors, but cannot wait until there is a similar effort, funded with real dollars, to bring practitioners more into the game. And, of course, being one of them, I would love to see real grant writers be collectively schooled on what this means. Too often we fight to make the cheapest project funders might like instead of the smartest one society needs. Good writings.
