The White House’s Social Innovation Fund, which will support intermediary grantmaking institutions that “identify and invest in promising organizations to help them build their evidence-base and support their growth,” has been a topic of much discussion and debate. Among the concerns has been the Fund’s focus on evidence of nonprofit program effectiveness and, in particular, its focus on evidence of effectiveness being based primarily on experimental design approaches.
In an op-ed in the Chronicle of Philanthropy, Katya Fels Smyth, founder and principal of the Full Frame Initiative, argues that “No one benefits” from the Fund’s proposed approach. “Not the best ideas for helping the nation’s most vulnerable, not the taxpayers, not philanthropy, and, most important, not the communities that most need help achieving a decent quality of life.”
Her critique is rooted in an inaccurate conception of what it means to take an experimental design approach. She asserts that experimental design must “require a very narrow definition of who is being studied, and people who face multiple intertwined challenges—who are the most in need—are excluded. So, for example, if a new approach to helping homeless mothers is under scrutiny, experimental-design evaluation would exclude battered women, those with chronic health problems, or those involved in the criminal-justice system unless everyone had the same problems.”
But that is simply not the case.
An experimental design approach need not be totally removed from the complexities of the real world or prevent innovative approaches from receiving serious consideration. Many factors can be taken into account through the design and statistical analysis processes.
For example, in one of its randomized trials, Nurse-Family Partnership, which is now a grantee of The Edna McConnell Clark Foundation, had an objective “To investigate whether the presence of domestic violence limits the effects of nurse home visitation interventions in reducing substantiated reports of child abuse and neglect.” Participants did not fit a “very narrow definition” but instead differed in a number of important ways, including the number of domestic violence incidents in the family, race of the mothers in the study, mother’s marital status, and the employment status of fathers. These differences were taken into account when the data for this study were analyzed.
My experience with foundations and nonprofits tells me that we certainly are at no risk today of over-emphasizing rigor in how assessment is approached. Nor is it the case that a greater emphasis on rigor – and on really understanding what works and what doesn’t – need crowd out other valuable approaches to getting feedback.
The promotion of experimental designs often has a polarizing effect: this has been true in education with the What Works Clearinghouse, in psychology’s approach to the study of social issues, and in the nonprofit community and the field of evaluation as well. Proponents sometimes act as if experimental design is the cure for all evaluative ailments; opponents sometimes act as if it is the root of all evil.
But being in support of the use of experimental designs is not necessarily in tension with supporting nonexperimental designs, case studies, and the use of qualitative data (the importance of which Bob Hughes, from the Robert Wood Johnson Foundation, wrote about in a recent CEP blog post). Any design should be selected because it is the best way to answer a particular question, and the question to be answered should be directly related to the stage of the organization or program being tested. Not all questions in the field are best answered through an experimental design approach. But some are. I see experimental design as an important tool for the field to use to understand the effectiveness of its work.
Experimental designs allow us to rule out alternative hypotheses in a way that no other designs do. When testing the effectiveness of a social program being offered to those most in need, doesn’t it behoove us to get as close to an understanding of causation as possible?
We should seek to be as confident as possible that a program has positive benefits and isn’t yielding no – or even negative – effects. Philanthropy should be looking for the models that have potential to really make a difference on our toughest social problems. The field has a moral obligation to demonstrate, to the best of its ability, that a program works before funneling significant resources to expand it.
Admittedly, these are weighty statements. Many nonprofits are understaffed and underresourced, lacking the people, skills, or funds to conduct evaluations or collect data. A small nonprofit might have an excellent innovative idea that deserves to be tried on a larger scale and tested more rigorously. This is where funders come in. They have a crucial responsibility in this.
I will take a closer look at that responsibility in my next post.
****
Ellie Buteau, PhD, is Vice President – Research at CEP








There is certainly a great deal of misunderstanding of evaluation issues. The essence of an experiment is random assignment of people or other units to receive different types of intervention. These interventions may include a control condition, that is, something not intended to produce an effect. The interventions or treatments should be performed in the same way for all people or units randomly assigned to receive a particular form of the intervention.
The beauty of such an experimental design is that one is entitled to conclude that if there are differences between treatment conditions larger than one would expect by statistical error alone, then by virtue of the random assignment, resulting differences between units are CAUSED by the difference in treatments. No other approach permits such strong causal inferences.
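As a rough illustration of why random assignment supports this kind of inference, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration only: the population, the assumed true effect of 2.0, the `simulate` function, and the self-selection rule are hypothetical and are not drawn from any study discussed here.

```python
import random
import statistics


def simulate(n=10000, true_effect=2.0, seed=0):
    """Compare a randomized estimate with a self-selected one (hypothetical data)."""
    rng = random.Random(seed)
    # Unobserved baseline differences between people (a heterogeneous population).
    baseline = [rng.gauss(0, 1) for _ in range(n)]

    def difference_in_means(assigned):
        treated, control = [], []
        for b, got_treatment in zip(baseline, assigned):
            # Outcome depends strongly on baseline, plus some noise.
            outcome = 10 + 3 * b + rng.gauss(0, 1)
            if got_treatment:
                treated.append(outcome + true_effect)
            else:
                control.append(outcome)
        return statistics.mean(treated) - statistics.mean(control)

    # Random assignment: a coin flip that ignores baseline entirely.
    randomized = [rng.random() < 0.5 for _ in range(n)]
    # Self-selection: people with a higher baseline are more likely to enroll.
    self_selected = [b + rng.gauss(0, 1) > 0 for b in baseline]

    return difference_in_means(randomized), difference_in_means(self_selected)


if __name__ == "__main__":
    randomized_estimate, self_selected_estimate = simulate()
    print(f"randomized estimate:    {randomized_estimate:.2f}")
    print(f"self-selected estimate: {self_selected_estimate:.2f}")
```

Under these assumptions, the randomized comparison should land close to the assumed effect of 2.0, while the self-selected comparison overstates it, because enrollment is entangled with baseline differences that also drive the outcome.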
Making the population of people or units homogeneous before the intervention is not a necessary part of experimentation (only the intervention has to be applied uniformly). Making the population under study homogeneous just has the virtue of making it easier to detect a real treatment effect with a smaller and cheaper sample size.
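To put rough numbers on the sample-size point, here is a small sketch using the standard normal-approximation formula for comparing two group means, n per group = 2 (z_alpha + z_power)^2 sigma^2 / delta^2. The function name `required_n_per_arm`, the 5% significance level, the 80% power target, and the example values are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def required_n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Approximate people needed per group to detect a mean difference `delta`.

    sigma: standard deviation of the outcome within each group
    delta: size of the treatment effect one hopes to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Same hoped-for effect, but the second population varies twice as much.
    print(required_n_per_arm(sigma=1.0, delta=0.5))  # more homogeneous
    print(required_n_per_arm(sigma=2.0, delta=0.5))  # more heterogeneous
```

In this approximation, doubling the spread of outcomes in the population roughly quadruples the number of participants needed. That is the cost of a more heterogeneous sample; it does not make the experiment invalid.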
There are many nuances to this, however. In an experiment in health care treatment of older adults for depression within primary care medical practices, we used an approach that allowed a very representative sample of older adults with very few exclusions and allowed substantial tailoring of the treatment to the history and preferences of the patients. See http://impact-uw.org/about/research.html
This flexibility enabled us to model very accurately the real population of people to whom we wished to generalize, and also to test the intervention in a more realistic way, where people modify the specifics of the care they choose for themselves. Still, because of the randomized design, we were able to conclude that it was something about offering the set of choices in the intervention condition that enabled people to reduce their symptoms of depression about twice as much as in usual care.
Of course this kind of work is very expensive – over $11M for the original trial at 8 sites around the country. It is also the case that evidence alone does not change how services are delivered – that has taken more money, patience, and leadership on the part of our grantees than I could have ever imagined.
For those interested in exploring good evaluation design further, and especially ways to make sure qualitative information is captured in useful and trustworthy ways, Philanthropy Action is hosting a conference call with evaluation expert David Roberts of New Dominion Philanthropy Metrics on Feb. 22nd at 1pm/10am. Details are here:
http://www.bit.ly/bZba4C
Just as CEP’s Grantee Perception Surveys enable philanthropy to see through the fog of speculation about the successes and shortcomings of foundations’ support for grantees’ work, smart evaluations pierce the fog that often conceals what works, what doesn’t work, and why. Experimental design evaluations, along with other rigorous assessments of results, greatly enrich our understanding of how social innovations can make a difference in people’s lives. Some innovations work as intended, and some don’t – that’s inherent in the innovation process – and innovations will produce very limited benefits for society unless we take pains to map their results.
Are all experimental design studies worthwhile? Of course not; like any other tool, experiments can be designed and executed well, or poorly (meaning too soon, too narrowly, too crudely, with too few people, or even unethically – all faults that experienced users take pains to avoid). That’s no reason to turn our backs on experiments – it’s a reason to do them well!
Too often, legitimate concerns about evaluation aren’t paired with the equally serious question, What will happen if we DON’T evaluate social innovations? History shows that the answer is clear: Our best ideas and most promising reform ideas will be dismissed and discarded with caustic anecdotes and the observation that “there’s really no objective evidence of real results here.” When we gather reliable evidence about results, we’re building a shared heritage of better outcomes for more people.
Ellie’s observation that philanthropy is “certainly at no risk today of over-emphasizing rigor” points to the biggest challenge facing philanthropy and social innovation today. It’s this: Are we serious about focusing our work on results and using the best possible tools to get results, including smart progress measures tied to outcomes, rigorous evidence of effectiveness, and deep probes into the practical skills and problem-solving that overcome the barriers to change?
Ellie’s observation that philanthropy is “at no risk today of over-emphasizing rigor” is right on target. There are two other critical points in this discussion: 1) experimental design is not the only way to achieve rigor and 2) the evaluation methodology should be a good fit with both the nature of the work being evaluated and the evaluation question being asked.
Let me tackle the second point first. Work that seeks community level change typically includes participation by community members in the design and oversight of the work. A hands-off evaluation would not be a good fit for work in which community ownership is a goal. A participatory evaluation, in which the evaluation actually becomes part of the change effort, is probably more suitable. Does this mean that it can’t be rigorous? No, but it probably precludes a true experimental design. Using evaluation to provide high quality, real-time feedback may be a better use of foundation dollars than using them to fund a true experimental design.
The mission and priorities of the foundation are key to understanding the appropriateness of different designs, too. A mission like, “improve the quality of life in this community” probably doesn’t call for a lot of experimental design, while “increasing college enrollment for economically disadvantaged youth” could use experimental designs to demonstrate which methods are most successful. The underlying evaluation questions are different: are we contributing to a higher quality of life vs. what method of increasing access is most effective?
In terms of rigor, social science researchers (and community psychologists as exemplars) have developed many other research designs that help rule out alternative explanations for observed results. These quasi-experimental designs don’t use random assignment (the hallmark of a true experimental design), but use other methods of comparing a group that has received some intervention to those that have not experienced an intervention. Many of these designs are very appropriate for the types of work that foundations fund. The use of comparison (not control) groups and statistical analysis that accounts for differences is one. The use of interrupted time series is another one: data on an outcome (like teen pregnancy) is reviewed for the same group (community, school, etc.) over time. If there is a change in the rate that coincides with the intervention, that is some evidence of a causal relationship. Campbell and Stanley described many of these alternative designs in their classic work. Yin described using multiple case studies as another example of rigorous non-experimental design.
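To make the interrupted time series idea concrete, here is a deliberately simplified sketch. It fits a straight-line trend to the periods before the intervention, projects that trend forward, and averages the gap between what was observed afterward and what the trend predicted. The function names and the yearly rates are hypothetical, and a real analysis would also need to handle seasonality, autocorrelation, and uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def average_change_after(rates, intervention_start):
    """Average gap between observed rates and the projected pre-intervention trend.

    rates: outcome measured for the same group in consecutive periods
    intervention_start: index of the first period after the intervention began
    """
    pre_periods = list(range(intervention_start))
    slope, intercept = fit_line(pre_periods, rates[:intervention_start])
    gaps = [
        rates[t] - (intercept + slope * t)
        for t in range(intervention_start, len(rates))
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical yearly rates per 1,000 for one community; program starts in year 6.
    yearly_rates = [50.0, 49.6, 49.1, 48.4, 48.1, 47.5, 42.2, 41.4, 41.1, 40.3]
    print(average_change_after(yearly_rates, intervention_start=6))
```

A negative result here means the observed rates fell below what the earlier trend would have predicted. As noted above, that is only some evidence of a causal relationship, since anything else that changed at the same time could account for the shift.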
The upshot: use the most rigorous method available that is suitable to the question being asked.
With respect, Ellie, I think you are missing out on Katya’s larger point, which is that experimental designs (RCTs) are limiting in the nonprofit context because they can only look at one well-defined intervention at a time. They have a time and a place, but that is rarely in the nonprofit world.
In your Nurse Family Partnership example, you are absolutely correct in saying that demographic or other factors can be included in statistical analyses. However, that is because the experimental design is focused on a single well-defined intervention (Nurse Family Partnership).
In Katya’s writing, she talks about nonprofits that deliver comprehensive or holistic services. Think of it this way: a very high risk/need individual might need all of the following – Nurse Family Partnership as an intervention, and substance abuse treatment, and job training.
In this case an experimental design is inappropriate because you have multiple different interventions and you’ll never be able to determine causation. (In other words, you observe statistically significant outcomes, but you can’t guess at causation because there are multiple combinations of interventions).
This isn’t saying that experimental designs don’t have their place. Rather it is saying that in the real world, many non-profits offer a variety of services and aren’t just focused on a single service.
In my experience, 99% of the programs that have utilized experimental designs have originated in academic settings, and are focused on one problem area or set of risk characteristics.
The danger comes when foundations, funders, and governments think that experimental designs are the only (or best) way to do evaluation. That simply isn’t true. Especially when a nonprofit is providing multiple different interventions designed to achieve multiple outcomes with a population. In these cases, most evaluators would agree that an experimental design is the LAST thing that should be used.
I could not agree more with Teri Behrens when she advises: “use the most rigorous method available that is suitable to the question being asked.”
Isaac Castillo, you say you believe that experimental design has “a time and a place, but that is rarely in the nonprofit world.” And the example that you use of why this design is not appropriate in the sector is the following: “a very high risk/need individual might need all of the following – Nurse Family Partnership as an intervention, and substance abuse treatment, and job training. In this case an experimental design is inappropriate because you have multiple different interventions and you’ll never be able to determine causation.”
But that simply is not the case, and it is important to clarify that. A published research article reporting on an RCT of Nurse-Family Partnership shows that women in the study were using other services; taking part in an RCT of NFP, the intervention under study, did not prevent them from doing so: “Women were interviewed at 36 weeks of gestation in the study office to assess their health-related behaviors, including use of psychoactive substances and use of ancillary preventive services (e.g., childbirth education and mental health) and emergency services (emergency housing and food banks).”
As Christopher Langston notes in his comment, “Making the population of people or units homogeneous before the intervention is not a necessary part of experimentation (only the intervention has to be applied uniformly).”
So, as I wrote in my original post, I do believe that in her op-ed, Katya Fels Smyth fundamentally mischaracterizes experimental design when she asserts that it requires “a very narrow definition of who is being studied, and people who face multiple intertwined challenges—who are the most in need—are excluded. So, for example, if a new approach to helping homeless mothers is under scrutiny, experimental-design evaluation would exclude battered women, those with chronic health problems, or those involved in the criminal-justice system unless everyone had the same problems.”
As Ed Pauly notes in his comment, “Are all experimental design studies worthwhile? Of course not; like any other tool, experiments can be designed and executed well, or poorly (meaning too soon, too narrowly, too crudely, with too few people, or even unethically – all faults that experienced users take pains to avoid). That’s no reason to turn our backs on experiments – it’s a reason to do them well!”
Thanks to all for the thoughtful and thought-provoking comments.