Improving State Evaluation of Principal Preparation Programs


State leaders interested in improving their systems for evaluating principal preparation programs will ultimately need solutions tailored to their specific contexts. But conversations with state leaders and others committed to improving principal preparation surfaced a set of five core design principles on which to ground their efforts. While states may face capacity constraints (e.g., with regard to their data collection and analysis capabilities) that push against these design principles, the principles nevertheless represent a set of goals to which states can aspire and toward which they can work.

A. Structure the review process in a way that is conducive to continuous program improvement

Effective program review encourages ongoing improvement and innovation in program design and implementation in two ways. First, it provides programs with specific and actionable feedback about their practices and graduate outcomes. Such feedback requires reviewers with the expertise to make appropriate judgments, including content expertise in leadership, an understanding of adult learning theory and practices, knowledge of current research on effective leadership preparation, and the ability to analyze curriculum and pedagogy. Second, an effective review system allows adequate time for improvement: its review cycles and processes give programs enough time to make changes and assess their impact.

B. Create appropriate systems to hold programs accountable for effective practices and outcomes

An evaluation system is one of the key ways states can hold preparation programs accountable for their role in delivering high-quality preparation for aspiring principals. With approximately 700 programs currently in operation and new ones emerging on a regular basis, states need to be able to confidently make consequential decisions such as whether to approve a program; when to put a program on an improvement plan; and, in the most serious circumstances, when to rescind program approval. To generate that confidence, states can consider the following characteristics of system design: (1) understand the limitations of the indicators being tracked as measures of quality, and ensure that there is sufficient and valid information for making consequential decisions; (2) develop a clear and transparent rating system that has enough levels to meaningfully differentiate performance across programs and that captures performance and improvement over time; and (3) develop a clear and transparent process and timeline for intervening in the event of unacceptable performance.

C. Provide key stakeholders with accurate and useful information

When key consumers and partners, especially aspiring school leaders and school districts, have good information about key indicators of program quality, they can use that information to make more informed choices. For aspirants, a state evaluation system can provide concrete information about program features and outcomes (e.g., candidate learning and career outcomes) to inform enrollment choices. Ideally, systems would provide side-by-side, apples-to-apples comparisons of programs to help inform decision making. For districts, evaluation systems can provide specific information about program characteristics and candidate outcomes to guide decisions concerning formal partnerships with programs and the hiring of graduates. To meet these goals, effective evaluation systems provide high-quality, publicly available, reliable, and understandable data about programs.

D. Take a sophisticated and nuanced approach to data collection and use

Collecting and using data is central to program improvement, yet it presents significant challenges. Too often, data points can be misleading or misused. Taking a sophisticated and nuanced approach to data collection and use encompasses five related ideas.

  1. Evaluate what matters. Strong data systems include the indicators that are most germane to principal preparation. These include inputs (especially the rigor of selection into a program and the diversity of candidates), processes (especially the ways in which a program increases aspirants' leadership knowledge and skills), outputs (especially aspirants' successful placement in roles as principals), and graduate outcomes (especially contributions to student academic achievement measures, student attainment measures such as graduation, and noncognitive measures such as student engagement and social/emotional growth).
  2. Evaluate accurately. Strong data systems use the most accurate data available, and interpretations are made cautiously and with awareness of data limitations. Special caution is needed in establishing confidence in the accuracy of measures of leadership effectiveness, which are in the early stages of development. Limitations related to the reliability and validity of data from particular sources can help determine whether to use and how much to weight those data in an evaluation system.
  3. Include data that can be realistically gathered and shared. In strong systems, data are feasible to gather and efficient to report, and they can be used in conjunction with other sources of information to strengthen evidence. Further, data collection is ongoing and conducted according to an established schedule. In many states, new investments in data collection and reporting will be essential to creating stable, consistent data.
  4. Consider contextual factors. Data are means, not ends. In order to make appropriate judgments based on accurate results, states will need additional contextual information. For example, a 100% admissions rate for a program could signal that the program does not have rigorous admissions, or it could be the result of targeted recruiting and effective prescreening, such that only strong applicants apply. Therefore, it is important to provide programs with opportunities to explain their results so that states can better interpret results and draw conclusions. These information exchanges about root causes can be the basis of productive conversations about program quality and improvement.
  5. Clearly and transparently communicate how results will be used. Ensure that program leaders understand which data will be made public, including how and when that will occur. Program leaders also need to understand how component parts of the program evaluation will be used to make substantive judgments and decisions about program status.

At heart, this design principle is about triangulating multiple data sources to arrive at more accurate judgments. Research has consistently demonstrated the limitations of specific data indicators as measures of quality (see, for example, a recent statement from the American Educational Research Association on the use of value-added models of student achievement [10]). Put simply, any single type of data is often imperfect on its own. But imperfect data can provide important and useful information when used appropriately. For example, it is well documented in the medical field that screening mammograms carry a cumulative false-positive rate of 50-60% over a decade of annual screening and a false-negative rate of about 20%. Given the costs and risks of mammograms (such as exposure to radiation and the stress of false-positive results), leading medical organizations differ in their recommendations about the age at which women should start receiving mammograms and how often they should be administered. However, no one in the medical field recommends discontinuing mammogram testing altogether. On the contrary, mammograms are a critical diagnostic tool that generates data doctors examine in conjunction with other data to make decisions about further testing and treatment. No doctor would ever use mammogram data alone to recommend surgery or to make other high-stakes decisions.

What is true for physicians reading a mammogram result is also true for state education leaders looking at the graduation rate of a principal preparation program or the growth in student achievement in schools led by a program's graduates. By themselves, these indicators offer only limited insights, but combined with a deeper professional review, they can help state officials arrive at a fuller and more accurate picture of program quality. That fuller picture can be the basis for states to make consequential decisions about program approval and can be the impetus for continuous improvement of all programs.

E. Adhere to characteristics of high-quality program evaluation

Effective state systems of program evaluation reflect best practices in program evaluation in education. The Program Evaluation Standards, issued by the Joint Committee on Standards for Educational Evaluation, serve as a basis for judging best practices. These standards focus on utility (i.e., the extent to which stakeholders find processes and results valuable), feasibility (i.e., the effectiveness and efficiency of evaluation processes), propriety (i.e., the fairness and appropriateness of evaluation processes and results), accuracy (i.e., the dependability of evaluation results, especially judgments of quality), and accountability (i.e., having adequate documentation to justify results) [11]. These standards may often be in tension with one another; for example, data gathered from first-hand observations of program processes may be of high utility but may also be prohibitively expensive to gather, making them less feasible to include.



10. AERA Council. (2015). AERA statement on use of value-added models (VAM) for the evaluation of educators and educator preparation programs. Educational Researcher, 44(8), 448-452.

11. See the Joint Committee on Standards for Educational Evaluation for more details on the standards.