Learning From Leadership: Investigating the Links to Improved Student Learning


Appendix A


As proposed and undertaken, our study was large and complex. The specifics of sampling, instrumentation, data collection, coding, and analysis evolved from what we proposed to the Wallace Foundation in 2003. For the project as a whole, we collected two rounds of survey data from principals and teachers and three rounds of site-visit data from schools and districts, including classroom observations and interviews with teachers and building and district administrators. We also interviewed state-level education leaders in two rounds. We sampled states to ensure variation in geography, student demographics, state governance for education, curriculum standards, leadership policies, and accountability systems. We sampled districts to achieve variation in size and demographic diversity. We sampled schools to ensure variation in school level and demographic diversity. We obtained student achievement data for literacy (reading or language arts) and mathematics from scores on the states' tests for measuring Adequate Yearly Progress (AYP) mandated by the No Child Left Behind Act of 2002 (NCLB).

The Sampling Plan

Our sampling of states, districts, and buildings went through three stages. First, in our response to the Wallace Foundation's RFP, we proposed a sampling plan that led to a schematic "proposed sample." Second, we undertook the actual state, district, and building sampling with a modified sampling plan, which led to the "selected sample." Finally, following our district and building recruitment plan, we gained our "achieved sample."

The proposed sample

We proposed a stratified random sampling plan for survey data collection that would yield nine states, five districts per state, and four schools per district. We proposed to sample three states from each of three regions—the East Coast, the South, and the Midwest and West. We proposed that the 45 districts would be stratified by size and level of student poverty/diversity, uniformly distributed across these variables (Table A.1). We show our criteria for classifying districts in Table A.2.

Table A.1
Proposed District Sample: Size By Poverty / Diversity

                 Poverty / Diversity
District Size    High    Mid    Low
Large            5       5      5
Medium           5       5      5
Small            5       5      5

Table A.2
District Classification Criteria

Size (number of students):
  Large   25,000 and above
  Medium  2,500 - 24,999
  Small   600* - 2,499

Poverty (percent of students qualifying for free or reduced lunch):
  High  66% or higher
  Mid   18% - 65%
  Low   Less than 18%

Diversity (percent of white students):
  High  Less than 18%
  Mid   18% - 65%
  Low   66% or higher

*Six hundred was our lower limit for district recruitment purposes. Although 36% of school districts in the U.S. had fewer than 600 students, they accounted for just 3% of the student population.

We proposed that the 180 schools would be uniformly distributed across the poverty/diversity variable and building level (Table A.3).

Table A.3
Proposed School Sample: Level By Diversity / Poverty

                 Diversity / Poverty
School Level     High    Mid    Low
Elementary       20      20     20
Middle School    20      20     20
High School      20      20     20

The state sample

In the RFP under "Site Selection," the Wallace Foundation made it clear that it expected the research to be undertaken in some of the states and districts then involved in its funded leadership development efforts, especially the 15 states in the SAELP (State Action for Education Leadership) consortium and the 12 LEAD (Leadership for Education Achievement in Districts) districts in 12 of the SAELP states. Wallace did not require bidders to include all of the sites it funded and encouraged bidders to consider studying sites outside of the funded pool. In our proposal, we showed an example selection of nine states from the three regions that included four SAELP states. When we actually sampled states, we agreed to aim for four Wallace-funded states, restricting the selection to those where funding was at both the state level (SAELP) and the district level (LEAD). We thought that limiting the Wallace-funded sample to four would keep our total sample from being overly biased by the presence of external funding for leadership development. We also wanted to ensure that the final sample of states contained adequate variation on a range of variables that we believed were potentially relevant to understanding leadership at the state and local levels, and that would be consistent with variation across the country.

The state sampling process

  • We divided the states into geographic quadrants—East, South, Midwest, and West (Table A.4).306 In deciding where to draw the lines of these quadrants, we took into account historical conventions, geography, and population density. Establishing the quadrants before random sampling ensured a reasonable distribution of states across the country.
  • We assigned each state a separate number (1 to 48) from a computer-generated random sequence.
  • We sorted the states in each quadrant in ascending order by their randomly generated numbers.
  • We selected the first SAELP- and LEAD-funded state from the list for each quadrant.
  • We selected the second SAELP- and LEAD-funded state for each quadrant as an alternate.307
  • We selected the first three non-SAELP-funded states within each quadrant to complete the basic sample pool.308
  • We selected the next two non-SAELP-funded states from the list within each quadrant as randomly generated alternates to the original pool.
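The draw described above is a shuffle-then-filter within each quadrant. A minimal sketch in Python, using invented state lists and funding flags (the real study used all 48 contiguous states and the actual SAELP/LEAD rosters):

```python
import random

# Hypothetical stand-in data: a few states per quadrant, with a flag
# marking SAELP/LEAD funding. The real pool was all 48 contiguous states.
quadrants = {
    "East":    [("New York", True), ("New Hampshire", False),
                ("Rhode Island", False), ("New Jersey", False)],
    "South":   [("Texas", True), ("North Carolina", False),
                ("Missouri", False), ("South Carolina", False)],
    "Midwest": [("Indiana", True), ("Nebraska", False),
                ("North Dakota", False), ("South Dakota", False)],
    "West":    [("New Mexico", True), ("Oregon", False),
                ("Montana", False), ("Idaho", False)],
}

def sample_quadrant(states, rng):
    """Put the quadrant's states in random order, then take the first
    funded state and the first three unfunded states."""
    shuffled = sorted(states, key=lambda s: rng.random())
    funded = [name for name, is_funded in shuffled if is_funded]
    unfunded = [name for name, is_funded in shuffled if not is_funded]
    return {"funded": funded[0], "unfunded": unfunded[:3]}

rng = random.Random(2003)  # fixed seed so the draw is reproducible
pool = {q: sample_quadrant(states, rng) for q, states in quadrants.items()}
# pool now holds 4 quadrants x (1 funded + 3 unfunded) = 16 states
```

Selecting alternates (the second funded state, the next two unfunded states) is just a matter of reading further down the same shuffled lists.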

Following our state sampling process, we formed a basic pool of 16 states: from each quadrant, the first selected SAELP- and LEAD-funded state and the first three non-SAELP-funded states. We next examined the variation on the variables we were concerned about: poverty, racial/ethnic diversity, number of school districts, per-pupil spending, state board governance structures, principal certification requirements, principal shortage levels, National Assessment of Educational Progress scores in reading and mathematics, minority achievement and graduation rate gaps, state accountability systems, and number of charter schools. Drawing these data from national sources and state websites, we constructed a matrix that enabled us to display and analyze the variability within our randomly generated 16-state sample.

We were satisfied with the range of variation in our initial sample of eight states (the first SAELP- and LEAD-funded state and the first non-SAELP-funded state from each quadrant), but we identified a few variables for which variation could be enhanced by the selection of a ninth state. We chose the ninth state strategically from among the remaining states in the initial pool, picking the one that best complemented the variation obtained with the first eight.

Table A.4
Forty-eight contiguous states divided into quadrants

EAST (11): New Hampshire, New Jersey, New York, Rhode Island, ...
SOUTH (14): North Carolina, South Carolina, West Virginia, ...
MIDWEST: North Dakota, South Dakota, ...
WEST (11): New Mexico, ...

Before going further, we reported the selection criteria and the names of the selected nine states to our program officer at the Wallace Foundation. The program officer had a few questions about the selection and asked for clarifications before presenting our state selection to the senior leadership team in the education division at Wallace. Their approval of our selected sample came a few days later.

We did not "recruit" the states, since no individual can accept or decline participation on a state's behalf. We did, however, write a one-page letter to the highest-ranking education officer of each state describing the study and explaining that the state had been randomly selected.309 We also invited him or her to consider taking part in the state leader interview component of our investigation. We attached a more detailed description of the project and a consent form to participate in an interview.

District and School Sampling

The district sample

From the website of the National Center for Education Statistics (NCES), we downloaded the most current demographic data for all districts in each of the nine states in the selected sample. The uniform distribution of districts across size and poverty/diversity shown in Table A.1 was not possible with our selected state sample because of the demographic realities in the nine states. For example, a majority of small districts are rural, and in some parts of the country rural communities tend to have less racial and ethnic diversity. Similarly, it is much easier to find low-poverty small districts than low-poverty large districts: there were only seven low-poverty large districts in the nine selected states, and all seven were in one state. Even so, our nine-state selected sample fairly captured differences in student enrollment across the 48 states. We had two high-enrollment states (1,500,000 or more students), four medium-enrollment states (500,000 to 1,500,000 students), and three low-enrollment states (fewer than 500,000 students). Our sample included states with low minority populations, states with high nonwhite minority populations in a single race/ethnicity category, and states with large but more diverse nonwhite minority populations.

We then generated an initial sample pool of 80 districts (about nine per state) with size, poverty, and diversity in mind (Table A.2). In keeping with our decision to sample five districts per state, we then ensured that in every state the selected sample reflected variation on all three variables. We initially selected310 at least one large, one medium, and one small district from each state. In terms of poverty, we selected districts representing all three levels where possible, and at least two otherwise. We also selected for high-, medium-, and low-diversity districts in all states, ensuring that at least two if not all levels were represented. The size, poverty, and diversity breakdowns of the selected sample were:




Size         Poverty      Diversity
14 Large     17 High      10 High
16 Medium    20 Medium    19 Medium
15 Small     8 Low        16 Low

We agreed that the variation in the selected sample was the best approximation of what we were looking for, although the design was not replicated exactly in each state. We were satisfied with the sample for the kinds of analyses we envisioned doing.

Generating a list was easy compared with recruiting the selected districts to participate in the study. To recruit the districts, we first sent superintendents a letter seeking their participation and followed up with telephone calls. In the letter, we told superintendents about the study and that their district had been randomly selected to participate. To participate, districts had to agree to be part of our survey data collection. For their participation, we offered each district a one-time stipend of $500. We informed them that in our survey data collection we would be inviting principals, assistant principals, and teachers to respond to a written survey about leadership policy and practices that bear on teaching and learning; that we would conduct the principal and teacher surveys in four schools per district, representing elementary and secondary schools; and that we would conduct a second round of surveys in the final year of the study (2008). We also recruited two districts per state as combined site-visit and survey districts. To these 18 districts, we offered the $500 incentive plus a one-time stipend of $200 to each school visited (typically two buildings per district). Anticipating that some superintendents would ask with which schools we proposed to work, we were ready with a proposed selection (see the discussion of the school sample below).

Recruitment was slow going. The initial samples of eight or nine districts per state were used up as the refusals came in. The most frequent reason given was being "too busy." We suspected that some districts were afraid of having their "leadership problems" become public knowledge, and in the face of that vulnerability, our assurances of anonymity were not enough to encourage risk taking. When the initial sample of districts was exhausted before five had agreed to participate, we went back to the database, sampled further, sent letters, and followed up with calls. The districts in one state were particularly unwilling or unresponsive. All but one of the first eight selected districts in this southern state refused to participate, some not even replying. We despaired of ever scheduling a site visit there. After considerable deliberation, we decided to abandon the state and go to the first alternate in the state sample, essentially losing four months of recruiting effort. Unfortunately, by that time we had already conducted eight telephone interviews with senior education officials in the state. The alternate state was a reasonable match in terms of preserving the sampling balance we had initially achieved. That alternate was Louisiana, and recruitment was going well enough when Hurricane Katrina struck in late August 2005. By mid-September we concluded that, with the devastation in much of the state, we had to give up Louisiana. In its place we took the next sampled alternate in the South, North Carolina. In the end, the achieved state sample was New Jersey and New York (East); Missouri, North Carolina, and Texas (South); Indiana and Nebraska (Midwest); and New Mexico and Oregon (West).

The achieved district sample. The achieved district sample reflects the challenges and realities of recruiting school district participation in research studies of this sort. In all of the states, some if not most of the originally selected districts declined to participate for one reason or another. Only 21 of the original 45 selected sample districts (47%) agreed to participate and were in the achieved sample. We replaced districts that refused with others that matched the size, poverty, and diversity profiles of the original districts to the extent possible. In one state, for example, we recruited 14 school districts before five agreed to join the study. This was typical for most states, but in some the recruitment process was even more difficult: in two states, we were able to recruit only four districts each, for an achieved sample of 43 rather than 45 districts. The size, poverty, and diversity breakdowns of the achieved sample were:




Size         Poverty      Diversity
11 Large     9 High       7 High
19 Medium    26 Medium    22 Medium
13 Small     8 Low        14 Low

Eighteen (two per state) of the 43 districts in the study agreed to be site visit districts. The size, poverty, and diversity breakdowns of the site visit districts sample were:




Size        Poverty      Diversity
6 Large     4 High       3 High
6 Medium    10 Medium    8 Medium
6 Small     4 Low        7 Low

What appears to be an even distribution of site visit districts by size masks the actual variability across the nine states:

  • Four states had one small and one large site visit district
  • Two states had one medium and one large site visit district
  • Two states had one small and one medium site visit district
  • One state had two medium site visit districts.

The building sample

We undertook the building sample as soon as we had the selected state and district samples. From the NCES website, we downloaded their most current demographic data for all buildings in each of the 45 districts in each of the nine states in the selected sample.

The building sampling process

  • We wanted regular schools, so we excluded from the sampling database special-service schools such as arts, technical, special education, alternative, evening, hospital, homebound, and incarcerated-student schools.
  • We did not consider buildings serving a single grade.
  • We did not consider buildings serving all grades, K-12, in a single building.
  • We did not consider charter or magnet schools.
  • We did not knowingly consider primary-only centers.
  • All sampling was within a state.311
  • Our sampling ideal was 20 schools per state (four per district), for 180 schools total (Table A.3), but we decided to sample five schools per district (25 per state, 225 total) to provide a roughly 25% cushion against likely refusals, even though we had each superintendent's blessing before contacting building principals.
  • We tried to draw one high school, two middle/junior high schools, and two elementary schools per district. In each case, we tried to sample schools whose high, medium, or low poverty and diversity profiles matched the district's overall profile. Where we could not, we chose another building at the same level in the same district whose poverty/diversity profile was off by only one step. When that did not work either, as in several small districts, we tried to sample the same building level with the same poverty/diversity profile from another district of the same size. Failing that, we tried a district one size step larger.
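The fallback chain in the last step amounts to a preference-ordered search. A sketch under invented record and field names (not the study's actual database schema):

```python
# Poverty/diversity levels ordered so "one step off" is meaningful.
LEVELS = {"low": 0, "medium": 1, "high": 2}

def profile_distance(school, district):
    """Steps by which a school's poverty/diversity profile differs
    from its district's (0 = exact match)."""
    return (abs(LEVELS[school["poverty"]] - LEVELS[district["poverty"]]) +
            abs(LEVELS[school["diversity"]] - LEVELS[district["diversity"]]))

def pick_school(level, district, all_schools):
    """Try, in order: in-district exact profile match; in-district off
    by one step; exact match in another same-sized district; exact
    match in a district one size step larger."""
    at_level = [s for s in all_schools if s["level"] == level]
    in_district = [s for s in at_level if s["district"] == district["name"]]
    elsewhere = [s for s in at_level if s["district"] != district["name"]]
    preferences = [
        [s for s in in_district if profile_distance(s, district) == 0],
        [s for s in in_district if profile_distance(s, district) == 1],
        [s for s in elsewhere if s["size"] == district["size"]
         and profile_distance(s, district) == 0],
        [s for s in elsewhere if s["size"] == district["size"] + 1
         and profile_distance(s, district) == 0],
    ]
    for pool in preferences:
        if pool:
            return pool[0]
    return None

# Invented example: no in-district exact match, so the in-district school
# that is off by one step wins over an exact match in another district.
district = {"name": "D1", "size": 0, "poverty": "high", "diversity": "medium"}
schools = [
    {"district": "D1", "level": "elementary", "size": 0,
     "poverty": "medium", "diversity": "medium"},
    {"district": "D2", "level": "elementary", "size": 0,
     "poverty": "high", "diversity": "medium"},
]
choice = pick_school("elementary", district, schools)  # the D1 school
```

The ordering of the `preferences` list is what encodes the study's stated priority: stay in-district if at all possible, and relax the profile match before leaving the district.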

In the end, in the 45-district selected sample, we selected 219 buildings. The building level, poverty, and diversity breakdowns of this resulting selected building sample were:




School Level      Poverty       Diversity
90 Elementary     78 High       56 High
81 Middle         103 Medium    84 Medium
48 High School    38 Low        79 Low

The selected building sample departed from the idealized 20 per school level by poverty or diversity levels. Table A.5 shows the crosstabulation of school level by poverty level in the selected building sample.

Table A.5
Selected School Sample: Level By Poverty

[School level (Elementary, Middle School, High School) by poverty level (High, Medium, Low); cell counts not recoverable from this copy.]
This selected sample was made before getting in touch with the superintendents. Our view was that we had to be flexible in approaching superintendents with the four or five buildings we wished to survey, and of those the two we wished to visit. We acknowledged that we would follow their preferences if they wished to make changes in our lists. Of course, some superintendents did make changes. Fifty-three percent of the selected districts refused to participate and were replaced by alternates (and in many cases, those alternates were replaced by alternates). We resampled each replacement district‘s schools following the same procedures outlined above.

Once again, generating a list was easy compared with recruiting the selected buildings. We first sent principals an e-mail seeking their participation and followed up with telephone calls. In the e-mail, we told them that their superintendent had elected to participate, that their school had been selected and their participation approved by the superintendent, and outlined what participation entailed. For the site visit buildings we told principals about the $200 incentive.

The achieved building sample. As with the achieved district sample, the achieved building sample reflects the challenges of recruiting schools to participate in research studies of this sort. Only 76 of the original 219 selected sample buildings (35%) agreed to participate and were in the achieved building sample. We replaced buildings that refused with others that matched the size, poverty, and diversity profiles of the original buildings to the extent possible. The achieved sample was 182 buildings. The district size, building level, poverty, and diversity breakdowns of the achieved building sample were:

District Size    School Level        Poverty      Diversity
51 Large         43 High School      52 High      36 High
84 Medium        54 Middle School    95 Medium    85 Medium
47 Small         85 Elementary       35 Low       60 Low

Data collection


We twice surveyed the teachers, principals, and assistant principals in all the buildings in the achieved sample. We administered the first round of surveys from February 2005 to November 2006, administering the teacher and principal surveys continuously as districts and schools were recruited. We administered the second round in spring and summer of 2008, having revised the Round One surveys for Round Two. We developed the surveys collaboratively, producing multiple iterations following numerous lengthy discussions about items and language. Both the teacher surveys and both the principal surveys contained some items from established instruments with good reliability measures as well as many new items and scales.

Round One

We field tested both Round One surveys in 14 schools in a suburban Minnesota school district in December 2004 and January 2005. The purpose of the pilot was to improve item clarity. We discussed the instruments with selected respondents after they took the surveys. After revisions and further discussions with teachers and principals, we were ready with a Round One teacher survey of 117 items and a principal survey of 149 items. The teacher survey was an eight-page optical scan booklet with glued bindings. The principal survey was an eight-page, saddle-stitched paper-and-pencil booklet.

The teacher and principal surveys measured perceptions of both district leadership practices and district conditions or characteristics. In the surveys, all but one of the perception or attitudinal variables were measured using six-point scales (from "strongly disagree" to "strongly agree"). Other response categories included choices about "how many" (six steps from "none" to "all"); "how often" (six steps from "never" to "very frequently"); and "how much" (six steps from "none" to "very great"). The principal survey also had some items in which the response categories were five steps from "very little" to "very great." We divided the Round One teacher survey into sections with items about:

  • The classroom, for example
      I have a manageable number of students in my class(es)
      I am able to monitor the progress of all my students to my satisfaction
  • The school
      Disruptions of instructional time are minimized
      The school schedule provides adequate time for collaborative teacher planning
  • Teachers
      Teachers should prompt students to explain and justify their ideas to others
      (teachers and peers)
      I regularly incorporate student interests into lessons
  • Principal leadership practices
      The principal provides useful assistance to you in setting short-term goals for teaching and learning
      The principal gives you individual support to help you improve your teaching practices
  • School and home connections
      How many parents/guardians of students in your class(es) usually attend parent-teacher conferences
      How many parents/guardians of students in your class(es) do you contact in the first half of the school year
  • Demographics
      How many years have you worked as a teacher
      How many years have you worked in this school as a teacher?

We divided the principal survey into sections with items about:

  • State policy and influences, for example
      State standards stimulate additional professional learning in our school
      State policies help us accomplish our school's learning objectives
  • District leadership
      My district's leaders in the central office give schools a sense of overall purpose
      My district's leaders in the central office demonstrate high expectations for my work with staff and students
  • School leadership and conditions
      Most teachers in our school share a similar set of values, beliefs, and attitudes related to teaching and learning
      There is ongoing, collaborative work among teachers in our school
  • Stakeholder influence
      My school solicits input from community groups when planning curriculum
      My school includes community leaders and organizations when making important decisions
  • Professional development
      My professional development has a significant role in helping me make decisions about curriculum
      My professional development has helped me to use data more effectively
  • Demographics
      How many years have you worked as a principal
      Including you, how many principals has your current school had in the past 10 years?

School administrators—mostly principals—recruited or encouraged their teachers to fill out the survey. We made no personal appeals to the teachers to participate. We intended to survey all teachers in the achieved school sample. We defined teacher as a part-time or full-time school employee who is certified or licensed as a teacher and who carries out instructional responsibilities.

We mailed the teacher and principal surveys to 179 schools. Of the 331 principals invited to complete the survey in the 179 schools, 260 (157 principals and 103 assistant principals) returned a completed survey, for a response rate of 78.5%. We sent surveys to all teachers (6,832) in the 179 schools. Teachers returned 4,491 surveys from 43 districts and 158 schools. The response rate was 65.7%.
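The reported rates follow directly from these counts:

```python
# Round One returns as reported above.
principal_rate = 260 / 331    # completed principal/AP surveys / invited
teacher_rate = 4491 / 6832    # returned teacher surveys / surveys sent

print(f"principals: {principal_rate:.1%}")  # 78.5%
print(f"teachers:   {teacher_rate:.1%}")    # 65.7%
```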

We mailed the surveys in bulk to individual schools to the attention of the principal. Typically teachers completed surveys during a staff meeting. A blank, sealable envelope accompanied each survey to help ensure confidentiality. In a few cases, district administrators requested that we mail surveys to the district office for distribution. Each survey packet contained:

  • A cover letter to the principal
  • A sheet of instructions for administering the surveys
  • A teacher survey for every teacher
  • A principal survey for every principal and assistant principal
  • A sealable envelope for every teacher and principal
  • A project description for every teacher and principal
  • Postage-paid, preaddressed envelopes for returning the surveys.

If we did not receive completed surveys within three to four weeks after our mailing, we telephoned and e-mailed the principal to inquire about the surveys. When a principal reported that the surveys had not arrived, we sent a second packet. We attempted to get in touch with unresponsive schools no fewer than four times. In a few cases, principals opted out of the study after receiving the surveys.

The University of Minnesota's Office of Measurement Services formatted and printed the teacher survey and scanned the surveys upon return. They returned the scanned surveys along with a database. As part of data cleaning, we identified cases missing all or most of their data in the data file and examined the corresponding paper surveys. In almost all cases, the data were indeed missing. Only a very few surveys could not be scanned, because the teacher had completed the survey in red pen or with check marks; we entered those cases manually. Project staff entered the returned principal survey responses manually into an SPSS file. Staff randomly selected five percent of the principal survey returns, entered the data again, and compared the two entries. They detected an error rate of less than one percent and resolved the discrepancies. When we ran a similar quality control check of the Round Two principal survey data entry, we detected an eight percent error rate. Different staff members then re-entered all the data, compared the two sets, and resolved all conflicts. Rechecking the new file with 10% of the cases, we found an error rate of less than 1%.
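The double-entry check reduces to a field-by-field comparison of two independent entries of the same surveys; the record layout below is hypothetical:

```python
def double_entry_error_rate(first_entry, second_entry):
    """Fraction of fields that disagree between two independent
    entries of the same survey returns."""
    mismatches = total = 0
    for rec_a, rec_b in zip(first_entry, second_entry):
        for field in rec_a:
            total += 1
            if rec_a[field] != rec_b[field]:
                mismatches += 1
    return mismatches / total

# Hypothetical re-entry sample: two surveys, two items each,
# with one keying error in the second entry.
first = [{"q1": 4, "q2": 5}, {"q1": 2, "q2": 6}]
second = [{"q1": 4, "q2": 5}, {"q1": 2, "q2": 1}]
rate = double_entry_error_rate(first, second)  # 1 of 4 fields differ
```

In the study's terms, a rate near the 1% mark passed; the 8% Round Two result triggered full re-entry and reconciliation.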

Round Two

For Round Two, we collaboratively developed a revised 131-item teacher survey and a 105-item principal survey. We used identical items from the Round One surveys when we wanted repeat measures, as in the case of factor analysis. For economy, we dropped Round One items that had shown little variation in their response spread, which let us add new items probing questions that had arisen from our first round of data analysis. Again, the teacher survey was an eight-page optical scan booklet with glued bindings, and the principal survey was an eight-page, saddle-stitched paper-and-pencil booklet.

We mailed the surveys to 177 schools with a total teacher population of 7,075. Teachers returned 3,900 surveys from 134 schools in 40 districts, for a response rate of 55%. As in Round One, teachers completed the surveys anonymously, with each respondent placing the completed survey into a sealable envelope. The schools collected and returned the surveys. Of the 351 principals invited, 211 returned surveys from 122 schools in 40 districts, for a response rate of 60%.

We divided the Round Two teacher survey into sections with items about the school, teachers, classroom, school administrator(s) leadership practices, district leadership, home and school connections, and demographics. We divided the principal survey into sections with items about the principal‘s areas of expertise, school conditions, school leadership, district leadership, district policy conditions, state policy and influences, parents and community, and demographics.

Again, the teacher and principal surveys measured perceptions of both district leadership practices and district conditions or characteristics. In the surveys, all but one of the perception or attitudinal variables were measured using six-point scales (from "strongly disagree" to "strongly agree"). The one other response set used a five-point scale from "strongly disagree" to "strongly agree" with a mid-point of "uncertain." Other response categories included choices about "how many" (six steps from "none" to "all"); and "how often" (five steps from "never" to "10 times or more" or four steps from "not at all" to "every time"). The principal survey also had some items in which the response categories were four steps from "basic" to "highly developed"; and five steps from "very rarely" to "very often."

Student achievement

We were guided by five general principles in our research. Principle 4 was "Make the best use of existing student achievement data." As we wrote in our proposal to Wallace, we would ideally have administered the same achievement tests to students in sampled classrooms of the 180 schools in the study, but in practice that was not possible. Because of the 2002 NCLB legislation, we assumed that all students within a state would take the same tests for literacy and mathematics. Thus, we obtained student achievement data for English and mathematics from scores on the states' tests for measuring Adequate Yearly Progress mandated by the No Child Left Behind Act of 2002.

We downloaded these data from the public online records on each state's department of education website. In trying to fill gaps in state reporting, we rarely found the missing achievement data on district or building websites. A school's student achievement was represented by the percentage of students meeting or exceeding the proficiency level established by the state on mandated literacy and math tests. If states or districts tested math or literacy proficiency in more than one grade in elementary or secondary schools, we averaged the percentages across the grades within the building level, resulting in a single achievement score for each school. We began by assembling district and building proficiency data for 2002-03, 2003-04, and 2004-05. Over the subsequent years of the study, as annual testing data became available, we added them to the student achievement database. Over the years from 2002-03 through 2006-07, data across the states became more complete and the state department websites easier to navigate. Particularly in the first year or two of our work, the availability of data for all schools in all districts in all states was uneven.
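The averaging rule reduces each school to a single number per subject. A sketch with invented proficiency figures:

```python
def school_score(percent_proficient_by_grade):
    """Average the percent-proficient figures across the grades tested
    in a building to get one achievement score for the school."""
    values = list(percent_proficient_by_grade.values())
    return sum(values) / len(values)

# Invented example: an elementary school tested in grades 3-5 in math.
math_by_grade = {"grade 3": 62.0, "grade 4": 58.0, "grade 5": 66.0}
score = school_score(math_by_grade)  # (62 + 58 + 66) / 3 = 62.0
```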


Districts and schools

We collected three rounds of site-visit data from schools and districts. These occurred in years two, three, and five of the study. Two districts in each of the nine states had agreed to be site visit districts. Typically we visited two buildings (one elementary and one middle school or high school per district), but in two of the small districts we visited three buildings each, which were all the regular buildings in those two districts. Besides the interviews with teachers and administrators, we also conducted four or five classroom observations in each building. Thus we had site visit data from 38 schools and 18 districts. The data collection also extended to community members not employed by the districts.

We developed 10 separate, role-specific interview protocols collaboratively, following numerous discussions about items and language. Even with a written script, we agreed that the interviews were to be semi-structured and more conversational than formal. With the interviewee's permission, we made an audio recording of each interview, and we later transcribed all recorded interviews verbatim. We designed the district and school site-visit interviews to take from 45 minutes to an hour each. There were four district-level protocols: superintendent and district staff, school board member, business and community groups, and union leader. There were six building-level protocols: principal and assistant principals, student support professionals, teacher (interviewed after we observed his or her teaching),312 lead teacher, community representative, and active parents. All four district interview protocols featured the same major categories, and within each we tailored language and probes to suit the role of the interviewee. The major district interview categories were:

  • Policies and leadership
  • Relationships (for example, with the state's department of education, school board, and other external stakeholders)
  • Political culture and collaboration
  • Capacity building (developing district leaders, school leaders, and teachers).

Compared with the district interviews, the six school-level interview protocols were more varied, but all included most or all of the following categories:

  • State influence
  • District influence/leadership
  • School leadership (distribution, development, etc.)
  • Curriculum and pedagogy
  • School culture
  • Community (interaction, culture, support, etc.)
  • Teacher leadership
  • Professional development
  • Leadership teams.

Typically, the site-visit teams were composed of four members and often included staff from both the University of Minnesota and the University of Toronto. Teams usually were made up of senior researchers, staff, and graduate students. The typical site visit required three working days in the schools and district offices.

In Round One, the number of interviews conducted in the 38 schools ranged from 4 to 13; the mean and median were both 9, and the mode was 8. The number of interviews conducted at the district level ranged from 4 to 21, with a mean of 9, a median of 8, and multiple modes; 10 of the 18 districts had 8 or fewer interviews, and the two outliers of 18 and 21 interviews inflated the mean. In total, the first round of site visits yielded 166 district interviews and 342 school interviews, for a total of 508 interviews.
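The shape of the district-interview distribution, and the effect of the two outliers on the mean, can be illustrated with a small sketch. The per-district counts below are invented; only the reported summary figures (range 4 to 21, mean 9, median 8, multiple modes, and a total of 166 district interviews) come from the text.

```python
# Hypothetical re-creation of the Round One district-interview distribution.
# Individual counts are invented to match the reported summary statistics.
import statistics

district_counts = [4, 5, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 18, 21]

print(sum(district_counts))                       # 166 district interviews
print(round(statistics.mean(district_counts)))    # 9
print(statistics.median(district_counts))         # 8.0
print(statistics.multimode(district_counts))      # [7, 8] -- multiple modes

# Dropping the two outliers shows how much they pull up the mean:
trimmed = [n for n in district_counts if n < 18]
print(round(statistics.mean(trimmed), 1))         # 7.9
```

The median and mode are unaffected by the two large districts, which is why they describe the typical district better than the mean does here.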

The second round of site visits was a smaller undertaking. At the school level we decided to interview only principals (not teachers, support professionals, or assistant principals). We interviewed 28 principals in 28 buildings in 12 districts in 6 states (we also interviewed one assistant principal, one lead teacher, and one Title I teacher). In total, the second round of site visits yielded 83 district interviews and 32 school interviews, for a total of 115 interviews.

The third and final round of interviews was a larger undertaking than the second. For Round Three, we replaced three schools, one each in three different districts. The number of district-office interviews ranged from 0 to 7; in the 17 districts with district interviews, the range was from 2 to 7, with a mean of 3 and a median and mode of 2. In the third round of site visits, we collected 55 district interviews and 207 school interviews, for a total of 262 interviews. The total number of building and district site-visit interviews for the project was 885.

Coding district and building interviews

In our response to the RFP, we proposed to produce a standardized coding scheme, code the transcribed school and district interviews, and assemble them into a single qualitative database. Using NVivo, we coded the 508 interviews from the first round of site visits; each original transcript also remained available as an individual Word file. As we wrote in our proposal, given the scope of the study (the number of interviews and the number of coders), the coding system would necessarily classify the interview data in rather broad categories. The interview protocols grew from our proposal to Wallace and the literature review that accompanied it, and the major components of the coding scheme grew from the interview protocols. Construction of the coding scheme was thus conceptual rather than emergent; it did not grow out of an examination or analysis of the interview transcripts themselves. Instead, we developed the coding framework a priori to encompass the majority of interview topics. To increase inter-rater reliability, we piloted the coding scheme on small, randomly selected sections of interview transcripts; when we finished, we compared our codings and discussed discrepancies, refining the scheme on the basis of those conversations. After a long period of collaborative development, we finalized the coding scheme.

In general the coding scheme was designed to capture two things: an agent, and a topic area around which that agent was acting. A third set of codes recorded descriptive attributes of each site and interviewee. In major outline, the coding framework contained:

Topic areas

Curriculum and instruction
Professional development
Decision making and planning
Student learning outcomes
Organizational structures

Agents

State-General (indefinite agent)
State-Professional Organizations
Federal-General (policy, initiatives)
District-General (indefinite agent)
District-School Board
District-Professional Organizations
School-General (indefinite agent)
School-Principal or Assistant Principal

Descriptive attributes

State ID (9 sub-codes)
District site ID (18 sub-codes)
District size (large, medium, small)
District poverty (high, medium, low)
District diversity (high, medium, low)
District location (urban, suburban, rural)
School site ID
School level (elementary, middle school, high school)
School poverty (high, medium, low)
School diversity (high, medium, low)
School size (student population)
Interviewee role, district (superintendent, board member, staff, parent representative, community stakeholder)
Interviewee role, school (principal or assistant principal, teacher, teacher leader, other staff, parent representative)
Interviewee gender
Interviewee role experience (0-2 years, 3-5, 6-10, 11+)
Interviewee site experience (0-2 years, 3-5, 6-10, 11+)
Site visit date (site visit 1, 2, or 3)
Document type (district, school, research memo).
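A coded interview segment in this framework pairs an agent code with a topic code and carries the descriptive attributes along with it. The sketch below is a hypothetical illustration of that structure, not the project's actual NVivo representation; the example passage and attribute values are invented.

```python
# Hypothetical representation of one coded interview segment: an agent,
# a topic, and descriptive attributes, following the framework above.
from dataclasses import dataclass, field

@dataclass
class CodedSegment:
    agent: str                                  # e.g., "District-School Board"
    topic: str                                  # e.g., "Professional development"
    attributes: dict = field(default_factory=dict)
    text: str = ""                              # the coded transcript passage

segment = CodedSegment(
    agent="School-Principal or Assistant Principal",
    topic="Professional development",
    attributes={"school_level": "elementary", "site_visit": 1,
                "interviewee_role": "teacher"},
    text="Our principal sets aside time every month for grade-level PD...",
)

# With segments pooled into one corpus, broad retrieval becomes simple:
corpus = [segment]
pd_segments = [s for s in corpus if s.topic == "Professional development"]
print(len(pd_segments))  # 1
```

Keeping agent and topic as separate dimensions is what lets a query cross-tabulate, for example, everything principals said about professional development against everything district staff said about it.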

With the coding scheme came a coding manual that contained the major codes, coding guidelines, definitions, and the coding format. The researchers and staff who would code the 508 interviews spent considerable effort training themselves in the intricacies of the system and in the NVivo software.

We transcribed but did not code the interviews from the Round Two site visits. For Round Three, we again transcribed the interviews and, using NVivo, coded them not by the original coding framework but by the interview protocol questions themselves (a process we affectionately referred to as a "data dump").

State study interviews

In our response to the Wallace RFP, we proposed to develop a "policy map" for each state, based on interviews with key informants, in order to develop a stable understanding of the policy dynamics related to efforts to change leadership for student achievement. We developed an open-ended interview protocol appropriate for an elite population. The main topics covered were: 1) the respondent's perceptions of the major state-level policy initiatives of importance over the last few years (allowing the respondent to determine the starting year and policy); 2) specific policy initiatives in two arenas, accountability and promoting school leadership; 3) the policy initiators and actors, and their stakes and stands on major policy initiatives; and 4) the ways in which groups and individuals worked together or separately to exercise influence over educational policy.

We selected interview participants who would, cumulatively, yield a comprehensive set of perspectives on state-level education policy and policymaking. The interviewees included congressional representatives, commissioners of education, chairs of state boards of education, teacher and administrative union leaders, faculty members at schools of education, leaders of foundations related to education, and business leaders engaged in state education initiatives. We sent potential respondents letters of invitation and followed up with telephone calls to schedule telephone interviews.

Senior project staff interviewed eight to twelve individuals by telephone in each state. Interviews lasted an hour or more and, with the interviewee's permission, were recorded and later transcribed; only one interviewee declined to be taped. From the nine states in the achieved sample we had 83 interviews (we also had 12 interviews from the two states we lost). We conducted the interviews in 2004 and 2005, with a final interview in January 2006.

Coding state study interviews

The coding scheme we developed for the state interviews was less complex than the scheme for district and school interviews. Again, we wanted a standardized coding system that would classify the interview data in rather broad categories. And again, the coding scheme closely reflected the interview protocol. In major outline, the coding framework contained:

Interview topic

Organizational school improvement
Student learning
Enhancement of professional development/teacher capacity and leadership
Non-specific education policy or history (general)


Context and actions

Current status
Motivations for policy
Strategies for implementation and enactment
Explanatory factors
Historical context.

There was a second round of state interviews in June, July, and August of 2008. A single staff member conducted two or three interviews per state (including in one of the states that we lost) for a total of 29 interviews. All interviewees were officials in their state's department of education and had not been interviewed in the first round of interviews.

Classroom Observation

Classroom observations were part of the data collection during the district site visits in Rounds One and Three. The task was to observe instruction in literacy (reading or language arts) and mathematics, determine the kinds and frequencies of particular instructional strategies teachers used, and note classroom conditions. The purposes of the observations were to gain an understanding of the instructional activities in the schools (helping us place the student achievement outcomes in context), to provide some corroboration for the claims made by district and building interviewees about the teaching and learning conditions in the school, and to provide a basis for discussion during the teacher interviews that followed the observations. We developed a structured observation protocol to collect these data.

On most site-visit teams, all team members individually observed one or more teachers as well as conducted interviews. We trained ourselves as observers to reliably document instruction in the lessons we observed, based on our modification of Newmann's assessment of authentic instruction.313 We recorded what we saw and heard on an observation form with two main sections: 1) basic information about the context, details of the lesson, how class time was used, how students were organized for instruction and learning, the kinds of technology used during the lesson, and a description of any positive or negative features of the classroom; and 2) assessments of instruction using four of Newmann's five standards of authentic instruction: higher-order thinking, deep knowledge, substantive conversation, and connection to the world beyond the classroom. We completed the classroom observation forms during or soon after the observation period but did not show them to the teachers. Except for the observers' completed observation protocols, we made no recordings of any kind in the classrooms.

In the typical site visit, we observed four or five literacy or math classes per school in classrooms at all grade levels, but we preferred grades 3 or 4, 5, 8, and 10, the typical grades in which students take state-wide AYP examinations. We observed teachers during one instructional period usually lasting from 30 to 55 minutes and conducted the interview with the teacher, lasting about a half hour, as soon as possible after the lesson.

We did not sample or recruit teachers for our observations. Rather, we left the choice and persuasion of teachers to the principals or their assistants who were coordinating arrangements and scheduling for our visit to the schools, discussing our preferences for numbers, subjects, and grades with them by e-mail and telephone. In Round One, we returned with 145 classroom observations. For the Round Three observations, we modified our observation protocol somewhat; the major change was the addition of a one-page checklist on which the observer checked yes or no for 24 items concerning classroom management and the use of instructional strategies. In Round Three, we returned with 167 classroom observations, for a project total of 312.

Appendix B
Rotated Component Matrix Data for Section 1.5

Survey Item

4-1 My school administrator develops an atmosphere of caring and trust.
4-3 My school administrator creates consensus around purposes of our district mission.
4-6 My school administrator is effective in building community support for the school's improvement efforts.
4-7 My school administrator promotes leadership development among teachers.
4-8 My school administrator models a high level of professional practice.
4-9 My school administrator ensures wide participation in decisions about school improvement.
4-10 My school administrator clearly defines standards for instructional practices.
4-24 When teachers are struggling, our principal provides support for them.
4-25 Our principal ensures that all students get high quality teachers.
4-27 In general, I believe my principal's motives and intentions are good.
4-13 How often in this school year has your school administrator discussed instructional issues with you?
4-14 How often in this school year has your school administrator encouraged collaborative work among staff?
4-15 How often in this school year has your school administrator provided or located resources to help staff improve their teaching?
4-16 How often in this school year has your school administrator observed your classroom instruction?
4-17 How often in this school year has your school administrator encouraged data use in planning for individual student needs?
4-18 How often in this school year has your school administrator attended teacher planning meetings?
4-21 How often in this school year has your school administrator given you specific ideas for how to improve your instruction?

Appendix C
Data from Section 1.6

Table C1.6.1
One-Way Analyses of Variance for Leadership Variables by Diversity


Source: 1 – 8 Teacher Survey Round One; 9 – 11 Teacher Survey Round Two
†For the planned pairwise contrasts among the means, the comparisons shown represent two means
significantly different from each other at p < .05, t-test two-tailed.

Table C1.6.2
Summary Table of Significant Main Effects for Principal Leadership Variables for Each
Context Variable for Surveyed Principals Second Round*


Source: Principal Survey Round Two.
* X indicates a significant main effect at p < .05 for that leadership variable (row) on that context variable

Table C1.6.3
One-Way Analyses of Variance for Leadership Variables by Diversity


Source: Principal Survey Round Two.
†For the planned pairwise contrasts among the means, the comparisons shown represent two means
significantly different from each other at p < .05, t-test two-tailed.

Table C1.6.4
One-Way Analyses of Variance for Leadership Variables by District Size


Source: Principal Survey Round Two
†For the planned pairwise contrasts among the means, the comparisons shown represent two means
significantly different from each other at p < .05, t-test two-tailed.

Table C1.6.5
One-Way Analyses of Variance for Leadership Variables by Urbanicity


Source: 1 – 8 Teacher Survey Round One; 9 – 11 Teacher Survey Round Two.
†For the planned pairwise contrasts among the means, the comparisons shown represent two means
significantly different from each other at p < .05, t-test two-tailed.

Table C1.6.6
One-Way Analyses of Variance for Leadership Variables by School Size


Source: 1 – 8 Teacher Survey Round One; 9 – 11 Teacher Survey Round Two.
†For these post hoc contrasts among the means, the comparisons shown represent two means significantly
different from each other at p < .05, Bonferroni t-test two-tailed.

About the Authors

Kyla Wahlstrom is Director of the Center for Applied Research and Educational Improvement (CAREI) in the College of Education and Human Development at the University of Minnesota. Her research interests include instructional reform initiatives, educational leadership, and the impact of district-wide policies on teaching and learning. She has been a teacher and principal, and is the author of numerous book chapters, journal articles, and over 50 technical reports used by educational leaders to shape policy decisions.

Karen Seashore Louis is Regents Professor and the Robert H. Beck Chair in the College of Education and Human Development at the University of Minnesota, and a past vice president for Division A of the American Educational Research Association. Her research focuses on schools as workplaces and on school improvement and reform; her most recent book (with Sharon D. Kruse) is Building Strong School Cultures: A Guide to Leading Change (2009). She received the Roald F. Campbell Lifetime Achievement Award from the University Council for Educational Administration in 2009.

Kenneth Leithwood is Professor of Educational Leadership and Policy at OISE/University of Toronto. His research and writing about school leadership, educational policy and organizational change is widely known and respected by educators throughout the English-speaking world. Dr. Leithwood has published more than 70 refereed journal articles and authored or edited two-dozen books.

Stephen Anderson is an Associate Professor in the Educational Administration Program, Ontario Institute for Studies in Education at the University of Toronto. His research and publication activities focus on education policy and program change, school improvement, in-service teacher development, and education leadership in Canada, the United States, East Africa, Pakistan, and Chile. His recent work focuses on the school district role in educational change and on the sustainability of school improvement.



306. As two of the five districts in each state would be site visit districts as well as survey districts, we excluded Hawaii and Alaska because of travel costs. We also excluded Washington, DC, because of its atypical governance circumstances.

307. No alternate state was available in the West as no other state had both SAELP and LEAD funding.

308. Five states would be selected from the non-SAELP funded states – one state each from three quadrants and two states from one of the quadrants.

309. Depending on the state, we wrote to the Superintendent of Public Instruction or the Commissioner of Education or Secretary of Education or Chancellor of the State Board and so on.

310. If two or more districts satisfied the demographic characteristics under consideration, we randomly selected districts with the SAMPLE command in SPSS; if there was only one district that satisfied the desired demographic conditions, we took it.

311. If two or more buildings satisfied the demographic characteristics under consideration, we randomly selected the desired number of buildings – for example, two elementary buildings per district – with the SAMPLE command in SPSS; if there was only one building that satisfied the desired demographic conditions, we took it.

312. The interview protocol for observed teachers was a bit more narrowly focused than many of the others. With observed teachers, the focus was on specific activities during the lessons; general approaches to pedagogy; the role of the principal as well as other leaders within the school, district, and state on pedagogy; curricular and pedagogical decision making in the school; professional development; and student learning.

313. Newmann, F. M., Secada, W. G. & Wehlage, G. G. (1995). A guide to authentic instruction and assessment: Vision, standards, and scoring. Madison, WI: Wisconsin Center for Education Research, pp. 86-93.