Teachers Are Put to the Test
More States Tie Tenure, Bonuses to New Formulas for Measuring Test Scores
Stephanie Banchero and David Kesmodel | Wall Street Journal
MADISON, Wis.—Teacher evaluations for years were based on brief classroom observations by the principal. But now, prodded by President Barack Obama's $4.35 billion Race to the Top program, at least 26 states have agreed to judge teachers based, in part, on results from their students' performance on standardized tests.
So with millions of teachers back in the classroom, many are finding their careers increasingly hinge on obscure formulas like the one that fills a whiteboard in an economist's office here.
The metric created by the Value-Added Research Center, a nonprofit housed at the University of Wisconsin's education department, is a new kind of report card that attempts to gauge how much of students' growth on tests is attributable to the teacher.
For the first time this year, teachers in Rhode Island and Florida will see their evaluations linked to the complex metric. Louisiana and New Jersey will pilot the formulas this year and roll them out next school year. At least a dozen other states and school districts will spend the year finalizing their teacher-rating formulas.
"We have to deliver quality and speed, because [schools] need the data now," said Rob Meyer, the bowtie-wearing economist who runs the Value-Added Research Center, known as VARC, and calls his statistical model a "well-crafted recipe."
VARC is one of at least eight entities developing such models.
Supporters say the new measuring sticks could improve U.S. educational performance by holding teachers accountable for students' progress. Teachers unions and other critics say the tests' measurements are narrow and that the teachers' scores jump around too much, casting doubt on the validity of the formulas.
Janice Poda, strategic-initiatives director for the Council of Chief State School Officers, said education officials are trying to make sense of the complicated models. "States have to trust the vendor is designing a system that is fair and, right now, a lot of the state officials simply don't have the information they need," she said.
Bill Sanders, who developed the nation's first model to measure teachers' effect on student test scores, advises caution. "People smell the money and there are lots of people rushing out with unsophisticated formulas," said Mr. Sanders, who works as a senior researcher at software firm SAS Institute Inc., which competes with VARC for contracts.
In general, the models use a student's score on, say, a fourth-grade math test to predict how she or he would perform on the fifth-grade test. Some groups, such as VARC, adjust those raw test scores to control for factors outside the classroom, such as a student's family income or race. The actual fifth-grade score is then compared with the expected score, and the difference translates into the measure of the teacher's added value.
Each teacher's overall effectiveness with every student in the classroom is boiled down to one number, which is used to rate teachers from least effective to most effective.
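The general approach described above can be illustrated with a minimal sketch. It is not the formula used by VARC, SAS or any other vendor, and it leaves out the adjustments for income, race and other outside factors that some models apply. It assumes a single straight-line prediction from the prior-year score, and the function name, teacher labels and scores are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Return one number per teacher from (teacher, prior, actual) records.

    Hypothetical simplification: fit one least-squares line that predicts
    this year's score from last year's, then average each teacher's
    students' (actual - expected) differences.
    """
    priors = [prior for _, prior, _ in records]
    actuals = [actual for _, _, actual in records]
    mean_prior, mean_actual = mean(priors), mean(actuals)

    # Ordinary least-squares slope and intercept for actual ~ prior.
    sxx = sum((p - mean_prior) ** 2 for p in priors)
    sxy = sum((p - mean_prior) * (a - mean_actual)
              for p, a in zip(priors, actuals))
    slope = sxy / sxx
    intercept = mean_actual - slope * mean_prior

    # Collect each student's gap between actual and expected score.
    gaps = defaultdict(list)
    for teacher, prior, actual in records:
        expected = intercept + slope * prior
        gaps[teacher].append(actual - expected)

    # Boil each classroom down to a single number.
    return {teacher: mean(values) for teacher, values in gaps.items()}


# Invented data: (teacher, fourth-grade score, fifth-grade score).
sample = [
    ("A", 60, 70), ("A", 70, 80), ("A", 80, 90),
    ("B", 60, 62), ("B", 70, 72), ("B", 80, 82),
]

scores = value_added(sample)
# Rank teachers from least effective to most effective by this measure.
for teacher in sorted(scores, key=scores.get):
    print(teacher, round(scores[teacher], 2))
```

With this made-up data, teacher A's students score above the predicted line and teacher B's score below it, so A receives a positive number and B a negative one. Because the result depends on how the prediction line is built and which adjustments are included, different formulas can rate the same teacher differently.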
For states and school districts, deciding which vendor to use is critical. The metrics differ in substantial ways and those distinctions can have a significant influence on whether a teacher is rated superior or subpar.
In August, a New York State Supreme Court judge invalidated a vote by state education officials that would have let districts base 40% of teacher evaluations on state test scores, after the state teachers unions sued, saying the law allowed for only 20%. The Los Angeles teachers union has sued to stop the district from launching a pilot program that would grade some teachers using a VARC formula.
Until this year, only a few districts used value-added data. Washington, D.C., used it to fire about 60 teachers; New York City employed it to deny tenure to what it considered underperforming teachers; and Houston relied on it to award bonuses.
Michelle Rhee, who instituted a tough evaluation system when she was schools chancellor in Washington, said she took over a district where many students failed achievement exams, yet virtually every teacher was rated effective.
"While it's not a perfect measure, it was a much fairer, more transparent and consistent way to evaluate teachers," said Ms. Rhee, who now heads StudentsFirst, a nonprofit advocate for education overhauls.
Andy Dewey, an 11th-grade history teacher in Houston, is not a fan. He saw his score bounce from a positive rating in the 2008-09 school year to a negative rating the following year, decreasing his bonus by about $2,300.
"It's a bunch of garbage," said Mr. Dewey, who is executive vice president of a local teachers union. "These tests are designed to measure students, and they are being used to measure teachers. It's absolutely a misuse of the information."
In New York City, principals have used value-added data for the last two years, but only to make teacher tenure decisions. Last year, 3% of teachers did not receive tenure protection based, in part, on that data. A new state law, passed in an effort to compete for Race to the Top, requires that the data become an official part of every teacher evaluation.
At Frederick Douglass Academy in Harlem, principal Gregory Hodge uses the value-added results to alter instruction, move teachers to new classroom assignments and pair weak students with the highest-performing teachers. Mr. Hodge said the data for teachers generally aligns with his classroom observations. "It's confirming what an experienced principal knows," he said.