Stopping the Overuse of Standardized Tests
By Monty Neill
Vol. 11, No. 4, 1989, pp. 9-10, 12-14
During the past decade, a tidal wave of standardized testing has washed over the school systems of America. In its report, Fallout From the Testing Explosion, FairTest (the National Center for Fair and Open Testing) calculated that at least 100 million tests are given in the public schools each year. And, FairTest found, testing is more frequent in the Southern states than in the rest of the country.
In the past several years, however, a movement to turn back the wave of testing has developed. In 1988, North Carolina passed a law banning the use of standardized achievement tests in the first and second grades. Mississippi will discontinue use of its kindergarten test after the 1988-89 school year. Georgia will do the same, even though it only enacted the testing requirement last year. And in June, Texas approved legislation to eliminate its first grade test.
These are contradictory trends: simultaneously a continuing push toward more testing and an effort to reduce the massive overuse, misuse and abuse of standardized testing. To understand why the testing explosion has generated a growing movement for testing reform, we need to answer a number of key questions:
What are the effects of tests on students?
How does testing hurt school curricula and instruction?
Why does testing undermine school accountability and reform?
What are the limitations of standardized tests?
Why are standardized exams biased?
After responding to each of these questions, we will discuss what is being done in the South to counter the dangers of too much testing.
Harming Individual Students
In the U.S., children from lower-income, rural, inner-city and minority backgrounds often receive an inadequate education. To some extent, the lower average test scores of students from these backgrounds reflect this fact. But often at a very young age, children are tested and placed in programs that virtually guarantee they will never receive an adequate education.
Based on standardized test scores, thousands of children are placed in programs for the “educable mentally retarded” or similar special education programs. Students in such programs rarely rise out of them. Instead, they usually fall further and further behind more advanced students.
Minority students are far more likely to be placed in these programs than are majority-group students. For example, across the nation, blacks are three times as likely as whites to be in “educable mentally retarded” or similar programs. After hearing evidence about the construction of the exams, a California federal judge concluded that “IQ” tests had never been validated for use on black children. Relying on the evidence about placement and test construction, the judge banned the use of “IQ” tests in assigning black children to special education programs anywhere in the state.
The same tests are also a common method of determining eligibility for “gifted and talented” programs. Blacks are only half as likely as whites to be in these programs, which often provide a more enriched education. Despite the racial disparities and the California decision, test uses such as these remain all too common elsewhere in the U.S.
Students also are put in or removed from Chapter I, bilingual and other remedial programs on the basis of their test scores. In many districts, scores are used to keep children out of kindergarten or first grade or to place them in “transitional” programs.
Testing is also an important factor in decisions to retain students in grade. Not only are the tests of questionable merit in such decisions, but retention itself is of dubious educational value. Current evidence indicates that at the end of the third grade, children who were retained before grade three do not perform better than others who scored the same but were not held back, even though the children who were held back are a year older at this point.
Students who have been retained are also more likely to drop out of school. So using tests to retain children often does not help them but only increases the likelihood that they will drop out or otherwise fail in school.
Tests are also misused for placing students in tracks within schools. Tracking is often justified on the grounds that it protects slower children from being overwhelmed and helps advanced children by keeping them moving at a faster, more interesting pace. In fact, research shows that tracking does not help advanced students–they do just as well in a mixed-ability group–but does hurt lower-ranked students.
In many districts, tracking based on test scores begins at a very early age. At every grade, low-income and minority-group students are more likely to be in the slower tracks. As a result, they are put at greater educational risk by the testing and placement process.
However, many students who do not score well on tests can in fact do regular academic work. This was demonstrated recently when the National Collegiate Athletic Association (NCAA) adopted Proposition 42, which barred awarding athletic scholarships to students who did not score 700 on the SAT or 15 on the ACT. That is, the NCAA concluded that a student who scores under 700 or 15 could not do college level work. But a recent University of Michigan study found that 86 percent of the athletes who would have been denied entrance because of low test scores actually succeeded in their freshman course work. A large percentage of them went on to graduate. Evidence such as this has forced the NCAA to reconsider the wisdom of Proposition 42. It confirms, yet again, the fact that tests are extremely fallible devices for predicting how well an individual will perform.
Thus testing acts as a major barrier to gaining a decent education and reduces the life-chances of many children. This barrier affects students from low-income and minority backgrounds most strongly. Facts such as these have helped create an emerging testing reform movement.
Damage to the Curriculum
The damage caused by testing does not stop at barring minority and low-income children from access to a quality education. Tests have also come to control the curriculum in many schools, with often-disastrous results.
Several major reports issued during the past year all concluded that U.S. students are not developing “higher order thinking skills.” Research also has shown that the methods commonly used to raise standardized test scores–drill, memorization, rote learning and repetition–are counterproductive to teaching higher order skills. In preparing students to score high on the tests, teachers divert educational time and energy from the “higher order” curriculum, as well as from any nonacademic efforts.
As tests have come to drive the schools, the curriculum has been “dumbed-down.” For example, basal readers often contain material of little interest to students, written in a totally dry manner using none of the language of real life or good literature. Children who score high on the tests at the end of each basal lesson are given the opportunity to read other things. But children who do not do well are given more of what did not work the first time. This simplistic and repetitive curriculum bores them and turns them off to schooling.
The point is not whether children need basic skills or whether there is a role for memorization or repetition. They do, and there is. But these methods are not the essence of education and learning.
Unfortunately, testing’s harmful effects on curriculum and instruction fall most heavily on those who have already been victimized by standardized tests. For too many of today’s students, especially those from low-income and minority families, schooling has been reduced to test-coaching.
A Spurious Accountability
Despite these flaws, testing is often defended on the grounds that it improves the ability to assess the performance of students, teachers, schools and districts, and thereby improves accountability. However, instruments as full of problems as standardized tests can never be adequate measures of educational quality. Reducing accountability to test scores provides the illusion of quality without its substance.
As testing spreads and increasingly defines the content of the curriculum, decision-making power over our schools is removed from parents, teachers, and local government. Control either shifts to the testing office of the state education department or is put in the hands of the testing industry.
Unlike food, drugs or transportation, the billion-dollar-a-year testing industry operates with little public oversight or control. Moreover, as the late Oscar Buros, founder of the authoritative Mental Measurements Yearbook, lamented, “It is practically impossible for a competent test technician or test consumer to make a thorough appraisal of the construction, validation and use of standardized tests…because of the limited amount of trustworthy information supplied by the publishers.”
Advocates of standardized testing expect parents and community to leave important decisions about the lives of their children in the hands of this unregulated, unaccountable industry.
Limitations of Tests
Reliance on standardized tests will ensure that quality education remains unavailable to many students. Moreover, the problems of testing cannot be fixed simply by changing some of the questions or other minor tinkering. The basic structure of the standardized test means that it is an extremely limited tool for measuring learning.
The typical multiple choice format prohibits measuring more than a very narrow range of student performance. In the real world, people are not given a problem designed to confuse and mislead and asked to pick the one correct answer out of four or five options.
Real world problems may have more than one correct answer. Though many questions on standardized tests also have more than one correct answer, the format allows for only one to be marked as “correct.”
For example, consider this question, presented by Hoover, Politzer and Taylor (in Negro Educational Review, April-July 1987, p. 91): “Father said: Once there was a land where boys and girls never grew up. They were always growing. What was Father telling? the truth, a lie, a story.”
As the authors explain, “The ‘right’ answer could be any of them. Metaphorically, it could be the ‘truth’ if the growth were mental and not physical. It could be a ‘lie’ in that the word ‘lie’ in black speech can also mean a joke or a story, and it could also be a ‘story.’” This question demonstrates how tests can penalize creative thinking or cultural diversity.
It also indicates that understanding how the test maker thinks is a key to doing well on these exams.
In general, standardized tests cannot measure knowledge in any complex or in-depth fashion. They often trivialize subjects in order to fit the multiple-choice, one correct answer format.
They emphasize simple, memorizable definitions and formulas. And they certainly do not measure critical thinking, problem-solving skills, use of knowledge in the field, or creativity.
Moreover, test reliability and validity are often inadequate for the purposes to which tests are put (see related article [Al Clayton. Issues of Reliability and Validity. Vol. 11, No. 4]).
Test Bias
Misuse of standardized tests is one major reason why children from low-income and minority backgrounds are so often misplaced on the basis of test scores. Bias in the tests themselves is another important reason.
Test proponents claim that standardized exams are “objective.” However, the only thing about them that is objective is the mechanical method by which they are scored. The decisions about what academic areas to cover, what questions to put in the test, what language style to use, how difficult to make the test, and how scores are interpreted and used–all these are subjective, not objective, decisions. All can be biased.
As with any human activity or product, every test is grounded in a particular culture. Contemporary U.S. society contains many cultures organized around factors such as language, regional or national origin, race and ethnicity, gender and class. Basing a test on one culture can lead to its being biased against people from other backgrounds.
Consider the following item from the WISC-R (Wechsler Intelligence Scales for Children-Revised), the most widely used “IQ” test: “What is the thing to do when you cut your finger?” Two-point response: “Put a Band-Aid on it…” One-point response: “…Go to the doctor (hospital)…Get it stitched up…” Zero-point response: “…Suck blood…Don’t panic…Let it bleed.”
A Maryland sociologist discovered that minority children usually score low on this item. She asked youths in inner-city Baltimore why they answered the question the way they did. She found that many answered “go to the hospital” because they thought that “cut” meant a big cut. When told it was a small cut, almost every child said to use a Band-Aid.
Even getting a few questions like this “wrong” can dramatically lower a child’s “intelligence” score. However, on the WISC-R as on many other tests, the problem is not one or two questions but that the cultural background built into the exams simply does not match the experiences of many children.
Item selection and norming are the processes that enable individual scores to be distributed on a “normal” bell-shaped curve. In order to construct the curve, item selection and norming must be based primarily on the responses of subjects from the majority culture. As a result, all major tests in the U.S. are constructed to fit the culture of white middle to upper classes. Thus, they are biased against those who are not from that culture.
Standardizing a test means standardizing bias. Assuming that a given test has some validity as a measure for the majority population, bias means that the test will not adequately measure the true abilities of people from minority groups. That means the tests will not be valid for use on minority populations. It was for reasons such as this that the California court found that “IQ” tests were biased against black children.
Test-makers do not recognize that they might be measuring class or culture rather than ability or achievement, and do not grasp the fundamental nature of bias in testing. Thus, the procedures test-makers use to remove bias treat the problem as occurring only accidentally, in an occasional item, not as underlying the instrument as a whole. As a result, test-makers’ “fairness standards” fail to eradicate test bias. Deeply rooted bias is one additional reason why standardized tests are inadequate as primary tools in educational evaluation.
What Can Be Done?
Test misuse and overuse threaten the educational health of our nation. But what can be done about these problems?
To develop a strong educational system for all, the testing tidal wave must be turned back. In its place, more useful, appropriate and unbiased assessments must be developed and implemented. Parents, educators and concerned citizens need to make certain that standardized, multiple-choice tests are not used to harm students or dictate the shape of education.
FairTest believes that education can be improved if all testing programs are evaluated on the following principles:
- Tests must be relevant. They should only be used where they can be shown to be directly helpful to educators and students. The quest to score high on standardized exams must not be allowed to drive schooling.
- Tests must be open. Parents, teachers and independent evaluators should have a right to know how tests are constructed, validated and used.
- Tests must be fair and unbiased. No student should be assessed based on culturally specific instruments.
FairTest is working closely with groups around the nation to implement this agenda. In mid-March, FairTest convened, in Atlanta, a Southern Regional Conference on Testing Reform in the Public Schools. Civil rights, educational reform and children’s rights advocates from many Southern states attended. This conference launched the Southern Network for Testing Reform.
Thus far, its major emphasis has been on repealing the use of tests on young children. For example, educators from Alabama plan to win a moratorium on all mass standardized testing through grade three. In this, they are largely following the example of North Carolina, where the Atlantic Center for Research in Education (ACRE) initiated a campaign to ban testing in grades one and two after the state mandated use of the California Achievement Test (CAT) in those grades [see sidebar]. ACRE opposed the tests because they have low reliability and validity for young children, the scores are not useful to teachers, too much time was spent on testing, and the tests began to drive the curriculum.
ACRE’s testing reform initiative was joined by the 1700-member North Carolina Association for the Education of Young Children. The two organizations developed a strategy of public education, watch-dogging the state’s Testing Commission, and lobbying the legislature. In time, they were joined by teacher organizations and school psychologists.
In 1987, the legislature ended funding for testing in grades one and two. The following year, the state legislated an outright ban on the use of standardized achievement tests in those grades and mandated that alternative assessments be devised. These alternative evaluation tools were developed out of an analysis of the state’s curriculum and are based on child development theory. They will be introduced across the state in the fall of 1989.
The movement to stop testing young children is only the beginning. While testing young children often causes the most harm, test misuse does great damage all the way through high school graduation–or non-graduation. Activists in each state and district must consider how best to combat all types of test misuse and abuse.
In summary, standardized tests are being used to deprive many children of a quality education. Though the negative effect falls most heavily on young children and students from low-income and minority backgrounds, test overuse blocks the ability to improve education for all students.
While many of these problems are due to misuse of the instruments, the instruments themselves are often flawed. Therefore, standardized tests should be used with great caution, and even then only with additional information about students or programs.
Readers who are interested in working with others in their state on testing reform should contact FairTest. We will put you in touch with local activists, send you information, and help you initiate and develop coalitions and campaigns to end the overuse, misuse and abuse of standardized testing.
Monty Neill is the Associate Director of FairTest, the National Center for Fair and Open Testing. Documentation, as well as a more extensive discussion of most of the points in this article, can be found in Fallout From the Testing Explosion: How 100 Million Standardized Exams Undermine Equality and Excellence in America’s Public Schools, by Noe Medina and D. Monty Neill, available for $8.95 from FairTest, 342 Broadway, Cambridge, MA 02139; (617) 864-4810.