
          Stopping the Overuse of Standardized Tests
          By Monty Neill
          Vol. 11, No. 4, 1989, pp. 9-10, 12-14
          
          During the past decade, a tidal wave of standardized testing has
washed over the school systems of America. In its report,
Fallout From the Testing Explosion, FairTest (the
National Center for Fair and Open Testing) calculated that at least
100 million tests are given in the public schools each year. And,
FairTest found, testing is more frequent in the Southern states than
in the rest of the country.
          In the past several years, however, a movement to turn back the
wave of testing has developed. In 1988, North Carolina passed a law
banning the use of standardized achievement tests in the first and
second grades. Mississippi will discontinue use of its kindergarten
test after the 1988-89 school year. Georgia will do the same, even
though it only enacted the testing requirement last year. And in June,
Texas approved legislation to eliminate its first grade test.
          These are contradictory trends: simultaneously a continuing push
toward more testing and an effort to reduce the massive overuse,
misuse and abuse of standardized testing. To understand why the
testing explosion has generated a growing movement for testing reform,
we need to answer a number of key questions:
          What are the effects of tests on students?
          How does testing hurt school curricula and instruction?
          Why does testing negatively affect school accountability and
reform?
          What are the limitations of standardized tests?
          Why are standardized exams biased?
          After responding to each of these questions, we will discuss what
is being done in the South to counter the dangers of too much
testing.
          
            Harming Individual Students
          
          In the U.S., children from lower-income, rural, inner-city and
minority backgrounds often receive an inadequate education. To some
extent, the lower average test scores of students from these
backgrounds reflect this fact. But often at a very young age,
children are tested and placed in programs that virtually guarantee
they will never receive an adequate education.
          Based on standardized test scores, thousands of children are placed
in programs for the "educable mentally retarded" or similar special
education programs. Students in such programs rarely rise out of
them. Instead, they usually fall further and further behind more
advanced students.
          Minority students are far more likely to be placed in these
programs than are majority-group students. For example, across the
nation, blacks are three times as likely as whites to be in "educable
mentally retarded" or similar programs. After hearing evidence about
the construction of the exams, a California federal judge concluded
that "IQ" tests had never been validated for use on black
children. Relying on evidence about placement rates and test
construction, the judge banned the use of "IQ" tests in assigning
black children to special education programs anywhere in the state.
          The same tests are also a common method of determining eligibility
for "gifted and talented" programs. Blacks are only half as likely as
whites to be in these programs, which often provide a more enriched
education. Despite the racial disparities and the California decision,
test uses such as these remain all too common elsewhere in the U.S.
          Students also are put in or removed from Chapter I, bilingual and
other remedial programs on the basis of their test scores. In many
districts, scores are used to keep children out of kindergarten or
first grade or to place them in "transitional" programs.
          Testing is also an important factor in decisions to retain students
in grade. Not only are the tests of questionable merit in such
decisions, but retention itself is of dubious educational
value. Current evidence indicates that at the end of the third grade,
children who were retained before grade three do not perform better
than others who scored the same but were not held back, even though
the children who were held back are a year older at this point.
          Students who have been retained are also more likely to drop out of
school. So using tests to retain children often does not help them but
only increases the likelihood that they will drop out or otherwise
fail in school.
          Tests are also misused for placing students in tracks within
schools. Tracking is often justified on the grounds that it protects
slower children from being overwhelmed and helps advanced children by
keeping them moving at a faster, more interesting pace. In fact,
research shows that tracking does not help advanced students--they do
just as well in a mixed-ability group--but does hurt lower-ranked
students.

          In many districts, tracking based on test scores begins at a very
early age. At every grade, low-income and minority-group students are
more likely to be in the slower tracks. As a result, they are put at
greater educational risk by the testing and placement process.
          However, many students who do not score well on tests can in fact
do regular academic work. This was demonstrated recently when the
National Collegiate Athletic Association (NCAA) adopted Proposition
42, which barred awarding athletic scholarships to students who did
not score at least 700 on the SAT or 15 on the ACT. That is, the NCAA concluded
that a student who scored under 700 or 15 could not do college-level
work. But a recent University of Michigan study found that 86 percent
of the athletes who would have been denied entrance because of low
test scores actually succeeded in their freshman course work. A large
percentage of them went on to graduate. Evidence such as this has
forced the NCAA to reconsider the wisdom of Proposition 42. It
confirms, yet again, the fact that tests are extremely fallible
devices for predicting how well an individual will perform.
          Thus testing acts as a major barrier to gaining a decent education
and reduces the life-chances of many children. This barrier affects
students from low-income and minority backgrounds most strongly. Facts
such as these have helped create an emerging testing reform
movement.
          
            Damage to the Curriculum
          
          The damage caused by testing does not stop at barring minority and
low-income children from access to a quality education. Tests have
also come to control the curriculum in many schools, with
often-disastrous results.
          Several major reports issued during the past year all concluded
that U.S. students are not developing "higher order thinking skills."
Research also has shown that the methods commonly used to raise
standardized test scores--drill, memorization, rote learning and
repetition--are counterproductive to teaching higher order skills. In
preparing students to score high on the tests, teachers divert
educational time and energy from the "higher order" curriculum, as
well as from any nonacademic efforts.
          As tests have come to drive the schools, the curriculum has been
"dumbed-down." For example, basal readers often contain material of
little interest to students, written in a totally dry manner using
none of the language of real life or good literature. Children who
score high on the tests at the end of each basal lesson are given the
opportunity to read other things. But children who do not do well are
given more of what did not work the first time. This simplistic and
repetitive curriculum bores them and turns them off to schooling.
          The point is not whether children need basic skills or whether
there is a role for memorization or repetition.  They do, and there
is. But these methods are not the essence of education and
learning.
          Unfortunately, testing's harmful effects on curriculum and
instruction fall most heavily on those who have already been
victimized by standardized tests. For too many of today's students,
especially those from low-income and minority families, schooling has
been reduced to test-coaching.
          
            A Spurious Accountability
          
          Despite these flaws, testing is often defended on the grounds that
it improves the ability to assess the performance of students,
teachers, schools and districts, and thereby improves
accountability. However, instruments as full of problems as
standardized tests can never be adequate measures of educational
quality. Reducing accountability to test scores provides the illusion
of quality without its substance.
          As testing spreads and increasingly defines the content of the
curriculum, decision-making power over our schools is removed from
parents, teachers, and local government. Control either shifts to the
testing office of the state education department or is put in the
hands of the testing industry.
          Unlike the food, drug or transportation industries, the billion-dollar-a-year
testing industry operates with little public oversight or
control. Moreover, as the late Oscar Buros, founder of the
authoritative Mental Measurements Yearbook, lamented,
"It is practically impossible for a competent test technician or
test consumer to make a thorough appraisal of the construction,
validation and use of standardized tests...because of the limited
amount of trustworthy information supplied by the
publishers."
          Advocates of standardized testing expect parents and community to
leave important decisions about the lives of their children in the
hands of this unregulated, unaccountable industry.
          
            Limitations of Tests
          
          Reliance on standardized tests will ensure that quality education
remains unavailable to many students. Moreover, the problems of
testing cannot be fixed simply by changing some of the questions or
other minor tinkering. The basic structure of the standardized test
means that it is an extremely limited tool for measuring learning.
          The typical multiple choice format prohibits measuring more than a
very narrow range of student performance. In the real world, people
are not given a problem designed to confuse and mislead and asked to
pick the one correct answer out of four or five options.
          Real world problems may have more than one correct answer. Though
many questions on standardized tests also have more than one correct
answer, the format allows for only one to be marked as "correct."

          For example, consider this question, presented by Hoover, Politzer
and Taylor (in Negro Educational Review, April-July
1987, p. 91): "Father said: Once there was a land where boys and
girls never grew up. They were always growing. What was Father
telling? the truth, a lie, a story."
          As the authors explain, "The 'right' answer could be any of
them. Metaphorically, it could be the 'truth' if the growth were
mental and not physical. It could be a 'lie' in that the word 'lie' in
black speech can also mean a joke or a story, and it could also be a
'story.'" This question demonstrates how tests can penalize creative
thinking or cultural diversity.
          It also indicates that understanding how the test maker thinks is a
key to doing well on these exams.
          In general, standardized tests cannot measure knowledge in any
complex or in-depth fashion. They often trivialize subjects in order
to fit the multiple-choice, one correct answer format.
          They emphasize simple, memorizable definitions and formulas. And
they certainly do not measure critical thinking, problem-solving
skills, use of knowledge in the field, or creativity.
          Moreover, test reliability and validity are often inadequate for the
purposes to which tests are put (see related article [Al Clayton.  Issues of Reliability and Validity.
Vol. 11, No. 4]).
          
            Test Bias
          
          Misuse of standardized tests is one major reason why children from
low-income and minority backgrounds are so often misplaced on the
basis of test scores. Bias in the tests themselves is another
important reason.
          Test proponents claim that standardized exams are "objective."
However, the only thing about them that is objective is the mechanical
method by which they are scored. The decisions about what academic
areas to cover, what questions to put in the test, what language style
to use, how difficult to make the test, and how scores are interpreted
and used--all these are subjective, not objective, decisions. All can
be biased.
          As with any human activity or product, every test is grounded in a
particular culture. Contemporary U.S. society contains many cultures
organized around factors such as language, regional or national
origin, race and ethnicity, gender and class.

          Basing a test on one culture can lead to its being biased against
people from other backgrounds.
          Consider the following item from the WISC-R (Wechsler Intelligence
Scale for Children-Revised), the most widely used "IQ" test: "What is
the thing to do when you cut your finger?" Two-point response: "Put a
Band-Aid on it..." One point response: "...Go to the doctor
(hospital)...Get it stitched up..." Zero-point response: "...Suck
blood...Don't panic...Let it bleed."
          A Maryland sociologist discovered that minority children usually
score low on this item. She asked youths in inner-city Baltimore why
they answered the question the way they did. She found that many
answered "go to the hospital" because they thought that "cut" meant a
big cut. When told it was a small cut, almost every child said to use
a Band-Aid.
          Even getting a few questions like this "wrong" can dramatically
lower a child's "intelligence" score. However, on the WISC-R as on
many other tests, the problem is not one or two questions but that the
cultural background built into the exams simply does not match the
experiences of many children.
          Item selection and norming are the processes that enable individual
scores to be distributed on a "normal" bell-shaped curve. In order to
construct the curve, item selection and norming must be based
primarily on the responses of subjects from the majority culture. As a
result, all major tests in the U.S. are constructed to fit the culture
of white middle to upper classes. Thus, they are biased against those
who are not from that culture.
          Standardizing a test means standardizing bias. Assuming that a
given test has some validity as a measure for the majority population,
bias means that the test will not adequately measure the true
abilities of people from minority groups. That means the tests will
not be valid for use on minority populations. It was for reasons such
as this that the California court found that "IQ" tests were biased
against black children.
          Test-makers do not recognize that they might be measuring class or
culture rather than ability or achievement, and do not grasp the
fundamental nature of bias in testing. Thus, the procedures
test-makers use to remove bias treat the problem as occurring only
accidentally, in an occasional item, not as underlying the instrument
as a whole. As a result, test-makers' "fairness standards" fail to
eradicate test bias. Deeply rooted bias is one additional reason why
standardized tests are inadequate as primary tools in educational
evaluation.
          
            What Can Be Done?
          
          Test misuse and overuse threaten the educational health of our
nation. But what can be done about these problems?
          To develop a strong educational system for all, the testing tidal
wave must be turned back. In its place, more useful, appropriate and
unbiased assessments must be developed and implemented. Parents,
educators and concerned citizens need to make certain that
standardized, multiple-choice tests are not used to harm students or
dictate the shape of education.
          FairTest believes that education can be improved if all testing
programs are evaluated on the following principles:
          Tests must be relevant. They should only be used where they
can be shown to be directly helpful to educators and students. The
quest to score high on standardized exams must not be allowed to drive
schooling.
          Tests must be open. Parents, teachers and independent
evaluators should have a right to know how tests are constructed,
validated and used.
          Tests must be fair and unbiased. No student should be assessed
based on culturally specific instruments.
          FairTest is working closely with groups around the nation to
implement this agenda. In mid-March, FairTest convened, in Atlanta, a
Southern Regional Conference on Testing Reform in the Public
Schools. Civil rights, educational reform and children's rights
advocates from many Southern states attended. This conference launched
the Southern Network for Testing Reform.
          Thus far, its major emphasis has been on repealing the use of tests
on young children. For example, educators from Alabama plan to win a
moratorium on all mass standardized testing through grade three. In
this, they are largely following the example of North Carolina, where
the Atlantic Center for Research in Education (ACRE) initiated a
campaign to ban testing in grades one and two after the state mandated
use of the California Achievement Test (CAT) in those grades [see
sidebar]. ACRE opposed the tests because they have low reliability and
validity for young children, the scores are not useful to teachers,
too much time was spent on testing, and the tests began to drive the
curriculum.
          ACRE's testing reform initiative was joined by the 1700-member
North Carolina Association for the Education of Young Children. The
two organizations developed a strategy of public education,
watch-dogging the state's Testing Commission, and lobbying the
legislature. In time, they were joined by teacher organizations and
school psychologists.
          In 1987, the legislature ended funding for testing in grades one
and two. The following year, the state legislated an outright ban on
the use of standardized achievement tests in those grades and mandated
that alternative assessments be devised. These alternative evaluation
tools were developed out of an analysis of the state's curriculum and
are based on child development theory. They will be introduced across
the state in the fall of 1989.
          The movement to stop testing young children is only the
beginning. While testing young children often causes the most harm,
test misuse does great damage all the way through high school
graduation--or non-graduation. Activists in each state and district
must consider how best to combat all types of test misuse and
abuse.
          In summary, standardized tests are being used to deprive many
children of a quality education. Though the negative effect falls most
heavily on young children and students from low-income and minority
backgrounds, test overuse blocks the ability to improve education for
all students.
          While many of these problems are due to misuse of the instruments,
the instruments themselves are often flawed. Therefore, standardized
tests should be used with great caution, and even then only with
additional information about students or programs.
          Readers who are interested in working with others in their state on
testing reform should contact FairTest. We will put you in touch with
local activists, send you information, and help you initiate and
develop coalitions and campaigns to end the overuse, misuse and abuse
of standardized testing.
          
            Monty Neill is the Associate Director of FairTest, the
National Center for Fair & Open Testing. Documentation, as well as a
more extensive discussion of most of the points in this article, can
be found in Fallout From the Testing Explosion: How 100 Million
Standardized Exams Undermine Equality and Excellence in America's
Public Schools, by Noe Medina and D. Monty Neill, available
for $8.95 from FairTest, 342 Broadway, Cambridge, MA 02139; (617)
864-4810.
          
        