Assessment & evaluation

Assessment: introduction for learning and teaching

The purpose of assessment is not to rank and sort students, but to identify what learners know and are able to do. It should be an integral part of an instructive process that builds on what a learner understands in a manner that motivates their engagement, fosters self-efficacy, and teaches them how to learn in general.

R Sweetland

Overview

Introduction
Why assess?
What is assessment?
What to assess?
- Evaluation
Assessment task creation
- Selecting and unpacking ideas for assessment
Quality Assessment Standards & check list
Determining outcomes
What is authentic assessment?
When do teachers assess students' attainment of outcomes?
- Four different assessment times:

diagnostic,
formative,
summative, and
generative

Procedure for Creating a Scoring Guide or Rubric
Standardized tests
- Norm-Referenced tests
Criteria referenced assessment
- Steps for Creating Criteria Referenced Assessment
Accommodation information
Relationship of assessing growth & achievement
Validity & reliability
- Introduction
- Suggestions and procedures to evaluate assessment claims
- Procedures to follow to create & use assessments that are valid & reliable
- Assessment evaluation examples
  - High stakes testing for teacher retention
  - Portfolios for preservice retention & certification
Assorted assessment & evaluation links
Practice assessment scenarios and role play for preprofessional & professional educators:
Assessment glossary

Introduction

This article reviews assessment, why assess, what to assess, assessment tasks, suggestions on how to select and unpack ideas for assessment, quality standards, determining outcomes, authentic assessment, different ways to assess and when.

Other topics include: evaluation, different assessment times, scoring guides, rubrics, procedures for creating them, criteria referenced, norm-referenced, relationship of growth and achievement, and practice scenarios for professional growth.

Focus questions for assessment:

Why assess?
What is assessment?
What to assess?
How do you create assessment tasks?
How are outcomes determined?
When do teachers assess students' attainment of outcomes?
How is a scoring guide or rubric created?
What are the main kinds of standardized tests?
What kinds of accommodations are there?
What is growth and achievement?

Why assess?

Assessment is conducted continuously by both professional educators and learners to improve learning and instruction.

What is assessment?

Assessment is the process of collecting information (data). It is the measurement activities educators use to attempt to make valid inferences to determine or estimate what learners know, can do, and how much they have learned (knowledge, skills, and dispositions); as well as using those measurements and inferences to make curricular decisions (goals, objectives, and outcomes), to decide if instructional strategies are developmentally and academically appropriate, and to decide if instructional and curricular sequences are successful. It can include tests, student demonstrations, teacher observations, professional judgments, graduation rates, surveys, and other data. It may or may not be used for evaluation purposes.

Assessment is such a broad category that it is sometimes combined with other identifiers to add focus. Examples include:

Authentic assessment is a philosophical understanding of assessment where learners perform tasks that demonstrate meaningful application of knowledge, skills, and dispositions within a real life context as used outside of the school setting. The closer the task is to what people face in the world as mechanics, construction workers, designers, business people, politicians, parents, citizens ..., the more authentic the assessment. Authentic assessment is not synonymous with alternative assessment. An alternative assessment may or may not be authentic.

Ecological or environmental assessment is a philosophical understanding of assessment that focuses on the learner's interactions with the environment.

Alternative Assessment is a philosophical belief that assessment can be achieved through a learning process that includes a broad category of nontraditional assessments (portfolios, open book tests, cooperative projects, peer reviews) that will assess more accurately.

Performance assessment is a task where learners' actions are documented while they complete or attempt to complete tasks, and observations of their performances are compared against a scale or range of performances to determine a level of comprehension, skill, and/or disposition along a continuum of performance possibilities.

This article focuses on assessment as the collection of data to inform. It is the measurement activities educators use to attempt to make valid inferences about learners' knowledge, skills, and dispositions; as well as using those measurements and inferences to make curricular and instructional decisions. That information should be bias free, fair, and consistent (valid and reliable) so it can suggest developmentally appropriate activities to achieve productive mastery oriented growth and support effective communication with learners to assist their learning through instruction.

A cautionary note: assessment and evaluation are related, but they are not the same.

Evaluation is a process of determining the value, or quality, of a response, product, or performance that is assessed to make a judgement. It should be based upon established criteria.

It is the evaluator who puts value labels on the different levels of criteria for performance.
When grades are used to put students into categories, failing or passing, the grades carry valued consequences, and hence are evaluation.
When teachers are fired based on their students' performance, that is evaluation.
When decisions of consequence are made based on a value judgment, that is evaluation.
Someone decides consequences, often rewards and punishment, based on their evaluation of achievement levels.

One may wonder if all assessment is evaluative.

That does not have to be the case. If the assessment levels are constructed to describe levels of achievement learners develop as they progress to mastery of a topic or skill, then there need be no value placed on performance. There is no need to downgrade, or put a value on (evaluate), the learner's performance. The student's performance would be assessed and described by its levels to determine where learning and instruction should start, so learning activities could be selected for the learner to move toward mastery.

No value is attached to the performance, as the performance is used to make decisions that facilitate mastery.

What to assess?

What to assess is directly related to the most important task in schooling: deciding what learners should learn, to achieve our purpose of education, as described by the curriculum we develop to educate based on what we know about pedagogy (teaching and learning).

However, good assessment relies on knowing and communicating specifically what is expected to be learned in a manner that will allow levels of performance (outcomes) to be described in observable ways.

Good assessment levels are guides that describe how learners progress and are used to guide and anticipate what they might do, by assessing their present understanding and performance, and making instructional and learning decisions. There is no need to add a value judgment, because once a value is assigned, it becomes an evaluation: grades, GPA ...

How do you create assessment tasks?

Assessment tasks include four elements:

A prompt or stimulus activity,
Materials to use to achieve a result,
Observable actions, and
Levels of achievement from novice to mastery.

Assessment related considerations

While the following tasks are numbered they do not need to be started or completed in any specific order. They are interrelated and work on one will often necessitate a change in another.

1. Select topics and big ideas students will learn.
2. Unpack the topics or big ideas to identify facts, concepts, relationships, generalizations, skills, processes, attitudes, and habits of mind students will need to know to perform activities for the topics or big ideas.
3. Identify problems, questions, tasks, or activities students can perform to learn the intended information and skills and which will also provide usable information to infer students' understanding and use of the ideas, attitudes, habits of mind, skill to perform tasks, and their abilities to use the necessary practices and processes to investigate the topic to solve problems and gain expertise of the big idea.
4. Describe how students can be expected to communicate and/or do the identified problems, questions, tasks, or activities when they have mastery of the topic or big idea.
5. Describe the kinds of responses learners might have as they begin to learn and how they will improve as they move toward mastery.
6. Order them into levels that describe observations of what learners will say or do when they answer questions or perform tasks or activities. This performance will be the artifact used to infer what learners know, to assess their learning progress and make teaching and learning decisions.

Selecting and unpacking ideas for assessment

After a topic, big idea, or skill is identified, the first thing to do is decide what information is important to know about it, how understanding can be expressed, and what levels of understanding, competence, or skill learners might have.

This process is known as unpacking a standard or idea and the results are communicated as: big ideas, goals, facts, concepts, generalizations, objectives, or outcomes. See planning and the planning tool box for more information.

Next, decide how to determine what learners know about the concepts or information related to the concepts. To determine how much a person knows about an idea is measurement. All measurement needs a standard or unit to use as a ruler. Since concepts are mental ideas and can't be seen, they can't be measured directly.

The solution ...

To determine what concepts a person knows and the degree of conceptual understanding they have of those concepts, we observe them as they do or make something. What they say or do is known as an artifact. So in the assessment game, concepts or ideas are associated with what a person can do to indicate their conceptualization of a particular concept. In other words, what a person can do is used to infer what their understanding is for the ideas being assessed.

Usually those ideas have different levels of understanding, and those levels are used as the measuring guide or ruler. The measuring guide, therefore, describes different levels of something a person can do to indicate the person's level of understanding or skill. This information is known as a scoring guide or rubric, and each level has information known as outcomes.

Placing a learner at a particular level is the measurement aspect. Just as the length of a room can be measured differently by different people the placing of learners at different levels can also occur. These differences or inconsistencies are attributed to bias, or an assessment that is not valid or reliable.

For example, a scoring guide or rubric that has clear descriptions of levels, with sharp differences between level outcomes, will make it easier for people to score a learner's behavior or artifact. It would also provide for more agreement between different teachers scoring the same behavior or artifacts, resulting in more scores at the same level for the same learner or similar learner outcomes. This assessment would be considered a more valid assessment. Validity is how well the assessment measures what it is supposed to measure. See ways to increase validity.

The amount of agreement among different observers' measurements or level placements determines the assessment's reliability. Reliability is a test’s consistency, or the degree to which an assessment yields consistent results.

Ways to attain reliability include test-retest, alternate form, split-half, and inter-rater comparisons. The manner in which an assessment is created, implemented, and scored affects its reliability. See ways to increase reliability.
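The inter-rater comparison mentioned above can be sketched in a few lines of Python. This is only an illustration, not a procedure prescribed by this article: the two lists of level placements are hypothetical, and Cohen's kappa is one common statistic (chosen here as an assumption) for correcting raw agreement for the agreement two raters would reach by chance.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of artifacts that both raters placed at the same level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of artifacts")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same level independently.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical level for everything
    return (observed - expected) / (1 - expected)


# Hypothetical level placements by two teachers for the same six artifacts.
teacher_1 = ["beginning", "progressing", "proficient", "proficient", "advanced", "progressing"]
teacher_2 = ["beginning", "progressing", "proficient", "progressing", "advanced", "progressing"]

print(round(percent_agreement(teacher_1, teacher_2), 2))  # 0.83
print(round(cohens_kappa(teacher_1, teacher_2), 2))       # 0.77
```

Low agreement on a sketch like this would suggest revisiting the level descriptions so the differences between levels are sharper.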

Quality Assessment Standards - check list

1. The assessment items provide opportunities to demonstrate outcomes that match and measure the intended big ideas, goals, concepts, skills, and standards.

(YES - NO)

2. Learners have opportunity to learn.

(YES - NO)

3. Free of bias.

(YES - NO)

4. Developmentally and grade level appropriate.

(YES - NO)

5. Consistent, reliable and valid.

(YES - NO)

6. Appropriate levels. Be aware of levels that are camouflaged evaluative statements.

(YES - NO)

Determining outcomes

Each topic or subtopic can be stated as a big idea: a statement that captures the essence or power of the topic. Each topic must be unpacked so its powerful ideas are identified along with all the necessary subtopics.

Suggestions for unpacking big ideas for their concepts and outcomes can be found in planning and other subject and topic related areas on this site.

Whether you create your own or use concepts and outcomes from curriculums or standard documents it is essential you have a very good understanding of what is desirable for learners to know (concepts, relationships, generalizations) and do (outcomes, skills) at their specific level to be able to create and implement assessment.

What is authentic assessment?

Authentic assessment is when the task students perform to demonstrate a behavior, skill or create an artifact is a meaningful, often real world, application of knowledge, skills, and dispositions. The closer the task is to what people do in the world as architects, machinists, doctors, mechanics, construction workers, designers, business people, politicians, parents, citizens, the more authentic the assessment.

An example of an authentic assessment in mathematics for algebra and pattern recognition,
compared to the same task as a teacher-directed or workbook kind of assessment.

When do teachers assess learners' attainment of outcomes?

If assessment is ongoing and continuous, then the answer to the question ...

is always!

However, even assessment that is ongoing and continuous has a time element that is meaningful to the assessor and the person being assessed.

This time element can be separated into four categories, which are helpful to use when making decisions to facilitate learning.

Four different times for assessment are: diagnostic, formative, summative, and generative.

While it's possible any assessment task, activity, or question might fit in any and all of these four categories, here are some important reasons for considering each of the four.

Four different times of assessment

1 - Diagnostic

The major characteristic to associate with diagnostic assessment is that it is preliminary.

It is to probe what is known before facilitating instruction. It usually precedes learning activities, but doesn't have to. It can come at any time during a lesson.

For example, if during a lesson a question arises that depends on background information, the teacher can ask a diagnostic question to check the learners' level of understanding of that background information. After that diagnostic question, the teacher can decide if the learners are ready to proceed or if background information needs to be developed before continuing the day's planned activity. This is a diagnosis of readiness.

2 - Formative

This type of assessment is used to check the learners' progress toward learning. It can happen at any time during a lesson and is usually understood as such.

3 - Summative

This kind of assessment is usually associated with the time immediately after facilitating learning.

However, what is that time frame?

Is it at the end of a five minute mini-teach where summative assessment is to summarize what was learned in the five minutes? Or is it a time frame of an hour, day, week, month, or year?

One could argue that it is only summative if you are inclined to think the learners understand the concepts and can perform the outcomes; otherwise it could be considered formative.

Whatever the time frame, it is usually considered the last assessment before the teacher moves to another topic. It could be the first summary check, or a question to double or triple check, and of course the assessment results will be within the range of acceptable or above.

4 - Generative

This assessment inquires into the learner's ability to apply, use, adapt, alter, or join ideas that have been taught. Its purpose is to see how well learners understand and can apply what they learned.

It tries to determine if the information has become strong enough to be usable beyond the scope of the examples with which learners are familiar or examples that are similar to what was presented during instruction. Are they able to use it in ways that were not presented, to demonstrate a variety of application, analysis, and synthesis with the information?

Summary

Again, any one of these kinds can come at any time in the lesson and only makes sense with respect to the purpose of the teacher within a sequence of facilitating learning.

When planning, a teacher can anticipate all four types of assessment that will be used throughout a sequence for each concept within the sequence.

The planning will prepare the teacher to interact with learners and be ready to facilitate their learning in real time that will be individualized for each learner.

Procedure for Creating a Scoring Guide or Rubric

Identify the topic, big idea, standard, or skill.
Unpack it to identify the information students need to know: facts, concepts, relationships, generalizations, skills. See procedures and frameworks above and see lists of concepts and misconceptions in subject resources.
Consider the students' developmental level of understanding.
Consider misconceptions learners might have and their sources.
Start either at the top or bottom level. If the information unpacked for what students should know is used, it will describe either the top level or an acceptable level below the top. If information related to misconceptions or perceptual or naive understandings is used, it will describe the bottom level of understanding. Next, identify the information between these as steps or levels students will progress through as they begin with their initial understandings and progress across the levels to become as expert as defined. These levels describe the different levels of understanding of information for the topic, sequenced from lowest to highest, that students need to know. This information should be: facts, concepts, relationships, generalizations, skills.
This step, which is harder, is to identify outcomes that describe what students do that can be observed to infer their understanding at the different levels.
Lastly, after the outcomes are defined and described, they need to be communicated as levels. There are two basic ways to do this. One is to separate each fundamental idea related to the bigger idea into categories and describe outcome levels for each category separately in a chart or outline form. The second is to describe all the information in a narrative for each level.

Example:

Start by deciding the number of assessment levels and naming them. For example four: beginning, progressing, proficient, and advanced.

Then write statements under each category to describe what it would look like if students are to do something at each level. For example: problem solving:

Problem solving scoring guide or rubric: outlined in a narrative format

Beginning - Try to solve problems from memory. Believe solutions are known and need to be recalled or they just pop into your mind.
Progressing - Try to solve problems from memory, but if unable to recall a procedure and strategy will seek to understand the problem and try to discover them to use to solve the problem by random search for a solution.
Proficient - Try to solve the problem from memory, but if unable will implement a heuristic or procedure that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, and check its accuracy.
Advanced - Try to solve the problem from memory, but if unable will implement a heuristic or procedure in a systematic manner that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, check its accuracy by trying alternative strategies to achieve equal results, and reflects on the process while solving and after.

Notice the results above are in an outline form with a narrative for each level that combines several categories within each level. Below is the same information in a table format.

Problem solving scoring guide or rubric: in a table with a narrative format

	Beginning	Progressing	Proficient	Advanced
Indicator	Try to solve problems from memory. Believe solutions are known and need to be recalled or they just pop into your mind.	Try to solve problems from memory, but if unable to recall a procedure and strategy will seek to understand the problem and try to discover them to use to solve the problem by random search for a solution.	Try to solve the problem from memory, but if unable will implement a heuristic or procedure that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, and check its accuracy.	Try to solve the problem from memory, but if unable will implement a heuristic or procedure in a systematic manner that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, check its accuracy by trying alternative strategies to achieve equal results, and reflects on the process while solving and after to improve on their problem solving abilities.

Next, let's see what a scoring guide will look like if the subcategories are separated.

Problem solving scoring guide or rubric: in a table with separate categories

Indicator	Beginning	Progressing	Proficient	Advanced
Reflects	Try to solve problems from memory.	Try to solve problems from memory and thinks about the steps taken and how to implement them and make decisions as they solve the problem.	Try to solve problems from memory and thinks about the process while solving.	Try to solve problems from memory and reflects on the process while solving and after to improve on their problem solving abilities.
Heuristic, procedure	No set plan.	Aware that a series of steps can be followed to solve problems.	Know a comprehensive heuristic and use it in a flexible and accurate way to solve problems.	Know a comprehensive heuristic and use it in a systematic, flexible, efficient, and accurate way to solve problems.
Implement a strategy	Occasionally selects an appropriate strategy.	Aware of different strategies and attempts to implement one.	Aware of different strategies and use one or two to solve most problems.	Aware of multiple strategies and implements them to check their accuracy and marvel over how different strategies can achieve verification and confidence in solutions.
Checks or finds multiple solutions	If a solution is achieved, it is with difficulty, and is satisfied with success.	Solves the problem singularly and listens when other solutions are presented.	Solves the problem in multiple ways that are similar or inverse operations, listens when other solutions are presented.	Generates multiple ways to solve the problem with some being unusual and not often selected as solutions, listens when other solutions are presented and offers suggestions for improvement.
Communicate	May be able to summarize a few highlights when asked.	Communicates results orally and in writing.	Communicates how results were achieved orally and in writing with information for the heuristic, strategies, and multiple solutions to verify them.	Communicates how results were achieved orally and in writing with information for the heuristic, strategies, multiple solutions to verify them, and how metacognition is helpful for problem solving and excited about improving their problem solving with metacognition.
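For anyone who keeps scoring guides electronically, a rubric with separated categories maps naturally onto a small data structure. The sketch below is one possible layout, not a required format: the names `LEVELS`, `RUBRIC`, and `describe` are hypothetical, only two of the five categories are included, and the observed placements are invented. In keeping with the distinction between assessment and evaluation, the function returns descriptions and the next level to work toward rather than a total score or grade.

```python
# Level names taken from the example above, ordered from lowest to highest.
LEVELS = ["Beginning", "Progressing", "Proficient", "Advanced"]

# Two of the five categories from the table; one descriptor per level.
RUBRIC = {
    "Heuristic, procedure": [
        "No set plan.",
        "Aware that a series of steps can be followed to solve problems.",
        "Know a comprehensive heuristic and use it in a flexible and accurate way to solve problems.",
        "Know a comprehensive heuristic and use it in a systematic, flexible, efficient, and accurate way to solve problems.",
    ],
    "Implement a strategy": [
        "Occasionally selects an appropriate strategy.",
        "Aware of different strategies and attempts to implement one.",
        "Aware of different strategies and use one or two to solve most problems.",
        "Aware of multiple strategies and implements them to check their accuracy.",
    ],
}


def describe(observed):
    """For each category, return the observed level, its descriptor, and the next level."""
    report = {}
    for category, level in observed.items():
        index = LEVELS.index(level)  # raises ValueError for an unknown level name
        next_level = LEVELS[index + 1] if index + 1 < len(LEVELS) else None
        report[category] = {
            "level": level,
            "descriptor": RUBRIC[category][index],
            "next_level": next_level,
        }
    return report


# Hypothetical placements for one learner.
report = describe({"Heuristic, procedure": "Progressing", "Implement a strategy": "Advanced"})
for category, entry in report.items():
    print(category, "->", entry["level"], "| next:", entry["next_level"])
```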

COOL!

RIGHT?

Standardized tests

Standardized tests have the same format and types of questions for the same content and are administered to wide groups of people. They are normed and their scores ranked on a bell curve.

There is a difference between norm-referenced and criterion-referenced tests.

There is no difference between norm-referenced and standardized tests.

Pros

Standardized tests are predictive of later outcomes in school.
Can inform learners, families, teachers, administrators, researchers, and other stakeholders about areas of concern and strength, for accountability and to improve instruction.

Cons

They are prone to inaccuracies.
Tend to have an influence on the narrowing of the curriculum as teachers teach to the test.
Users spend excessive time on test preparation.

Norm-Referenced tests

Norm-referenced tests are standardized tests based on a representative group. They measure and rank test takers against each other. Each person’s score is compared to the norm of a similar predetermined peer group of test takers and may be reported as a percentile, grade equivalent, or stanine, which suggests what the test taker knows as an individual and what they know relative to a group.

Student performance is compared to a standardized group with such statements as:

Your child scored at the 50th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that 49% of the students that took the test scored lower, 49% of the students scored higher, and 2% scored the same.
Your child scored at the 99th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that 99% of the students that took the test scored lower, none of the students scored higher, and 1% had the same score.
Your child scored at the 75th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that 74% of the students that took the test scored lower, 24% of the students scored higher, and 1% scored the same.
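The lower, same, and higher breakdown behind statements like these can be computed directly from a list of scores. The sketch below is a minimal illustration with a made-up norm group; the function name `percentile_breakdown` is hypothetical. Note that publishers differ in how they turn this breakdown into a single percentile rank (percent below, percent at or below, or percent below plus half of those with the same score), so the reported number depends on the convention used.

```python
def percentile_breakdown(score, group_scores):
    """Percent of the norm group scoring lower than, the same as, and higher than score."""
    n = len(group_scores)
    if n == 0:
        raise ValueError("the norm group must contain at least one score")
    lower = sum(s < score for s in group_scores)
    same = sum(s == score for s in group_scores)
    higher = n - lower - same
    return {
        "lower": 100 * lower / n,
        "same": 100 * same / n,
        "higher": 100 * higher / n,
    }


# Hypothetical norm group: one test taker at each score from 1 to 100.
norm_group = list(range(1, 101))

print(percentile_breakdown(75, norm_group))
# {'lower': 74.0, 'same': 1.0, 'higher': 25.0}
```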

Examples:

Iowa Test of Basic Skills (ITBS)
California Achievement Test (CAT)
American College Testing (ACT)
Metropolitan Reading Readiness Test (MRT)
Cognitive Abilities Test (CogAt)

Strengths:

Provide information about the achievement of individual students or groups of students
Identification of possible ways to improve school curriculums or programs
Purchased, administered, and scored inexpensively
Supplements other assessment methods to clarify the larger picture of student performance
Objective scoring procedure

Concerns/Weaknesses:

People too often misuse tests to categorize and label students in ways that can cause damage
People frequently misuse test scores to make improper comparisons between schools, districts, classes
Fails to promote individual student learning
Is a poor predictor of individual student performance
Usually mismatches with the content of a school’s curriculum
Can be used to dictate and restrict curriculum
People too often assume that test scores are infallible
Too often people develop an over reliance on this one type of assessment
Results are based on a normal distribution (bell-shaped curve)
Measures students against other students
Sorts students into winners and losers
Does not test for what students know in a manner that can be used to facilitate learning.

Criteria Referenced Assessment

Criterion-referenced assessment, or test, compares a learner’s performance or academic achievement to a set of curricular criteria, standards, or outcomes.

A level of achievement is based on criteria established from the curriculum or standard before the test is taken.

A scoring guide or rubric is created to communicate different levels of achievement or a standard may be set as a percentage.

The score should show the learner's progression toward the desired outcome or standard.

Steps for Creating criteria referenced assessments

Identify a big idea or a general description of what learners are to know and be assessed on.
Identify facts, concepts, generalizations, skills, processes, and/or attitudes that characterize or demonstrate understanding of the big idea of what students are to know.
Identify problems, questions, tasks, or activities that learners can perform that will provide usable information to make inferences about their knowledge of content, processes, inquiry, and dispositions. For example:

Their understanding of the concepts or generalizations;
Their attitudes related to the topics, concepts, and/ or generalizations;
Their skills or ability to perform any necessary skills;
Their abilities to use inquiry and selected investigative practices and processes required for success.

Describe what learners can be expected to communicate and/ or do if they are to successfully complete the identified problems, questions, tasks, and/ or activities for each selected dimension.
Select one or more problems, questions, tasks, or activities that can be used to initiate performance of the selected task that will provide assessment data to infer their understanding, attitude, and/ or skills in the identified dimensions.
Create the assessment problems, questions, tasks, or activities, as they will be presented.
Create and write all administration guidelines that are necessary to engage learners in the assessment task so that they might be successful in completing it.
Review the needs for all learners that will be assessed to identify all accommodations for special needs and how to provide for those accommodations.
Identify and describe what kinds of responses learners might have for each item for different levels of understanding or skills that might be seen in an artifact created by a student or viewed when performed by them.
Order the different descriptions from low to high levels and select or modify them to create a scoring guide or rubric.
Describe how the scoring guide or rubric will be checked for consistency, validity, and reliability.
Describe how to check the assessment items for bias so that no one will be offended or unfairly penalized.
Pilot assessment task/tool to see how it works with students.

Accommodation information

Accommodation introduction and examples.

Relationship of assessing growth and achievement

The assessment and accountability movement has caused people to realize that reporting and relying on achievement or proficiency alone to rate teachers gives an incomplete picture.

Students can come into a grade above or below grade level and make little or very good progress, and that progress or lack of progress will not be represented in an achievement or proficiency score alone. This makes it impossible to determine a teacher's success or failure with an end of year achievement or proficiency score. Therefore, it is thought data must be collected to determine both achievement/proficiency and rate of achievement or growth.

Reporting and displaying data for both growth and achievement is one way schools, government, and other stakeholders believe they can achieve a more accurate view of what is being achieved by students and teachers.

For example: scores for achievement could be reported in terms of a percent for proficiency and growth reported in terms of an average growth percentile for individual or groups of students. These scores would then be plotted on a graph with 0-100% on each axis where the fiftieth percentile divides each in half. This creates a representation, like the one below, with four quadrants, into one of which each pair of combined scores might fall.
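The quadrant idea described above can also be expressed as a short function. This is a sketch under stated assumptions: the function name `quadrant`, the quadrant labels, and the choice to treat a score exactly at the midpoint as "higher" are all illustrative choices, not part of any reporting standard cited here.

```python
def quadrant(percent_proficient, growth_percentile, midpoint=50.0):
    """Name the quadrant for a pair of achievement and growth scores (both 0-100)."""
    for name, value in (("percent_proficient", percent_proficient),
                        ("growth_percentile", growth_percentile)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Assumption: a score exactly at the midpoint counts as the higher half.
    achievement = "higher achievement" if percent_proficient >= midpoint else "lower achievement"
    growth = "higher growth" if growth_percentile >= midpoint else "lower growth"
    return f"{achievement}, {growth}"


# Hypothetical groups of students.
print(quadrant(80, 30))  # higher achievement, lower growth
print(quadrant(20, 70))  # lower achievement, higher growth
```

The second example shows the case the paragraph above is concerned with: a group that is below proficiency but growing quickly, which an achievement score alone would not reveal.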

Validity & reliability

Validity flows from human judgment about the persuasiveness of a particular validity argument and the evidence on which that argument has been fashioned.

W. James Popham

Procedures and guiding principles to make more consistent, reliable, & valid assessment decisions

Introduction

Overview

Introduction
Suggestions and procedures to evaluate assessment claims
Procedures to follow to create & use assessments that are valid & reliable
Assessment evaluation examples
- High stakes testing for teacher retention
- Portfolios for preservice retention & certification

This section includes suggestions and procedures to logically evaluate the validity of inferences and claims made about assessments or with assessment results.

It also includes procedures to follow to create, implement, evaluate, and report on the validity and reliability of assessments.

Finally, it gives examples, with some initial ideas, for using high stakes testing for teacher retention and portfolios for preservice retention and certification.

Suggestions and procedures to evaluate assessment claims

Inspired by Greg Thompson, David Rutkowski, & Leslie Rutkowski

Every claim made about assessment results needs arguments both for and against it before a valid interpretation can be made. It is the responsibility of those making a claim to provide evidence for arguments that support their claim and for arguments that do not. The best supported claims are the most valid. A process for doing this can be created by altering Stephen Toulmin's position analysis heuristic.

Position Analysis Heuristic

Model to evaluate the validity of claims about assessment data

Altered to create a model to evaluate a claim and its opposite claim.

Let's review some vocabulary:

Validity - An assessment that is valid measures what it claims to measure.

Reliability - An assessment that is reliable provides consistent results across time and users.

An assessment must be reliable to be valid, but it doesn't have to be valid to be reliable. A test may give the same consistent range of results over time and across different groups (reliable), yet not measure what it is claimed to measure (not valid).

Determining validity requires constructing and evaluating arguments for and against the intended interpretation of assessment scores and their relevance to the proposed use.

If there is no compelling evidence (or argument) that undermines or falsifies an inference, then that inference is reasonably valid.

Let's review a procedure to use with the model to determine validity.

Procedure to use the model

The model can be used to verify or falsify an inference or intended use of data with a procedure such as:

Outline the claim to gather enough information to clearly understand and communicate it.
Outline the opposite (falsifying) claim to gather enough information to clearly understand and communicate it.
Find the evidence - use multiple sources to collect supporting and falsifying information.
Consider the stakes - the higher the stakes the stronger the quality and scale of evidence that needs to be gathered for both claims and their inferences.
Decide on the most valid interpretation - is the claim or the opposite claim best supported by the collected evidence?
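The steps above can be sketched as a simple record keeping aid. This is only an illustration: validity remains a human judgment about arguments and evidence, and a tally cannot replace that judgment. The class and function names, the 1-3 strength rating, the use of a minimum number of sources to stand in for "consider the stakes", and the evidence entries are all hypothetical. The claim wording is borrowed from the high stakes testing example later in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    # Each entry is (source, strength); strength is a 1-3 rating a reviewer assigns.
    evidence: list = field(default_factory=list)

    def add(self, source, strength):
        if strength not in (1, 2, 3):
            raise ValueError("strength must be 1, 2, or 3")
        self.evidence.append((source, strength))

    def support(self):
        # Total rated strength of all evidence recorded for this claim.
        return sum(strength for _, strength in self.evidence)


def best_supported(claim, opposite, min_sources):
    """Return the better supported claim, or None if more evidence is needed.

    min_sources stands in for the stakes: higher stakes, higher minimum.
    """
    sources = {source for source, _ in claim.evidence + opposite.evidence}
    if len(sources) < min_sources or claim.support() == opposite.support():
        return None
    return claim if claim.support() > opposite.support() else opposite


claim = Claim("EOY tests can be used for teacher assessment and retention.")
opposite = Claim("EOY tests can NOT be used for teacher assessment and retention.")
claim.add("test technical report", 1)           # hypothetical evidence
opposite.add("student attendance records", 2)   # hypothetical evidence
opposite.add("curriculum coverage survey", 2)   # hypothetical evidence

result = best_supported(claim, opposite, min_sources=3)
print(result.statement if result else "Gather more evidence")
```

Returning None when there are too few sources, or when support is tied, mirrors the idea that information should be revisited rather than forcing a decision.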

Summary

All information should be revisited until there is sufficient trust in how the claims and opposite claims will be used or rejected, and until that use or non-use can be communicated in easily understandable ways to those affected by any decisions.

Procedures to follow to create and use valid and reliable assessment

Inspired by Barbara S. Plake - Buros Center for Testing.

Procedure to create assessments

A diverse team met once a week for two hours during a school year to create learning outcomes, concepts, and assessment items for the express purpose of using them to build assessments that measure student achievement and progress on the school and state standards. The team was composed of a curriculum director, an external consultant, classroom teachers from the grade for which the assessments were being developed, and classroom teachers from the grades above and below that level.

The team considers the following principles when making decisions, in order to maximize the validity of each item and instrument created. The principles are organized by tasks:

Development,
Implementation,
Scoring and evaluation, and
Reporting.

Development of assessment instruments

The construct (what is being assessed) is understood and can be written as concepts and/or skills (how the information is expressed mentally) and outcomes (artifacts created by the learners to demonstrate their level of understanding).
The assessment item is appropriate for the construct being measured.
The assessment item is appropriate for the developmental level of the learner.
Other reasons the learner might do well on the construct have been searched for and ruled out.
Reasons the learner might do poorly on the construct, not related to the construct, have been searched for and ruled out.
Students' learning of the related information will be facilitated in a similar manner (materials used, kinds of questions asked or tasks given, types of answers expected, time allowance, …) as it will be assessed.
All necessary and sufficient information needed by the learners to successfully complete each item is or will be taught.
All learners have sufficient opportunity and access to learn the necessary and sufficient information needed to be successful on each item.
The number of items in each area is proportional to what is emphasized during instruction.
Cognitive demands of items match the intended interpretations of the assessments.
Tasks similar to the selected tasks would give similar results.
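One of the principles above, that the number of items in each area is proportional to what is emphasized during instruction, lends itself to a quick check. The sketch below assumes instructional emphasis can be approximated by hours of instruction per area. The function name, the 10 percentage point tolerance, and the sample numbers are hypothetical.

```python
def emphasis_gaps(item_counts, instruction_hours, tolerance=0.10):
    """Return areas whose share of items differs from their share of
    instruction time by more than the tolerance.

    A positive value means the area is over-represented on the assessment.
    """
    total_items = sum(item_counts.values())
    total_hours = sum(instruction_hours.values())
    if total_items <= 0 or total_hours <= 0:
        raise ValueError("need at least one item and some instruction time")
    gaps = {}
    for area in item_counts.keys() | instruction_hours.keys():
        item_share = item_counts.get(area, 0) / total_items
        hour_share = instruction_hours.get(area, 0) / total_hours
        diff = item_share - hour_share
        if abs(diff) > tolerance:
            gaps[area] = round(diff, 2)
    return gaps


# Hypothetical unit: items written per area and hours taught per area
items = {"fractions": 12, "geometry": 4, "measurement": 4}
hours = {"fractions": 12, "geometry": 12, "measurement": 6}
print(emphasis_gaps(items, hours))
```

With these made-up numbers, fractions holds 60% of the items but 40% of the instruction time, and geometry the reverse, so both areas would be flagged for the team to review.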

Implementation of assessment instrument

Learners have been informed of the assessment and feel confident they are ready for the challenge.
Learners have been given opportunities to ask questions about the test.
Learners are motivated to take the test.
Instructions have been clearly explained to all learners in a similar manner.

Scoring and evaluation of assessment instrument

The scoring key has been validated.
The scoring process is clear.
The scoring rubric is clearly understood by all evaluators.
Performance level descriptors are meaningful.
Performance level indicators are developmentally appropriate.
The rubric fits the construct.
The rubric is congruent with the instructional emphasis.
The scores reflect the students' abilities and skills and are not due to scorer bias.
Scoring is replicable and different raters are confident that similar abilities and skills are scored similarly.
It is evident that students' scores indicate that the instructions were clear.
It is evident that students knew what they needed to do to be successful on the task.
Performance category and cutscore decisions are sound and defensible.
Results are consistent with teacher expectations.
There are previous results to suggest that these results are probably accurate.
Students' individual scores on one particular assessment relate to their average scores on other similar assessments. Any performance that isn't consistent does not relate to the construct.
Students' performance on a particular assessment will be helpful to make instructional decisions to facilitate students' learning.
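The principle above that scoring is replicable across raters can be checked with numbers. The sketch below computes two common agreement statistics for two raters who score the same set of student artifacts with the same rubric: percent exact agreement and Cohen's kappa, which adjusts for agreement expected by chance. These statistics are offered as one possible check, not as the procedure the team used, and the rubric scores are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of artifacts that both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of artifacts")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same level at random,
    # given how often each rater used each level.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (levels 1-4) for eight student artifacts
rater_a = [3, 2, 4, 1, 3, 2, 4, 3]
rater_b = [3, 2, 3, 1, 3, 2, 4, 2]
print(percent_agreement(rater_a, rater_b))
print(round(cohens_kappa(rater_a, rater_b), 2))
```

Low agreement would suggest the rubric is not yet clearly understood by all evaluators, or that the performance level descriptors need revision.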

Reporting of assessment results

Learners understand their score and its relationship to levels of performance.
Learners understand how the scores will be used.
Score reports are clear.
Reporting of growth is comparable across time.
The precision reported or suggested in reports is consistent with the actual precision of the assessment.
There will be an increase in teacher collaboration until it becomes a significant element of the school culture.
There will be increased reporting of learners' ability to learn until it becomes a significant element of the school culture.
Reporting of decreased behavioral interactions until they become minimal.
Reporting of decreased referrals for alternative and remedial programs until they become supportive mainstream programs.
Reporting of a decrease in students not completing tasks until it is virtually nonexistent.
Reporting of teacher satisfaction in program development and curriculum decisions.
Reporting of positive teacher and student morale.
Reporting of learner support for the curriculum.

Assessment evaluation for validity examples with initial ideas

High stakes testing for teacher retention

Claim - End of Year tests (EOY) can be used for teacher assessment and retention.

Opposite (falsifying) claim - End of Year tests (EOY) can NOT be used for teacher assessment and retention.

Inference - The EOY test directly reflects teacher quality.

Assumptions & data:

Teachers are fully responsible for student outputs and no other outside factors contribute to student scores.
The assessment results are an accurate and comprehensive measure of the implemented curriculum.
All children have had an opportunity to learn the curriculum.
All students are motivated and do their best on the assessment.
The test is reliable.
Technical reports are available to assure the reliability of the test.
The technical report has data only on the test takers' achievement; it includes no reference to the test's reliability as a measure of teachers' effectiveness in teaching the material being tested.
Therefore, EOY assessment is not a reliable measure of teacher instruction.

Portfolios for retention and certification

Claim - Preservice teachers' portfolio artifacts can be used for their retention and certification.

Opposite (falsifying) claim - Preservice teachers' portfolio artifacts can NOT be used for their retention and certification.

Inference - The portfolio artifacts directly reflect teacher qualifications for certification.

Assumptions & data:

Teachers are fully responsible for the artifacts they collect.
The artifacts represent an accurate and comprehensive measure of their performance.
All preservice teachers have had an opportunity to learn the curriculum.
The artifacts' scores are reliable.
Multiple sources create a better sample of a teacher's work than a single sample.

Resources

How to Make More Valid Decisions about Assessment Data. Greg Thompson, Leslie Rutkowski, & David Rutkowski. Kappan. March 2023. p 34-39.
How to make more valid decisions about assessment data. Greg Thompson, David Rutkowski, and Leslie Rutkowski. Kappan. February 27, 2023.

Assorted assessment and evaluation links

Sample letter introducing parents to authentic assessment and student-led conference
Consideration for dropping a grade - yes or no?

Practice assessment scenarios and role play for preprofessional and professional educators:

Assessment and evaluation simulation for The Fish in the Park Experience - science focused
Birds, beaks, and adaptation
