Assessment & evaluation

Assessment: introduction for learning and teaching

The purpose of assessment is not to rank and sort students, but to measure what learners know and are able to do. It should be an integral part of the instructional process that builds on the learner's understandings, increases their self-efficacy, and fosters motivation and engagement.

R Sweetland

Overview

Introduction
Why assess?
What is assessment?
Assessment tasks
Selecting and unpacking ideas for assessment
Quality Assessment Standards & check list
Determining outcomes
What is authentic assessment?
When do teachers assess students' attainment of outcomes?
Four different assessment times:

diagnostic,
formative,
summative, and
generative

Procedure for Creating a Scoring Guide or Rubric
Standardized tests
- Norm-Referenced tests
Criteria referenced assessment
- Steps for Creating Criteria Referenced Assessment
Accommodation information
Relationship of assessing growth and achievement
Assorted assessment and evaluation links
Practice assessment scenarios and role play for preprofessional and professional educators:
Assessment glossary

Introduction

This article reviews assessment: why to assess, what to assess, assessment tasks, how to select and unpack ideas for assessment, quality standards, determining outcomes, authentic assessment, and the different ways and times to assess.

Other topics include evaluation, different assessment times, scoring guides and rubrics and the procedures for creating them, criterion-referenced and norm-referenced assessment, the relationship of growth and achievement, and practice scenarios for professional growth.

Why assess?

Assessment is conducted continuously by both professional educators and learners for the purpose of improving learning and instruction.

What is assessment?

What to assess should be directly related to the most important task in schooling: deciding what learners should learn, which relates to other areas of pedagogy.

However, good assessment relies on knowing and communicating specifically what is expected to be learned, in a manner that will allow levels of performance to be described in observable ways.

With that as background, this article focuses on assessment as the collection of data to inform decisions. It comprises the measurement activities educators use to make valid inferences about learners' knowledge, skills, and dispositions, as well as the use of those measurements and inferences to make curricular and instructional decisions. The information should be bias free, fair, and consistent (valid and reliable) so that it can be used to suggest developmentally appropriate activities for productive, mastery-oriented growth and to support effective communication with learners that assists their learning.

A cautionary note: assessment and evaluation are related, but they are not the same.

Evaluation is a process of determining the value, or quality, of a response, product, or performance that is assessed in order to make a judgement. It should be based upon established criteria.

It is the evaluator who puts value labels on the different levels of criteria for performance.
When grades are used to put students into categories, such as failing or passing, that has valued consequences and hence is evaluation.
When teachers are fired based on their students' performance, that is evaluation.
When decisions of consequence are made based on a value judgment, that is evaluation.
Someone decides consequences, often rewards and punishment, based on their evaluation of achievement levels.

One may wonder if all assessment is evaluative. It doesn't have to be. If the assessment levels are constructed to describe the levels of achievement students develop as they progress to mastery of a topic or skill, then no value need be placed on a student's performance. There is no need to downgrade or poorly evaluate a young student's performance when it is below the top level, because the performance is assessed and described by its level in order to decide where learning and instruction should start and what kinds of learning activities would best move the student closer to mastery. No value is attached, only decisions based on the student's performance at his or her current developmental level.

Good assessment levels are guides that describe how students progress. They can be used to anticipate what students might do, to assess their present understanding and performance, and to make instructional and learning decisions. Once a value is added or a judgement is made, it becomes an evaluation: grades, GPA ...

This article focuses on assessment of learning. For information related to teacher assessment see - teaching in the pedagogy directory.

Assessment tasks

Assessment tasks can be described with four elements:

A prompt or stimulus activity,
Materials to use to achieve a result,
Observable actions, and
Acceptable levels of achievement.

Assessment related tasks

While the following tasks are numbered they do not need to be started or completed in any specific order. They are interrelated and work on one will often necessitate a change in another.

1. Decide topics and big ideas students will learn.
2. Unpack the topics or big ideas to identify facts, concepts, relationships, generalizations, skills, processes, attitudes, and habits of mind students will need to know to perform activities for the topics or big ideas.
3. Identify problems, questions, tasks, or activities students can perform to learn the intended information and skills. These should also provide usable information to infer students' understanding and use of the ideas, attitudes, and habits of mind, their skill in performing tasks, and their ability to use the practices and processes needed to investigate the topic, solve problems, and gain expertise in the big idea.
4. Describe how students can be expected to communicate and/or do the identified problems, questions, tasks, or activities when they have mastery of the topic or big idea.
5. Describe the kinds of responses students might have as they begin to learn and on through mastery. Order them into levels that describe observations of what students will say or do when they answer questions or perform tasks or activities. The students' performance will be the artifact used to infer what they know, to assess their learning, and to make teaching and learning decisions.

Selecting and unpacking ideas for assessment

After a topic, big idea, or skill is identified, the first thing to do is decide what information is important to know about it, how understanding can be expressed, and what level of understanding, competence, or skill students should have. This process is known as unpacking a standard or idea and the results are communicated as: big ideas, goals, facts, concepts, generalizations, objectives, or outcomes. See planning and the planning tool box for more information.

The next step is to decide how to determine what students know about the concepts or information related to the concepts. Determining how much a person knows about an idea is measurement, and all measurement needs a standard or unit to use as a ruler. Because concepts are mental ideas and can't be seen, they can't be measured directly. The solution...

To determine what concepts a person knows, and the degree of conceptual understanding he or she has of them, we observe students as they do or make something. What they say or do is known as an artifact. So in the assessment game, concepts or ideas are associated with what a person can do to indicate their conceptualization of a particular concept. In other words, what a person can do is used to infer what their understanding is of the ideas being assessed. Usually those ideas have different levels of understanding, and those levels are used as the measuring guide or ruler. The measuring guide, therefore, describes different levels of something a person can do to indicate the person's level of understanding or skill. This information is known as a scoring guide or rubric, and each level has information known as outcomes.

Placing a student at a particular level is the measurement aspect. Just as different people can measure the length of a room differently, different people can place the same student at different levels. These differences or inconsistencies are attributed to bias, or to an assessment that is not valid or reliable.

For example, a scoring guide or rubric with clear level descriptions and sharp differences between level outcomes makes it easier for people to score students' behavior or artifacts. It also produces more agreement among different teachers scoring the same behavior or artifacts, so that the same student, or students with similar understanding, receive scores at the same level more often. Such an assessment is more consistent, which is a prerequisite for considering it valid. Validity is how well the assessment measures what it is supposed to measure. See ways to increase validity.

The amount of agreement among different observers' measurements or level placements determines the assessment's reliability. Reliability is a test's consistency, or the degree to which an assessment yields consistent results; ways to estimate it include test-retest, alternate-form, split-half, and inter-rater comparisons. The manner in which an assessment is created, implemented, and scored all affects its reliability. See ways to increase reliability.
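The inter-rater comparison mentioned above can be illustrated with a small sketch. It assumes two hypothetical teachers have each placed the same six students on a four-level scoring guide. It reports simple percent agreement along with Cohen's kappa, a chance-corrected agreement statistic that this article does not name but that is commonly used for this purpose. The ratings and function names are invented for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of students both raters placed at the same level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same level
    # if each assigned levels independently at their own base rates.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical level placements for the same six students.
teacher_1 = ["beginning", "progressing", "proficient",
             "proficient", "advanced", "progressing"]
teacher_2 = ["beginning", "progressing", "proficient",
             "advanced", "advanced", "progressing"]

agreement = percent_agreement(teacher_1, teacher_2)
kappa = cohens_kappa(teacher_1, teacher_2)
print(f"percent agreement: {agreement:.2f}, kappa: {kappa:.2f}")
```

Here the teachers agree on five of the six students. Low agreement on a real scoring guide would suggest that the level descriptions need sharper distinctions.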

Quality Assessment Standards - check list

Match and measure the intended big ideas, goals, outcomes, concepts, skills, standards (YES - NO)
Students have opportunity to learn (YES - NO)
Free of bias (YES - NO)
Developmentally and grade level appropriate (YES - NO)
Consistent, reliable and valid (YES - NO)
Appropriate levels. Be aware of levels that are camouflaged evaluative statements. (YES - NO)

Determining outcomes

Each topic or subtopic can be stated as a big idea: a statement that captures the essence or power of the topic. Each topic must be unpacked, or its powerful ideas identified, along with all the necessary subtopics.

Suggestions for unpacking, concepts, and outcomes can be found in planning and other subject and topic related areas on this site.

Whether you create your own or use concepts and outcomes from curricula or standards documents, it is essential to have a very good understanding of what is desirable for students to know (concept, relationship, generalization) and do (outcome, skills) at their specific level in order to create and implement assessment.

What is authentic assessment?

Authentic assessment is when the task students perform to demonstrate a behavior or skill, or to create an artifact, is a meaningful, often real world, application of knowledge, skills, and dispositions. The closer the task is to what people do in the world as architects, machinists, doctors, mechanics, construction workers, designers, business people, politicians, parents, and citizens, the more authentic the assessment.

An example of an authentic assessment in mathematics for algebra and pattern recognition, compared to the same task as a teacher-directed or workbook kind of assessment.

When do teachers assess students' attainment of outcomes?

If assessment is ongoing and continuous, then the question "When should we assess?" is answered with: always.

However, even if assessment is ongoing and continuous, there is an element of time that has meaning to the assessor and, with respect to self-assessment, to the person being assessed.

The time element can be separated into four categories, which are helpful to use when making decisions to facilitate learning.

Four different times for assessment - diagnostic, formative, summative, and generative

While it's possible any assessment task, activity, or questions might fit in any and all of these four categories, here are some important reasons to consider each of the four.

Four different times of assessment

1 - Diagnostic.

The major characteristic to associate with diagnostic assessment is that it is preliminary.

It probes into what is known before facilitating instruction. It usually precedes learning activities, but doesn't have to. It can come at any time during a lesson. For example, if a question arises during a lesson that depends on background information, the teacher can ask a diagnostic question to check the students' level of understanding of that background information. After that diagnostic question, the teacher can decide if the students are ready to proceed or if the background information needs to be developed before continuing with the day's planned activity. It is a diagnosis of students' readiness.

2 - Formative.

This type of assessment is used to check the students' progress toward learning. It can happen at any time during a lesson and is usually understood as such.

3 - Summative.

This kind of assessment is usually associated with the time immediately after facilitating learning. However, what is that time frame? Is it at the end of a five minute mini-teach where summative assessment is to summarize what was learned in the five minutes? Or is it a time frame of an hour, day, week, month, or year?

One could argue that an assessment is summative only if you are inclined to think the students understand the concepts and can perform the outcomes; otherwise it could be considered formative. In any case, it is usually considered the last assessment before the teacher moves to another topic. It could be the first summary check, or a question to double or triple check, with the expectation that the results will be within the acceptable range or above.

4 - Generative.

This assessment inquires into the students' ability to apply, use, adapt, alter, or join ideas that have been taught. Its purpose is to see how well students understand what they learned.

It tries to determine whether the information has become strong enough to be usable beyond the examples students are familiar with, or examples similar to those presented during instruction. Can they use it in ways that were not presented and demonstrate a variety of application, analysis, and synthesis with the information?

Again, any one of these kinds can come at any time in the lesson, and each only makes sense with respect to the purpose of the teacher within a sequence of facilitating learning. When planning, a teacher can anticipate all four types of assessment that will be used throughout a sequence for each concept within the sequence. The planning will prepare the teacher to interact with students and to facilitate their learning in real time in a way that is individualized for each student.

Procedure for Creating a Scoring Guide or Rubric

Identify the topic, big idea, standard, or skill.
Unpack it to identify the information students need to know: facts, concepts, relationships, generalizations, skills. See procedures and frameworks above and see lists of concepts and misconceptions in subject resources.
Consider the students' developmental level of understanding.
Consider misconceptions students might have and their sources.
Start at either the top or the bottom level. If the information unpacked for what students should know is used, it will describe either the top level or an acceptable level below the top. If information related to misconceptions or perceptual or naive understandings is used, it will describe the bottom level of understanding. Next, identify the information between these as steps or levels students will progress through as they begin with their initial understandings and progress across the levels to become as expert as defined. These levels describe the different levels of understanding of information for the topic, sequenced from lowest to highest, that students need to know. This information should be facts, concepts, relationships, generalizations, and skills.
The next step, which is harder, is to identify outcomes that describe what students do that can be observed to infer their understanding at the different levels.
Lastly, after the outcomes are defined and described, they need to be communicated as levels. There are two basic ways to do this. One is to separate each fundamental idea related to the bigger idea into categories and describe outcome levels for each category separately in a chart or outline form. The second is to describe all the information in a narrative for each level.

Example:

Start by deciding the number of assessment levels and naming them. For example, four: beginning, progressing, proficient, and advanced.

Then write statements under each category to describe what it would look like if students are to do something at each level. For example, for problem solving:

Problem solving scoring guide or rubric: outlined in a narrative format

Beginning - Try to solve problems from memory. Believe solutions are known and need to be recalled or they just pop into your mind.
Progressing - Try to solve problems from memory, but if unable to recall a procedure and strategy, will seek to understand the problem and try to discover one through a random search for a solution.
Proficient - Try to solve the problem from memory, but if unable, will implement a heuristic or procedure that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, and check its accuracy.
Advanced - Try to solve the problem from memory, but if unable, will implement a heuristic or procedure in a systematic manner that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, check its accuracy by trying alternative strategies to achieve equal results, and reflect on the process while solving and after.

Notice the results above are in an outline form with a narrative for each level that combines several categories within each level. Below is the same information in a table format.

Problem solving scoring guide or rubric: in a table with a narrative format

	Beginning	Progressing	Proficient	Advanced
Indicator	Try to solve problems from memory. Believe solutions are known and need to be recalled or they just pop into your mind.	Try to solve problems from memory, but if unable to recall a procedure and strategy, will seek to understand the problem and try to discover one through a random search for a solution.	Try to solve the problem from memory, but if unable, will implement a heuristic or procedure that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, and check its accuracy.	Try to solve the problem from memory, but if unable, will implement a heuristic or procedure in a systematic manner that includes steps to understand the problem, restate it in their own words, seek strategies, choose and implement a strategy, check its accuracy by trying alternative strategies to achieve equal results, and reflect on the process while solving and after to improve on their problem solving abilities.

Next, let's see what a scoring guide will look like if the subcategories are separated.

Problem solving scoring guide or rubric: in a table with the categories separated

Indicator	Beginning	Progressing	Proficient	Advanced
Reflects	Try to solve problems from memory.	Try to solve problems from memory and thinks about the steps taken, how to implement them, and the decisions to make as they solve the problem.	Try to solve problems from memory and thinks about the process while solving.	Try to solve problems from memory and reflects on the process while solving and after to improve on their problem solving abilities.
Heuristic, procedure	No set plan.	Aware that a series of steps can be followed to solve problems.	Know a comprehensive heuristic and use it in a flexible and accurate way to solve problems.	Know a comprehensive heuristic and use it in a systematic, flexible, efficient, and accurate way to solve problems.
Implement a strategy	Occasionally selects an appropriate strategy.	Aware of different strategies and attempts to implement one.	Aware of different strategies and uses one or two to solve most problems.	Aware of multiple strategies, implements them to check their accuracy, and marvels at how different strategies can achieve verification and confidence in solutions.
Checks or finds multiple solutions	Achieves a solution, if at all, with difficulty and is satisfied with success.	Solves the problem in a single way and listens when other solutions are presented.	Solves the problem in multiple ways that are similar or use inverse operations, and listens when other solutions are presented.	Generates multiple ways to solve the problem, with some being unusual and not often selected as solutions, listens when other solutions are presented, and offers suggestions for improvement.
Communicate	May be able to summarize a few highlights when asked.	Communicates results orally and in writing.	Communicates how results were achieved orally and in writing with information for the heuristic, strategies, and multiple solutions to verify them.	Communicates how results were achieved orally and in writing with information for the heuristic, strategies, and multiple solutions to verify them, explains how metacognition is helpful for problem solving, and is excited about improving their problem solving with metacognition.

COOL!

RIGHT?
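For readers who keep assessment records electronically, the separated scoring guide above can be sketched as a small data structure. This is only one possible representation, not part of the original procedure. The student's observed levels are invented, and the helper name `next_steps` is hypothetical. In keeping with the idea that levels describe progress rather than assign value, the sketch reports the next level to work toward for each indicator and computes no grade.

```python
# Level names taken from the scoring guide, ordered from lowest to highest.
LEVELS = ["Beginning", "Progressing", "Proficient", "Advanced"]

# One observed level per indicator for a single hypothetical student.
observed = {
    "Reflects": "Progressing",
    "Heuristic, procedure": "Proficient",
    "Implement a strategy": "Progressing",
    "Checks or finds multiple solutions": "Beginning",
    "Communicate": "Advanced",
}


def next_steps(observed_levels):
    """Map each indicator to the next level to work toward (None at the top)."""
    plan = {}
    for indicator, level in observed_levels.items():
        position = LEVELS.index(level)  # raises ValueError for an unknown level
        has_next = position + 1 < len(LEVELS)
        plan[indicator] = LEVELS[position + 1] if has_next else None
    return plan


plan = next_steps(observed)
for indicator, target in plan.items():
    print(f"{indicator}: now {observed[indicator]}, next {target}")
```

A teacher could read the matching cell of the table for each "next" level to choose learning activities.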

Standardized tests

Standardized tests use the same format and types of questions for the same content so they can be administered to wide groups of people. They are often normed, with their scores ranked on a bell curve.

There is a difference between norm-referenced and criterion-referenced tests.

Norm-referenced tests are standardized tests, and the two terms are often used interchangeably, although a standardized test can also be criterion-referenced.

Pros

Standardized tests are predictive of later outcomes in school.
Can inform learners, families, teachers, administrators, researchers, and other stakeholders about areas of concern and strength, for accountability and to improve instruction.

Cons

They are prone to inaccuracies.
Tend to have an influence on the narrowing of the curriculum as teachers teach to the test.
Users spend excessive time on test preparation.

Norm-Referenced tests

Norm-referenced tests are standardized tests based on a representative group. They measure and rank test takers against each other. Each person's score is compared to the norm of a similar, predetermined peer group of test takers and may be reported as a percentile, grade equivalent, or stanine. The score suggests what the test taker knows as an individual and what they know relative to a group.

Student performance is compared to a standardized group with such statements as:

Your child scored at the 50th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that about 50% of the students in the norm group scored at or below your child's score and about 50% scored higher.
Your child scored at the 99th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that about 99% of the students in the norm group scored at or below your child's score and about 1% scored higher.
Your child scored at the 75th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that about 75% of the students in the norm group scored at or below your child's score and about 25% scored higher.
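The arithmetic behind statements like these can be sketched in a few lines. The sketch assumes a made-up norm group of 100 raw scores. It is not how any particular test publisher computes percentiles, and conventions differ on how tied scores are counted. The function names are hypothetical.

```python
def percentile_breakdown(score, norm_scores):
    """Percent of the norm group scoring below, the same as, and above a score."""
    n = len(norm_scores)
    if n == 0:
        raise ValueError("the norm group must not be empty")
    below = sum(s < score for s in norm_scores)
    same = sum(s == score for s in norm_scores)
    above = n - below - same
    return {
        "below": 100 * below / n,
        "same": 100 * same / n,
        "above": 100 * above / n,
    }


def percent_at_or_below(score, norm_scores):
    """One common convention: percent of the norm group at or below the score."""
    parts = percentile_breakdown(score, norm_scores)
    return parts["below"] + parts["same"]


# Hypothetical norm group: raw scores 1 through 100, one student each.
norm_group = list(range(1, 101))

breakdown = percentile_breakdown(75, norm_group)
rank = percent_at_or_below(75, norm_group)
print(breakdown, rank)
```

In this made-up group a raw score of 75 has 74% of students below it, 1% with the same score, and 25% above, so 75% scored at or below it. The percentile describes position in the group, not how much of the content the student has mastered.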

Examples:

Iowa Test of Basic Skills (ITBS)
California Achievement Test (CAT)
American College Testing (ACT)
Metropolitan Reading Readiness Test (MRT)
Cognitive Abilities Test (CogAt)

Strengths:

Provide information about the achievement of individual students or groups of students
Identification of possible ways to improve school curriculums or programs
Purchased, administered, and scored inexpensively
Supplements other assessment methods to clarify the larger picture of student performance
Objective scoring procedure

Concerns/Weaknesses:

People too often misuse tests to categorize and label students in ways that can cause damage
People frequently misuse test scores to make improper comparisons between schools, districts, classes
Fails to promote individual student learning
Is a poor predictor of individual student performance
Usually mismatches with the content of a school’s curriculum
Can be used to dictate and restrict curriculum
People too often assume that test scores are infallible
Too often people develop an overreliance on this one type of assessment
Results are based on a normal distribution (bell-shaped curve)
Measures students against other students
Sorts students into winners and losers
Does not test for what students know in a manner that can be used to facilitate learning.

Criteria Referenced Assessment

Criterion-referenced assessment, or a criterion-referenced test, compares a learner's performance or academic achievement to a set of curricular criteria, standards, or outcomes. A level of achievement is based on a criterion that is established from the curriculum or standard before the test is taken. A rubric may be created to communicate different levels of achievement, or a standard may be set as a percentage. The score should show the learner's progression toward the desired outcome or standard.
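As a minimal sketch of a standard set as a percentage, the following maps a percent-correct score to a level using cut scores fixed before the test is taken. The cut scores of 50, 70, and 90 and the function name are assumptions made for illustration, not recommended values. Unlike a percentile, the result depends only on the learner's own score and not on how anyone else performed.

```python
import bisect

# Hypothetical cut scores (percent correct) set from the curriculum in advance.
CUT_SCORES = [50, 70, 90]
LEVEL_NAMES = ["Beginning", "Progressing", "Proficient", "Advanced"]


def criterion_level(percent_correct):
    """Return the level whose cut score the learner has met or passed."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    # bisect_right counts how many cut scores are less than or equal to the score.
    return LEVEL_NAMES[bisect.bisect_right(CUT_SCORES, percent_correct)]


for score in (49, 50, 89.5, 90):
    print(score, criterion_level(score))
```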

Steps for Creating criteria referenced assessments

Identify a big idea or a general description of what students are to know to be assessed.
Identify facts, concepts, generalizations, skills, process, and/or attitudes needed to understand what is characterized by the big idea or other description of what students are to know.
Identify problems, questions, tasks, or activities that students can perform that could provide usable information to infer students' understanding of the concepts or generalizations; their attitudes related to the topics, concepts, and/or generalizations; their skill or ability to perform necessary or identified skills; and their abilities to use inquiry and selected investigative practices and processes.
Describe what students can be expected to communicate and/or do if they are to successfully complete the identified problems, questions, tasks, and/or activities.
Select one or more problems, questions, tasks, or activities that can be used to initiate students' performance of the selected task and that will provide assessment data to infer their understanding, attitude, and/or skills in the identified areas.
Create the assessment problems, questions, tasks, or activities, as they will be presented to students.
Create and write all administration guidelines that are necessary to engage students in the assessment task so that they might be successful in completing it.
Review the needs for all students that will be assessed to identify all accommodations for special needs and provide for those accommodations.
Identify and describe what kinds of responses students might have for each item for different levels of understanding or skills that might be seen in an artifact created by a student or viewed when performed by a student.
Order the different descriptions from low to high levels and select or modify them to create a scoring guide or rubric.
Describe how the scoring guide or rubric will be checked for consistency, validity, and reliability.
Describe how to check the assessment items for bias so that no one will be offended or unfairly penalized.
Pilot assessment task/tool to see how it works with students.

Accommodation information

Accommodation introduction and examples

Relationship of assessing growth and achievement

The assessment and accountability movement has caused people to realize that reporting and relying on achievement or proficiency alone to rate teachers gives an incomplete picture. Students can come into a grade above or below grade level and make little or very good progress, and that progress or lack of progress will not be represented in an achievement or proficiency score alone. This makes it impossible to determine a teacher's success or failure from an end of year achievement or proficiency score. Therefore, data must be collected to determine both achievement or proficiency and the rate of achievement, or growth.

Reporting and displaying data for both growth and achievement is one way schools, government, and other stakeholders believe they can achieve a more accurate view of what is being achieved by students and teachers. For example, scores for achievement could be reported in terms of a percent for proficiency, and growth reported in terms of an average growth percentile for individual students or groups of students. These scores would then be plotted on a graph with 0-100% on each axis, where the fiftieth percentile divides each axis in half, creating a representation, like the one below, with four quadrants, in one of which each pair of combined scores falls.
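The four-quadrant idea can be sketched as a small classification function. The quadrant labels, the function name, and the choice to count a score of exactly 50 as "higher" are assumptions made for illustration. The article does not specify them.

```python
def growth_achievement_quadrant(percent_proficient, growth_percentile, midpoint=50.0):
    """Name the quadrant a pair of achievement and growth scores falls in."""
    for value in (percent_proficient, growth_percentile):
        if not 0 <= value <= 100:
            raise ValueError("scores must be between 0 and 100")
    # Assumption: a score exactly at the midpoint counts as the higher half.
    achievement = "higher" if percent_proficient >= midpoint else "lower"
    growth = "higher" if growth_percentile >= midpoint else "lower"
    return f"{achievement} achievement, {growth} growth"


# Hypothetical classes: (percent proficient, average growth percentile).
classes = {"Class A": (80, 65), "Class B": (30, 70), "Class C": (85, 35)}
for name, (achievement, growth) in classes.items():
    print(name, "->", growth_achievement_quadrant(achievement, growth))
```

Class B illustrates the point of the section: its achievement score alone looks weak, but its students are growing faster than most.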

Assorted assessment and evaluation links

Sample letter introducing parents to authentic assessment and student-led conference
Consideration for dropping a grade - yes or no?

Practice assessment scenarios and role play for preprofessional and professional educators:

Assessment and evaluation simulation for The Fish in the Park Experience - science focused
Birds, beaks, and adaptation
