The most important terms used in this handbook are listed as follows:
Term | Definition |
Absolute frequency | “Absolute frequencies are statistical measures that indicate how often a certain characteristic (of a bundle of data, of a variable) occurs in a data set. Absolute frequencies are mostly summarised in frequency tables.” (From: The glossary of market research of FBInnovation, Absolute frequency) |
Added value (media effects) | These are the effects of a specific technology on learning success; such effects are frequently studied in comparison with other ("conventional") technologies (=> Learning level). |
Arithmetic mean | The sum of all the scores divided by the number of scores (only for => interval- or => ratio-scaled data!). It is also used to calculate a scale value (an average score calculated over all participants and all questions belonging to a scale) (measure of => central tendency). |
Attrition | “The loss of subjects during the course of a study. This may be a threat to the (=>) validity of conclusions if participants of study and comparison/control groups drop out at different rates or for different reasons” (From: Nonprofit Good Practice Guide, Attrition) (=> Non-response rate) |
Central tendency | Measures of central tendency indicate the value that represents the distribution best. The main measures are => arithmetic mean, => median and => mode. |
Concept level | The focus at this level of evaluation is on the project design or planning. Possible questions at this level are about goal clarification, => needs and => feasibility of the study (=> Project and program evaluation) |
Content analysis | “A set of procedures for collecting and organising non-structured information into a standardised format that allows one to make inferences about the characteristics and meaning of written and otherwise recorded material” (From: Program Evaluation Glossary, Content Analysis). |
Contingency coefficient | As a measure to characterise the correlation of two attributes (e.g. Is the field of study chosen dependent on gender?), the contingency coefficient is used on data of a => nominal scale (=> Correlational hypotheses). |
Correlation | A measure of the degree to which two or more variables are related. However, the existence of a correlation does not mean a cause-and-effect relationship. |
Correlational hypotheses | Correlation hypotheses predict a one-way or two-way dependency between two or more variables (=> Inferential statistics). |
Criteria catalogue | A systematic compilation of various questions and rating scales on individual product variables. |
Criterion-oriented tests | Criterion-oriented tests are used to measure individual performance against a predefined criterion (e.g. a certain score). These tests are used, for example, to measure differences in learning achievement after an educational program (=> Standardised tests). |
Cross-sectional designs | Investigation at a single point in time (e.g. survey, document analysis) (=>Non-experimental designs). |
Descriptive statistics | “A term used to denote statistical data of a descriptive kind or the methods of handling such data, as contrasted with theoretical statistics which, though dealing with practical data, usually involve some process of inference in probability for their interpretation” (Glossary of Statistical Terms, Descriptive Statistics) (=>Statistical analysis methods) |
Difference hypotheses | Difference hypotheses predict a difference between two => samples or within a sample before and after an intervention (=> Inferential statistics). |
Distribution measures | Distribution (scatter) measures indicate the deviation of the scores from the “centre”. The most important distribution measures are the interquartile range, the interdecile range, the => standard deviation, the => variance and the coefficient of variation (translated from: Ilmes, Streuungsmasse) |
Document analysis | Document analysis is a form of => content analysis. It is a non-reactive method of data collection, which means that the collected data does not constitute a reaction to the questions being put by the evaluator. In the course of a document analysis, various sources of information, such as specialised literature or other texts, are used. These sources provide the type of information which is relevant to a particular question. |
Effectiveness | Effectiveness answers the question of whether the users can achieve the goals they have set by using the software (Grötsch & Anft, 2005) (=> Usability). |
Efficiency | Efficiency refers to the effort required from the user to complete a task (Grötsch & Anft, 2005) (=> Usability). |
Evaluation | Evaluation is the systematic and purposeful collection, analysis and appraisal of data for quality assurance and quality control. It applies to the assessment of planning, development, design and use of educational courses or individual elements of these courses (methods, media, programs, parts of programs) under the aspects of quality, functionality, effect, efficiency and utility. (translated from Tergan, 2000) |
Evaluation method | The means by which a => program or => project is evaluated. These are empirical research methods from social sciences as well as statistical approaches to data collection and analysis, which should be suitable for the evaluation aims and objects under study. For => formative evaluations, the design tends to include the use of qualitative methods, whereas in => summative evaluations, the emphasis is on quantitative methods, though the boundaries are fluid. (translated from: Glossar wirkungsorientierte Evaluation, Evaluationsmethode) |
Evaluation object, Object of evaluation | Evaluand; the object that is being evaluated, whether that object is a => project or => program. |
Evaluation process | The logical procedure involving planning, executing, and analysing an => evaluation. |
Evaluation question(s) | A specific question to be answered by means of an => evaluation. The questions aim to provide information on the => object of the evaluation for the client (and other parties involved). The questions define the aspects of the program about which data and input are to be obtained. The evaluation plan is to be precisely tailored to these questions. An evaluation question is not the same as a question in a => questionnaire or in an => interview schedule (translated from: Glossar wirkungsorientierte Evaluation, Evaluationsfragestellungen) |
Experimental designs | In experimental designs, the participants are randomly assigned to an intervention group or to a control group (and both groups are therefore equivalent). Also a specific ‘treatment’ is introduced, i.e. the intervention group receives the intervention to be analysed (e.g. training using an e-learning tool) whereas the control group does not (or receives an appropriate control treatment). Furthermore, the target value (variable in question, e.g. learning performance) is measured in both groups, if necessary, before the intervention, and always after the intervention (=> Study design). |
Expert-based methods (Heuristic method) | Examination and evaluation of graphical user interfaces according to pre-set criteria (heuristics). |
External evaluation | => Evaluation conducted by an evaluator from outside the organisation within which the => object of the evaluation is based. |
Feasibility study | An objective study measuring the strength of its case including the study findings, recommendations, and (if the goal is feasible) a plan, timetable, and budget (From: Nonprofit Good Practice Guide, Feasibility Study) (=> Concept level). |
Focus groups | A focus group is a focused group discussion led by a specially trained moderator. The discussion should generate intense exposure to the topic selected. Participants in focus groups are expected to share their experiences, ideas, and observations about a selected subject. |
Formative evaluation | “A type of process evaluation of new programs or services that focuses on collecting data on program operations so that needed changes or modifications can be made to the program in the early stages. This type of (=>) evaluation is carried out while a (=>) project or (=>) program is implemented in order to provide timely, continuous feedback as work progresses”. (From: Nonprofit Good Practice Guide, Formative Evaluation) |
Frequency distribution | “A frequency distribution shows the number of observations falling into each of several ranges of values. Frequency distributions are portrayed as frequency tables, histograms, or polygons”. (From: HyperStat Online, Frequency Distributions) |
Functional goals | In the => goal hierarchy, => functional goals make a direct contribution to the implementation of a => mediator goal and an indirect contribution to the achievement of the => policy goal. They serve as an orientation on a practical level and their formulation leaves little scope for variation, by contrast to the higher-level classes of goals. |
Goal-based evaluation | The aim of goal-based evaluation is to investigate whether the => project has achieved its goals. This question is posed at the end of the project process, frequently within the context of a => summative evaluation. |
Goal hierarchies | Goals can be formulated on various levels of abstraction, e.g. as => policy goals, => mediator goals and => functional goals (cf. Beywl & Schepp-Winter, 1999) (=> Concept level). |
Impact level | At this level of => program evaluation, the focus is on the question regarding the effect the program has had and whether it is sustainable. |
Inferential statistics | “Statistics from which an inference is made about the nature of a (=>) population; the purpose is to generalise about the population, based upon data from the (=>) sample selected from the population.” (From: Biology-Online.org, Inferential Statistics) (=> Statistical analysis methods) |
Informal tests | Non-standardised tests to which => quality criteria are not relevant. They are developed by the evaluators on an ad hoc basis to meet the requirements of the respective evaluation goals. |
Internal evaluation | An evaluation performed by a staff member or unit from the organisation within which the => object of the evaluation is based. |
Interval scale | In an interval scale, the numbers represent equal increments of the attribute being measured, but there is no "real" zero point (=> Scale level). |
Interview | The interview is a prominent method of recording qualitative data in social sciences. It is an oral survey of people about a specific subject. The interview is frequently used if too little is known about the variables to be studied, or if it is necessary to delve into certain aspects by probing. Similar to => questionnaire, an interview also comprises various items (questions). Similarly again, there are different degrees of standardisation. |
Learning level | At this level, attention is focused on what is supposed to be achieved with the product of a project, and, in the context of e-learning projects, the products are usually specific learning processes or learning results (=> Project evaluation). |
Learning results | Learning results may relate to subject-related contents (knowledge and understanding, attitudes and abilities), but they may also relate to developing skills unrelated to the subject (e.g. media skills) (=> Learning level). |
Learning processes | The learning processes taking place in technology-based learning can be interaction with learning contents (cognitive processes activated while working with the material), interaction with technology (processes involved while using a technical system) and interaction with other persons (communicative processes in group learning) (cf. Friedrich, Hron & Hesse, 2001) (=> Learning level). |
Logic models | A logic model describes how the parts of a => project or => program contribute to goal achievement and how they interrelate. The model usually consists of the components “resources”, “procedures”, “products”, “outcomes and impacts” and “factors” (=> Concept level) |
Longitudinal designs | The same investigation is conducted at several different times (e.g. examining computer-related attitudes in first-term students over several years to collect data about changes in attitude across different ‘cohorts’) (=> Non-experimental designs). |
Mann-Whitney test (U-Test) | If two independent (=>) samples of (=>) ordinal data are to be compared with respect to their central tendency (e.g. Do the students in one class achieve better scores than those in another class?), then the Mann-Whitney U-Test is used. |
Mean value | “The long-term average of occurrences; also called the expected value” (From: Statistical Glossary, Mean Value) |
Median | A median divides the distribution into two halves, with 50% of the scores above it and 50% of the scores below it (Measure of => central tendency). |
Mediator goals | In the => goal hierarchy, mediator goals specify components of the => policy goal and make it clear why pursuing certain activity goals is reasonable for achieving the policy goal. |
Mode | A mode is the most frequently occurring score (Measure of => central tendency). |
Needs analysis | Needs analysis is the analysis of the discrepancy between the current and the target status. The target status is defined either by the project team or jointly with the target group (=> Concept level). |
Nominal scale | In a nominal scale, different numbers do not represent anything more than different attributes. Which attribute is assigned to which number is not important, as the numbers do not stand for "more" or "less", "greater" or "smaller" (=> Scale level). |
Non-experimental designs | In non-experimental designs, the researcher studies a phenomenon without manipulating the independent variable(s). As there are no interventions, non-experimental designs cannot predict causal relationships (=> Study design). |
Non-probability sampling | In non-probability sampling, units are selected according to certain characteristics. Non-probability sampling techniques cannot be used to infer from the => sample to the general => population by applying statistical models. |
Non-response rate | “In sample surveys, the failure to obtain information from a designated individual for any reason (… absence or refusal to reply) is often called a non-response and the proportion of such individuals of the (=>) sample aimed at is called the non-response rate.” (From: Glossary of Statistical Terms, Non-Response Rate) |
Non-standardised interviews | In non-standardised interviews, only the subject area is specified in advance; the questions and their order are not fixed (=> Interview). |
Norm-oriented tests | In norm-oriented tests, the individual test performance is compared with the average performance of a reference group (=> Standardised tests). |
Objectivity | “Objectivity in science is the property of scientific measurement that can be tested independently from the individual scientist (the subject) who proposes them” (From: Wikipedia, Objectivity (science)) (=> Quality criteria). |
Observations | Observations provide information that is not reported by the study participants themselves, but is collected directly by a trained person. The information gained from observations can be in the form of notes and records as well as audio or video recordings. Observations can be used to collect data on the procedures of handling the software as well as on the problems and learning difficulties experienced by the users in the operation. |
Ordinal scale | In an ordinal scale, the numbers represent a rank order; however the scale does not provide any information as to the relations of the attributes in the rank order. The same intervals between the values do not therefore represent the same intervals as "in reality" (=> Scale level). |
Output level | At this level, the focus of the => evaluation is on the products of the program or of the individual => projects (=> Program evaluation). |
Outside evaluation | “An evaluation performed by an evaluator not affiliated with the agency prior to the (=>) program (or => project) evaluation. Also known as a third-party evaluator”. (From: Nonprofit Good Practice Guide, Outside Evaluation) |
Pearson's product-moment correlation coefficient | The correlation between => interval or => ratio data (e.g. Is the time taken to solve a task associated with age?) is calculated using Pearson's product-moment correlation coefficient (=> Correlational hypotheses) |
Percentile | The x-th percentile is the value below which x % of the scores fall. The difference between two percentiles (e.g. the interquartile range) can be used as a => distribution measure. |
Policy goals | Policy goals are on the highest level of abstraction of the => goal hierarchy. They are not very concrete but remain constant over time and set the basic direction of the => program or => project. |
Population | The complete set of objects about which the conclusions of a study are drawn. |
Portfolio level | At this level, the focus of the evaluation is on the request and selection of => projects (=> Program evaluation). |
Process level | At this level, the focus of the evaluation is on the implementation of the => projects or of the accompanying program measures (=> Program evaluation). |
Product level | The focus of this level is on the product characteristics of the => project or which characteristics the product should include. A product can be, for example, learning software or a course (=> Project evaluation). |
Program | A program denotes a collection of related => projects that pursue a common strategic aim (translated from: Projektmagazin, Programm) |
Program evaluation | The => evaluation of an initiative that is implemented through several individual => projects. |
Program process | The program process structures the different phases of a => program, from planning through mobilisation and implementation to the integration of results. |
Project | “A project is a temporary endeavour undertaken to create a product or service. Projects are characterised by progressive elaboration: because of uniqueness and relatively high levels of uncertainty, projects cannot be understood entirely at or before the project start, and therefore the planning and execution of projects often happens in separate steps or phases. As the project progresses, the project team understands the next steps, the deliverables and the method of execution much better. Based on this knowledge, team members can draw up initial draft plans, and execute the next phase of the project based on these detailed plans” (From: Wikipedia, Project). |
Project evaluation | The evaluation of an individual => project. |
Project process | The project process structures the different phases of a => project, from project planning through development and the pilot phase to implementation. |
Qualitative content analysis | The aim of qualitative content analysis is not to quantify but to interpret text contents, i.e. the content meaning of statements is identified without reducing the material to quantifiable statements. This approach allows three different aims to be pursued: Summary, explication and structuring (=> Content analysis). |
Quasi-experimental designs | Quasi-experiments differ from experiments in that the participants are not assigned randomly to the intervention and control group, i.e. naturally existing, non-randomised groups are investigated (e.g. two different school classes). Because the assignment is not randomised, it cannot be ruled out that both groups differ in variables that have an effect on the target value in question. Thus, when interpreting possible differences between the groups, one cannot simply assume that the difference can be ascribed to the ‘treatment’ (the intervention to be evaluated) (=> Study design) |
Questionnaire | A written survey of people on a specific subject. It consists of several questions or items to be answered. Occasionally, more than one question is asked about the same subject (what is known as a scale, e.g. on the acceptance of an e-learning program). |
Quota sampling | In quota sampling, the distribution of certain characteristics in the => sample corresponds exactly to the distribution of these characteristics in the => population, although familiarity with this distribution is a prerequisite (=> Non-probability sampling). |
Random sampling (random selection) | In a random sample, the => population should be known and defined exactly. Each member should be represented in the population uniquely and thus have an equal and calculable probability of selection (=> Sample). |
Range of variation | Range of variation as a => distribution measure is the difference between the smallest and the biggest score. |
Ratio scale | In a ratio scale, the ratios between the values are meaningful as well as the intervals. In contrast to the => interval scale, a ratio scale has a meaningfully interpretable zero point (=> Scale level). |
Reaction level | The focus of this level is on the reaction of the target group towards the product of the => project (=> Project evaluation). |
Relative frequency | “Relative frequency is another term for proportion; it is the value calculated by dividing the number of times an event occurs by the total number of times an experiment is carried out” (From: Statistics Glossary, Relative Frequency). |
Reliability | “The extent to which a measurement instrument yields consistent, stable, and uniform results over repeated (=>) observations or measurements under the same conditions each time” (=> Quality criteria) (From: Program Evaluation Glossary, Reliability). |
Sample | “A sample is a subset of a (=>) population, often taken for the purpose of statistical inference” (From: Connexions, Populations and samples). |
Sample size | The number of units chosen from a population (=> Sample). |
Scale level | A scale level indicates how the numbers are to be interpreted and thus which operations are appropriate for these numbers. A distinction is made between four scale levels: => nominal, => ordinal, => interval and => ratio scale. |
Self evaluation | In a self-evaluation, those responsible for the => program or => project are also responsible for its evaluation. This means that the => evaluation is conducted by the same people who are also responsible for designing or implementing the => object of evaluation; their own professional work is the subject of the evaluation. |
Semi-standardised interviews | In semi-standardised interviews, the way the discussion should be held is specified by a relatively strict set of interview guidelines. Both closed and open questions can be posed (=> Interview). |
Simple random sampling | Each member of the => population has an equal chance of being selected. It allows the maximum possible representativeness, i.e. the => sample reflects the composition of the population (=> Random sampling). |
Situational tests | Situational tests include work samples, role plays, case studies, group discussions, planning games and simulations. These tests provide data on how close-to-reality tasks are accomplished. |
S.M.A.R.T. objectives | => Functional goals should be formulated to be as "smart" as possible, i.e. Specific, Measurable, Appropriate, Realistic, Timed (=> Concept level). |
Spearman's ranking correlation coefficient | The correlation of two => ordinal-scaled attributes (e.g. Is the popularity of pupils associated with school grades?) is captured with Spearman's ranking correlation coefficient (=> Correlational hypotheses). |
Stakeholder | A group of people that are or should be involved in, or that are affected by the => evaluation. |
Standard deviation | The square root of the variance. Only for => interval-scaled and => ratio-scaled data! |
Standardised interviews | The way of conducting standardised interviews is precisely pre-defined. Such => interviews are particularly suitable for clearly defined subject areas. |
Standardised tests | “A test administered and scored in a consistent manner. The (=>) tests are designed in such a way that the "questions, conditions for administration, scoring procedures and interpretation are consistent" and are "administered and scored in a predetermined, standard manner."” (From: Wikipedia, Standardised test). Standardised tests should meet the (=>) quality criteria. |
Statistical analysis methods | Statistical analysis methods can be used if you have collected quantitative (numerical) data using => questionnaires, => tests, => observations, etc. Statistical approaches serve to describe data, e.g. in parameters or diagrams (=> descriptive statistics), as well as to test hypotheses (=> inferential statistics); illustrative sketches of both are given after this glossary. |
Stratified sampling | Each member of the => population does not have an equal chance of selection, but a calculable one. Units are selected from each stratum by => random selection. The resulting stratified sample has the advantage that units from “small” strata are also adequately represented in the => sample (see the sampling sketch after this glossary). |
Study design | A study design describes which data should be collected from which objects, when and how often, and which measures are to be taken to eliminate possible biases which might affect the results. |
Summative evaluation | Summative evaluation allows the [subsequent] control of quality, effects and usefulness of an educational course. The question of interest is whether an educational course or individual components of the course can meet certain expectations in practice (translated from Tergan, 2000). |
t-test | In order to compare two independent sample means of => interval and => ratio data (e.g. Do the students in one class need more time to solve a task than those in another class?), the t-test for independent (=>) samples is used; in a comparison of two dependent sample means (e.g. Do the students need more time to solve a task before training than after training?) the t-test for dependent samples is used (=> Difference hypotheses). |
Test | The term "test" has several meanings: In relation to the evaluation of a learning program, tests can be understood as more or less standardised procedures for the measurement of the behaviour or performance of people. A test consists of various items which can be questions with open, semi-open or closed answers. |
Thinking aloud | Thinking aloud involves having the participants verbalise their thoughts while, for example, using specific software, doing an exercise, etc. In this way, for example, the usability of a software product can be investigated by analysing in detail the problems that have arisen during use, or by investigating which learning and problem-solving strategies have been used to tackle the exercise (=> User-based methods). |
Transfer level | The question at this level of evaluation is whether the transfer of the learned material can be successfully put into practice (=> project evaluation). |
Triangulation | A combination of methodologies for researching the same phenomenon. |
Typical case sampling | In typical case sampling, units that are considered characteristic of the => population are selected (=> Non-probability sampling). |
Usability | Usability (also known as user-friendliness) is a product characteristic which defines how easily, for example, learning software can be used. The core of usability is made up of the criteria of => effectiveness, => efficiency and the level of => user satisfaction (cf. requirements of usability, DIN EN ISO 9241-11; Heinsen & Vogt, 2003). |
Usability interview | In this approach, the users work with the program and inform the test supervisor in an => interview about their impressions and opinions regarding the program after the event (=> User-based methods). |
User-based methods | These are methods for documenting user-friendliness which involve having the product (for example, the e-learning program) tested directly by the users. The users work on a task typical of the program without any assistance. The interaction between the user and the program is recorded and subsequently studied. |
User satisfaction | User satisfaction reflects the perceived user friendliness of the product (Grötsch & Anft, 2005) (=> Usability). |
Validity | “The extent to which a measurement instrument or test accurately measures what it is supposed to measure” (=> Quality criteria) (From: Program Evaluation Glossary, Validity). |
Variance | Variance as a => distribution measure is the sum of the squared deviations of each score from the mean, divided by the number of scores. |
Wilcoxon test | If two dependent (=>) samples are to be compared (e.g. Do the students obtain better scores after training compared to before the training?), then the Wilcoxon-test is used (=> Difference hypotheses). |
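
The following minimal Python sketch (not part of the original glossary sources) illustrates the descriptive measures defined above (=> arithmetic mean, => median, => mode, => variance, => standard deviation, => range of variation, => frequency distribution) using only the standard library; the example scores are invented purely for illustration.

```python
from collections import Counter
import statistics

scores = [4, 5, 3, 5, 2, 4, 4, 5, 3, 4]  # hypothetical interval-scaled test scores

# Measures of central tendency
mean = statistics.mean(scores)           # arithmetic mean: sum of scores / number of scores
median = statistics.median(scores)       # 50 % of the scores lie above, 50 % below
mode = statistics.mode(scores)           # most frequently occurring score

# Distribution (scatter) measures
variance = statistics.pvariance(scores)  # sum of squared deviations from the mean / number of scores
std_dev = statistics.pstdev(scores)      # square root of the variance
value_range = max(scores) - min(scores)  # range of variation: largest minus smallest score

# Frequency distribution: absolute and relative frequencies
absolute = Counter(scores)
relative = {score: count / len(scores) for score, count in absolute.items()}

print(mean, median, mode, variance, std_dev, value_range)
print(dict(absolute), relative)
```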
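
The inferential procedures named in the glossary (=> t-test, => Mann-Whitney test, => Wilcoxon test, => Pearson's and => Spearman's correlation coefficients, => contingency coefficient) could be computed along the following lines, assuming the SciPy library is available; all group names and data are hypothetical.

```python
from scipy import stats

class_a = [72, 85, 90, 66, 78, 81, 74, 88]   # hypothetical scores, class A
class_b = [70, 79, 83, 64, 75, 77, 71, 80]   # hypothetical scores, class B
before  = [55, 60, 52, 58, 63, 57, 61, 54]   # hypothetical scores before a training
after   = [62, 66, 58, 65, 70, 60, 68, 59]   # the same students after the training
age     = [21, 25, 30, 35, 40, 45, 50, 55]   # hypothetical ages
time    = [30, 32, 35, 39, 41, 45, 48, 52]   # hypothetical time needed to solve a task (s)

# Difference hypotheses
print(stats.ttest_ind(class_a, class_b))     # t-test, two independent samples (interval/ratio data)
print(stats.ttest_rel(before, after))        # t-test, two dependent samples (interval/ratio data)
print(stats.mannwhitneyu(class_a, class_b))  # Mann-Whitney U-test, independent samples (ordinal data)
print(stats.wilcoxon(before, after))         # Wilcoxon test, dependent samples (ordinal data)

# Correlational hypotheses
print(stats.pearsonr(age, time))             # Pearson's product-moment correlation (interval/ratio data)
print(stats.spearmanr(age, time))            # Spearman's rank correlation (ordinal data)

# Contingency coefficient for two nominal attributes (e.g. gender x field of study)
table = [[20, 15], [10, 25]]                 # hypothetical cross-tabulation
chi2, p, dof, expected = stats.chi2_contingency(table)
n = sum(sum(row) for row in table)
print((chi2 / (chi2 + n)) ** 0.5)            # contingency coefficient C = sqrt(chi2 / (chi2 + N))
```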
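
Finally, a minimal sketch contrasting => simple random sampling with => stratified sampling, again using only the standard library and an invented population.

```python
import random

# Hypothetical population: 200 students, each belonging to one faculty (stratum)
population = [{"id": i, "faculty": random.choice(["arts", "science", "medicine"])}
              for i in range(200)]

# Simple random sampling: every member has an equal chance of being selected
simple_sample = random.sample(population, k=30)

# Stratified sampling: units are drawn from every stratum by random selection,
# so that members of "small" strata are also adequately represented
strata = {}
for person in population:
    strata.setdefault(person["faculty"], []).append(person)

stratified_sample = []
for faculty, members in strata.items():
    stratified_sample.extend(random.sample(members, k=min(10, len(members))))

print(len(simple_sample), {f: len(m) for f, m in strata.items()}, len(stratified_sample))
```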
Further glossaries of methods of empirical social research, evaluation and statistical terms can be found here:
http://www.epa.gov/evaluate/glossary/a-esd.htm
http://www.npgoodpractice.org/Glossary
http://cob.ualr.edu/smartstat/glossary/gloss.html#let_A
http://davidmlane.com/hyperstat
http://stats.oecd.org/glossary
http://www.fbinnovation.de/en/glossary
http://cc.ysu.edu/~eeusip/glossary.htm#sectA