Suggested readings for an introduction to educational measurement and test theory
Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the Prueba de Aptitud Academica and the Scholastic Aptitude Test. College Board Report, No. 88-2, New York, New York.
Anonymous (XXXX). Rubric scoring and item writing.
Anonymous (2000). Occupational analysis - Residential electrician.
Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Cronbach, L. J. & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Dudek, F. J. (1979). The continuing misinterpretation of the standard error of measurement. Psychological Bulletin, 86, 335-337.
Frary, R. B. (1989). Partial-credit scoring methods for multiple-choice tests. Applied Measurement in Education, 2(1), 79-96.
Kolen, M. (2004). Linking assessments: Concept and history. Applied Psychological Measurement, 28, 219-226.
Lissitz, R. W. & Bourque, M. L. (1995). Reporting NAEP results using standards. Educational Measurement: Issues and Practice, 14(2), 14-23, 31.
Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8, 750-751.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749.
Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 335-366). New York: American Council on Education.
Mills, C. N. (1999). Development and introduction of a computer adaptive Graduate Record Examination General Test. In F. Drasgow & J. B. Olson-Buchanan (Eds.), Innovations in Computerized Assessment (pp. 117-135). Mahwah, NJ: Erlbaum.
Mislevy, R. J. (2003). Substance and structure in assessment arguments. Law, Probability, and Risk, 2, 237-258.
Moss, P. A. (1998). The role of consequences in validity theory. Educational Measurement: Issues and Practice, 17(2), 6-12.
Ohio Department of Education (1987). Ohio Testing Handbook Two: A Guide for Establishing an Effective School Testing Program. Columbus, OH: Ohio Department of Education.
Reckase, M. D. (1998). Consequential validity from the test developer's perspective. Educational Measurement: Issues and Practice, 17(4), 13-16.
Skaggs, G. & Lissitz, R. W. (1992). The consistency of detecting item bias across independent samples: Implications of another failure. Journal of Educational Measurement, 29, 227-242.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Weinberg, S. (1991). An introduction to multidimensional scaling. Measurement and Evaluation in Counseling and Development, 24, 12-36.