Test theory

Suggested readings for an introduction to educational measurement and test theory

    1. Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the Prueba de Aptitud Academica and the Scholastic Aptitude Test (College Board Report No. 88-2). New York, NY: College Board.

    2. Anonymous (XXXX). Rubric scoring and item writing.

    3. Anonymous (2000). Occupational analysis - Residential electrician.

    4. Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

    5. Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.

    6. Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

    7. Dudek, F. J. (1979). The continuing misinterpretation of the standard error of measurement. Psychological Bulletin, 86, 335-337.

    8. Frary, R. B. (1989). Partial-credit scoring methods for multiple-choice tests. Applied Measurement in Education, 2(1), 79-96.

    9. Kolen, M. J. (2004). Linking assessments: Concept and history. Applied Psychological Measurement, 28, 219-226.

    10. Lissitz, R. W., & Bourque, M. L. (1995). Reporting NAEP results using standards. Educational Measurement: Issues and Practice, 14(2), 14-23, 31.

    11. Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8, 750-751.

    12. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749.

    13. Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 335-366). New York: American Council on Education.

    14. Mills, C. N. (1999). Development and introduction of a computer adaptive Graduate Record Examination General Test. In F. Drasgow & J. B. Olson-Buchanan (Eds.), Innovations in Computerized Assessment (pp. 117-135). Mahwah, NJ: Erlbaum.

    15. Mislevy, R. J. (2003). Substance and structure in assessment arguments. Law, Probability, and Risk, 2, 237-258.

    16. Moss, P. A. (1998). The role of consequences in validity theory. Educational Measurement: Issues and Practice, 17(2), 6-12.

    17. Ohio Department of Education (1987). Ohio Testing Handbook Two: A Guide for Establishing an Effective School Testing Program. Columbus, OH: Ohio Department of Education.

    18. Reckase, M. D. (1998). Consequential validity from the test developer's perspective. Educational Measurement: Issues and Practice, 17(4), 13-16.

    19. Skaggs, G., & Lissitz, R. W. (1992). The consistency of detecting item bias across independent samples: Implications of another failure. Journal of Educational Measurement, 29, 227-242.

    20. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

    21. Weinberg, S. (1991). An introduction to multidimensional scaling. Measurement and Evaluation in Counseling and Development, 24, 12-36.