Evaluation Scales with Item Response Theory
This page contains code and data for our IRT analyses.
If you use the following data, please cite:
- J.P. Lalor, H. Wu, and H. Yu. Building an Evaluation Scale Using Item Response Theory. In EMNLP 2016. arXiv pre-print
The dataset consists of response patterns collected using the Amazon Mechanical Turk crowdsourcing platform.
Included in the zip file are the data and a README.
License: The data are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, following the Stanford SNLI project.
Download: zip file
The code used to generate the evaluation scales in the paper is written in R. An R file is included for each of the five evaluation scales.
Download: code hosted on GitHub
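For orientation, the sketch below shows one way to fit an IRT model to a response-pattern matrix in R using the ltm package. It is a minimal illustration only, not the released scripts: the file name `responses.csv` and its layout (one row per annotator, one 0/1 column per item) are assumptions, and the code hosted on GitHub remains the authoritative implementation.

```r
# Minimal sketch: fit a three-parameter logistic (3PL) IRT model to
# dichotomous response patterns with the ltm package.
library(ltm)

# Hypothetical input: one row per annotator, one 0/1 column per item.
responses <- read.csv("responses.csv")

# Fit Birnbaum's three-parameter model.
fit <- tpm(responses, type = "latent.trait")

# Item parameters: guessing, difficulty, and discrimination.
print(coef(fit))

# Ability (theta) estimates for each observed response pattern.
scores <- factor.scores(fit)
head(scores$score.dat)
```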
If you use the following data, please cite:
- J.P. Lalor, H. Wu, T. Munkhdalai, and H. Yu. Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study. In EMNLP 2018. arXiv pre-print
Questions about the code or data? Contact me at lalor at cs dot umass dot edu.