Evaluation Scales with Item Response Theory


This page contains the code and data for our item response theory (IRT) analyses.


If you use the following data, please cite:


The dataset consists of response patterns collected using the Amazon Mechanical Turk crowdsourcing platform.

The zip file includes the data and a README.

License: The data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. It is based on the Stanford SNLI project.

Download: zip file


The code used to generate the evaluation scales from the paper was written in R. An R file is included for each of the five evaluation scales.

Download: code hosted on GitHub

Sentiment Analysis

If you use the following data, please cite:


Download: data

Questions about the code or data? Contact me at lalor at cs dot umass dot edu.