Publications Using Public Datasets

Published or pending work that has used ASSISTments data and on which Neil Heffernan was not an author.

This page lists 8 publicly available datasets that researchers have used; the first 3 are the most popular:

  1. ASSISTments 2009-10 (dataset), which we call ASSIST2009-10. (19+ papers below)

  2. ASSISTments 2012-13 (dataset), which we call ASSIST2012-13. (10+ papers below)

  3. ASSISTments 2015-16 (dataset), which we call ASSIST2015-16. (7+ papers below)

  4. Predicting students' future careers (70 people competed in this competition), which we call PREDICT-STEM-CAREER.

  5. Dataset on 22 experiments (dataset, paper1, and paper2), which we call the 22Experiments dataset. (1+ paper below)

  6. Early datasets are stored at the PSLC (https://pslcdatashop.web.cmu.edu/). These include ASSIST2004-05 (912 students), ASSIST2005-06 (3136 students), and ASSIST2006-07 (5046 students). (2+ papers below)

  7. Dr. Zach Pardos released datasets referred to as G6.207, G7.233, and G6.217. (2+ papers below)

  8. The TeachASSIST data from Patikorn & Heffernan (2020). (0 papers below)

Patikorn, T. & Heffernan, N. T. (2020, August 12) Effectiveness of Crowd-Sourcing On-Demand Tutoring from Teachers in Online Learning Platforms. Proceedings of the Seventh ACM Conference on Learning @ Scale (L@S). Pages 115–124. https://doi.org/10.1145/3386527.3405912. Best Student Paper Awardee. Video of talk

A46 Abidi, S.M.R., Hussain, M., Xu, Y., & Zhang, W. (2019). Prediction of Confusion Attempting Algebra Homework in an Intelligent Tutoring System through Machine Learning Techniques for Educational Sustainable Development. Sustainability, 11(1), 105; DOI: 10.3390/su11010105. They used the ASSIST2009-10 dataset.


A45 Li, Z., Ren, C., Li, X., & Pardos, Z. A. (2021). Learning Skill Equivalencies Across Platform Taxonomies. Learning Analytics and Knowledge (LAK). April 12-16, 2021, Irvine, CA, USA. ACM. DOI: 10.1145/3448139.3448173. They used the ASSIST2012-13 dataset.


A44 Shen, S., Liu, Q., Chen, Wu, Huang, Zhao, Su, Ma, & Wang (2020). Convolutional Knowledge Tracing: Modeling Individualization in Student Learning Process. SIGIR ’20, July 25–30, 2020, Virtual Event, China.


A43 Li, Z., Yee, L., Sauerberg, N., Sakson, I., Williams, J.J., Rafferty, A. (2020). "Getting too personal(ized): The importance of feature choice in online adaptive algorithms" In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), Anna N. Rafferty, Jacob Whitehill, Violetta Cavalli-Sforza, and Cristobal Romero (eds.) 2020, pp. 159 - 170. They used the ASSIST2015-16 dataset.


A42 Agarwal, D., Baker, R.S., Muraleedharan, A. (2020). "Dynamic knowledge tracing through data driven recency weights" In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), Anna N. Rafferty, Jacob Whitehill, Violetta Cavalli-Sforza, and Cristobal Romero (eds.) 2020, pp. 725 - 729. They used ASSISTments-G6_207, G7_233, and G6_217 datasets.


A41 Sonkar, S., Waters, A.E., Lan, A.S., Grimaldi, P.J., Baraniuk, R.G. (2020). "qDKT: Question-centric Deep Knowledge Tracing" In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), Anna N. Rafferty, Jacob Whitehill, Violetta Cavalli-Sforza, and Cristobal Romero (eds.) 2020, pp. 677 - 681. https://arxiv.org/pdf/2005.12442.pdf. They used the ASSIST2009-10 and ASSIST2017 datasets.


A40 Sergent, T., Bouchet, F., and Carron, T. (2020). "Towards Temporality-Sensitive Recurrent Neural Networks through Enriched Traces" In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), Anna N. Rafferty, Jacob Whitehill, Violetta Cavalli-Sforza, and Cristobal Romero (eds.) 2020, pp. 658 - 661. https://educationaldatamining.org/files/conferences/EDM2020/papers/paper_242.pdf. They used the ASSIST2012-13 and ASSIST2017 datasets.


A39 Clavie, B., Gal, K. (2020). "Deep Embeddings of Contextual Assessment Data for Improving Performance Prediction." In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), Anna N. Rafferty, Jacob Whitehill, Violetta Cavalli-Sforza, and Cristobal Romero (eds.) 2020, pp. 374 - 380. https://educationaldatamining.org/files/conferences/EDM2020/papers/paper_60.pdf. They used the ASSIST2009-10 and ASSISTChall datasets.


A38 Wang, Y., Kai, S., Baker, R.S. (2020). Early Detection of Wheel-Spinning in ASSISTments. In Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12163. Springer, Cham. https://doi.org/10.1007/978-3-030-52237-7_46. They used the ASSIST2015-16 dataset.


A37 Pavlik, P. Jr., Eglington, L. & Harrell-Williams, L. M. (2020). Knowledge Tracing: A Constrained Framework for Learner Modeling. https://arxiv.org/pdf/2005.00869.pdf. They used the ASSIST2004-05 dataset stored at the PSLC.


A36 Lu, Y., Wang, D., Meng, Q., & Chen, P. (2020). Towards Interpretable Deep Learning Models for Knowledge Tracing. https://arxiv.org/pdf/2005.06139.pdf. They used the ASSIST2009-10 dataset.


A35 Xu, L. & Davenport, M. (2020). Dynamic knowledge embedding and tracing. https://arxiv.org/pdf/2005.09109.pdf. They used the ASSIST2009-10 and ASSIST2012-13 datasets.


A34 Rafferty, A., Ying, H., & Williams, J. (2019). Statistical Consequences of using Multi-armed Bandits to Conduct Adaptive Educational Experiments. JEDM | Journal of Educational Data Mining, 11(1), 47-79. Retrieved from https://jedm.educationaldatamining.org/index.php/JEDM/article/view/357. They used the 22Experiments dataset.


A33 Yeung, C. K., & Yeung, D. Y. (2018). Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the Fifth Annual ACM Conference on Learning at Scale. page 5. ACM. They used the ASSIST2009-10 dataset.


A32 Pandey, S., & Karypis, G. (2019). A Self-Attentive model for Knowledge Tracing. Proceedings of the 12th International Conference on Educational Data Mining. Montreal, Canada. Pages 384-389. They used the ASSIST2009-10 and 2015 datasets.


A31 Choffin, B., Popineau, F., Bourda, Y., & Vie, J. (2019). DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills. In Proceedings of the 12th International Conference on Educational Data Mining. Montreal, Canada. Pages 29-38. Won "Best Paper" at EDM. They used the ASSIST2012-13 dataset. Dr. Heffernan is proud that the best paper at EDM 2019 used ASSISTments data. (Photo)


A30 Ding, X. & Larson, E. (2019). Why Deep Knowledge Tracing has less Depth than Anticipated. In Proceedings of the 12th International Conference on Educational Data Mining. Montreal, Canada. Pages 282-287. They used the 2015 dataset.


A29 Jiang, N., Pardos, Z. A. (2019). Binary Q-matrix Learning with dAFM. In Proceedings of the 12th International Conference on Educational Data Mining. Montreal, Canada. Pages 588-590. They used the ASSIST2009-10 and ASSIST2012-13 datasets.


A28 Mandalapu, V., Gong, J. (2019). Studying Factors Influencing the Prediction of Student STEM and Non-STEM Career Choice. In Proceedings of the 12th International Conference on Educational Data Mining. Montreal, Canada. Pages 607-610. They used the "Predicting STEM Career Dataset," which we call PREDICT-STEM-CAREER.


A27 Wang, T., Ma, F., Gao, J. (2019). Deep Hierarchical Knowledge Tracing. In Proceedings of the 12th International Conference on Educational Data Mining. Montreal, Canada. Pages 667-670. They used the ASSIST2009-10 and ASSIST2012-13 datasets.


A26 Wang, S. (2019). Improving Computer-Assisted Language Learning through Hierarchical Knowledge Structures. Cornell University, ProQuest Dissertations Publishing, 2019. 13866099.


A25 Lalwani, A., & Agrawal, S. (2019). What Does Time Tell? Tracing the Forgetting Curve Using Deep Knowledge Tracing. In Artificial Intelligence in Education AIED 2019. Lecture Notes in Computer Science, vol 11626. Springer, Cham. Pages 158-162. They used the ASSIST2012-13 dataset.


A24 Montero, S., Arora, A., Kelly, S., Milne, B., Mozer, M. (2018). Does Deep Knowledge Tracing Model Interactions Among Skills? The 11th International Conference on Educational Data Mining, EDM 2018. Used the cleaned data from ASSIST2009-10.

A23 Chaudhry, R., Singh, H., Doggay, P., & Saini, S. K. (2018). Modeling Hint-Taking Behavior and Knowledge State of Students with Multi-Task Learning. The 11th International Conference on Educational Data Mining. EDM 2018. They used the ASSIST2009-10 dataset.

A22 Liu, R., Walker, E., & Solovey, E. (2017). Toward Neuroadaptive Personal Learning Environments. In The First Biannual Neuroadaptive Technology Conference. Retrieved from http://neuroadaptive.org/files/NAT17_Berlin_Conference_Programme.pdf#page=63. They used ASSISTments to collect their own data, linked to brain-scanning data, and did not release their dataset.

A21 Song Y., Cai H., Zheng X., Qiu Q., Jin Y., Zhao X. (2017). FTGWS: Forming Optimal Tutor Group for Weak Students Discovered in Educational Settings. In: Benslimane D., Damiani E., Grosky W., Hameurlain A., Sheth A., Wagner R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. DOI: https://doi.org/10.1007/978-3-319-64468-4_33 . They used the ASSIST2012-13 affect data.

A20 Song Y., Jin Y., Zheng X., Han H., Zhong Y., Zhao X. (2015). PSFK: A Student Performance Prediction Scheme for First-Encounter Knowledge in ITS. In: Zhang S., Wirsing M., Zhang Z. (eds) Knowledge Science, Engineering and Management. Lecture Notes in Computer Science, vol 9403. Springer, Cham. pp 639-650. https://doi.org/10.1007/978-3-319-25159-2_58. Author Copy. They used the ASSIST2009-10 dataset.

A19 Sha L. and Hong P. (2017). Neural Knowledge Tracing. In: Frasson C., Kostopoulos G. (eds) Brain Function Assessment in Learning. BFAL 2017. Lecture Notes in Computer Science, vol 10512. Springer, Cham. 108-. They used the ASSIST2009-10 dataset.

A18 Pardos, Z.A., Dadu, A. (2017). Imputing KCs with Representations of Problem Content and Context. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP'17). Bratislava, Slovakia. ACM. pp. 148-155. http://dl.acm.org/authorize?N31523. They used the ASSIST2012-13 dataset.

A17 Roschelle, J., Feng, M., Murphy, R., Mason, C. & Fairman, J. (2017). Rigor and Relevance in an Efficacy Study of an Online Mathematics Homework Intervention. Paper presented at The Society for Research on Educational Effectiveness Spring Conference, March 2, 2017. Slides. Feng used data that was never released.

A16 Zhang, J., Shi, X., King, I., & Yeung, D. (2016). Dynamic Key-Value Memory Network for Knowledge Tracing. Retrieved from https://arxiv.org/pdf/1611.08108.pdf. They used the ASSIST2009-10 and 2015 datasets.

A15 Roschelle, J., Feng, M., Murphy, R. & Mason, C. (2016). Online Mathematics Homework Increases Student Achievement. AERA OPEN. October-December 2016, Vol. 2, No. 4, pp. 1–12. DOI: 10.1177/2332858416673968

A14 Feng, M. & Roschelle, J. (2016). Predicting Students' Standardized Test Scores Using Online Homework. L@S 2016: 213-216. Feng did not make the dataset public.

A13 Khajah, M., Lindsey, R., & Mozer, M. (2016). How Deep is Knowledge Tracing? In Barnes, Chi & Feng (eds) The 9th International Conference on Educational Data Mining. pp. 94-101. They used the ASSIST2009-10 dataset.

A12 Wilson, K., Karklin, Y., Han, B., & Ekanadham, C. (2016). Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation. In Barnes, Chi & Feng (eds) The 9th International Conference on Educational Data Mining. pp. 539-544. They appear to have used the ASSIST2009-10 dataset.

A11 Xiong, X., Zhao, S., Van Inwegen, E. & Beck, J. (2016). Going Deeper with Deep Knowledge Tracing. In Barnes, Chi & Feng (eds) The 9th International Conference on Educational Data Mining. pp 94-101. They used the ASSIST2009-10 and 2014-15 datasets.

A10 Feng, M., Roschelle, J., Mason, C. & Bhanot, R. (2016) Investigating Gender Difference on Homework in Middle School Mathematics. In Barnes, Chi & Feng (eds) The 9th International Conference on Educational Data Mining. pp 364-369. Feng did not make the data public.

A9 Xing, W., & Goggins, S. (2015, March). Learning analytics in outer space: a Hidden Naïve Bayes model for automatic student off-task behavior detection. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge (pp. 176-183). ACM. Free version. They appear to use Ryan Baker's ground-truth affect judgments.

A8 Wan, H. & Beck, J. (2015). Considering the influence of prerequisite performance on wheel spinning. In Romero, C. and Pechenizkiy, M. (eds.) Proceedings of the 8th International Conference on Educational Data Mining. Madrid, Spain. They refer to a 2010-11 dataset but do not appear to have released it.

A7 Tang, S., Gogel, H., McBride, E., Pardos, Z.A. (2015). Desirable Difficulty and Other Predictors of Effective Item Orderings. In Romero, C. and Pechenizkiy, M. (eds.) Proceedings of the 8th International Conference on Educational Data Mining. Madrid, Spain. Pages 416-419. They refer to a 2012-13 dataset in their reference 1, but that paper does not release a dataset; it refers instead to a 2006 GLOPS dataset that Neil thinks Pardos never released.

A6 Piech, C., Spencer, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L. & Sohl-Dickstein, J. (2015). Deep Knowledge Tracing. Neural Information Processing Systems (NIPS) 2015. Retrieved from http://arxiv.org/pdf/1506.05908.pdf. They used the ASSIST2009-10 dataset.

A5 Tan, L., Sun, X., & Kho, S. T. (2014). Can Engagement be Compared? Measuring Academic Engagement for Comparison. In Stamper, J., Pardos, Z., Mavrikis, M., McLaren, B.M. (eds.) Proceedings of the 7th International Conference on Educational Data Mining. pp. 213-216. They used the 2005-06 dataset stored at the PSLC (https://pslcdatashop.web.cmu.edu/), which also holds the 2004-05 (912 students) and 2006-07 (5046 students) datasets; the 2005-06 dataset has 3136 students.

A4 Galyardt, A. & Goldin, I. (2014). Recent-Performance Factors Analysis. In Stamper, J., Pardos, Z., Mavrikis, M., McLaren, B.M. (eds.) Proceedings of the 7th International Conference on Educational Data Mining. pp. 411-412 [pdf]

A3 Feng, M. (2014). Towards Uncovering the Mysterious World of Math Homework. Proceedings of the 7th International Conference on Educational Data Mining. EDM 2014. pp 425-426. It appears to use a dataset that was never released.

A2 Schultz, S. & Arroyo, I. (2014). Tracing Knowledge and Engagement in Parallel in an Intelligent Tutoring System. In Stamper, J., Pardos, Z., Mavrikis, M., McLaren, B.M. (eds.) Proceedings of the 7th International Conference on Educational Data Mining. pp. 312-315. They used the ASSIST2009-10 dataset.

A1 Pardos, Z., Wang, Q., & Trivedi, S. (2012). The real world significance of performance prediction. EDM 2012: 192-195. They used the ASSIST2009-10 dataset.