Evaluation of Open-ended Question’s Answers using Large Language Model (LLM): A Case Study of a Language Learning Center in University

Authors

Faisal Wilmar, Panca Oktavia Hadi Putra

Keywords:

MOOC, generative AI, artificial intelligence, large language model, prompt engineering technique, open-ended question, English course

Abstract

Massive open online courses (MOOCs) are a transformative tool in education, and their benefits have grown alongside the development of artificial intelligence (AI) in the form of large language models (LLMs). However, using LLMs in education raises ethical issues such as accuracy and fairness, especially when they are used for assessment or evaluation, because of the risk of bias, inaccuracy, and inconsistency in AI output. These risks can be reduced as LLM capabilities develop and through prompt engineering techniques, that is, structured ways of communicating with LLMs. This study evaluates the factors affecting the accuracy of LLM-generated evaluations of open-ended questions in an English MOOC at a university's language learning center. The findings are used to formulate recommendations for the consistent and accurate use of LLMs in evaluating open-ended questions in English courses delivered on a MOOC platform. In this quantitative quasi-experimental study, 580 participants divided into three proficiency groups answered open-ended questions. Their answers were scored by one human rater and by three LLMs, each applied with three prompt engineering techniques. The scores were analyzed using a three-way analysis of variance (ANOVA) to examine the factors influencing LLM output; the mean absolute error (MAE) between human and LLM scores was used to assess LLM accuracy; and quadratic weighted kappa was computed as an inter-rater reliability measure of rater consistency. The results show that the participant group, the LLM used, and the prompt engineering technique all influenced the assessment results, with the proficiency level of the evaluated participants having the greatest impact. The best-performing combination of LLM and prompt engineering technique, ChatGPT-4.1 with Chain-of-Thought, still should not be used for high-stakes assessments. LLMs can be used for initial assessment or as a complement to human assessment results, but should not replace human raters in high-stakes assessments.
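For readers who want to see how the two agreement metrics named in the abstract fit together, the sketch below computes MAE and quadratic weighted kappa (QWK) between a human rater's scores and an LLM's scores. This is a minimal illustration, not the authors' code: the score data, the 0-5 rubric scale, and the use of scikit-learn are assumptions for demonstration only.

```python
# Illustrative sketch (not from the paper): MAE measures the average
# absolute gap between human and LLM scores; QWK measures chance-corrected
# agreement, penalizing large disagreements quadratically more than small ones.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical integer rubric scores (0-5) for the same set of answers.
human_scores = np.array([4, 3, 5, 2, 4, 1, 3, 5])
llm_scores = np.array([4, 2, 5, 3, 4, 1, 2, 5])

# Mean absolute error between the two raters (lower is better).
mae = np.mean(np.abs(human_scores - llm_scores))

# Quadratic weighted kappa (1.0 = perfect agreement, 0 = chance level).
qwk = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")

print(f"MAE: {mae:.2f}")  # 0.38 for the toy data above
print(f"QWK: {qwk:.2f}")
```

In the study's design, this pair of metrics would be computed for each LLM-and-prompt-technique combination against the single human rater, which is what allows combinations such as ChatGPT-4.1 with Chain-of-Thought to be ranked.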

Published

2026-05-01

How to Cite

Faisal Wilmar, & Panca Oktavia Hadi Putra. (2026). Evaluation of Open-ended Question’s Answers using Large Language Model (LLM): A Case Study of a Language Learning Center in University. Jurnal Sistem Informasi, 22(1). Retrieved from https://jsi.cs.ui.ac.id/index.php/jsi/article/view/1550