Abstract

This study analyses the effectiveness of two prompt styles, guided prompts and free prompts, in influencing the quality of answers generated by a Retrieval-Augmented Generation (RAG) system built on the Meta Llama 3 Large Language Model (LLM). The system answers questions based on reference documents stored in vector form through an embedding process. Each question was posed in both prompt styles, and the resulting answers were evaluated with two metrics: ROUGE and BERTScore. The results show that guided prompts achieved higher scores on the ROUGE-1, ROUGE-2, and ROUGE-L metrics, reflecting better precision and lexical agreement. BERTScore, by contrast, showed no significant difference between the two prompt styles, indicating that the answers were relatively equivalent in meaning, or semantic similarity. These findings suggest that prompt design has a measurable impact on the structure and precision of answers.
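The ROUGE-N comparison described above reduces to n-gram overlap between a generated answer and a reference answer. The following is a minimal pure-Python sketch of that computation (an illustration of the metric's definition, not the scoring implementation used in this study):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1):
    """Return (precision, recall, F1) for ROUGE-N via n-gram overlap."""
    def ngrams(text, n):
        tokens = text.split()  # naive whitespace tokenisation for illustration
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand or not ref:
        return 0.0, 0.0, 0.0
    overlap = sum((cand & ref).values())        # clipped n-gram matches
    precision = overlap / sum(cand.values())    # fraction of candidate n-grams matched
    recall = overlap / sum(ref.values())        # fraction of reference n-grams matched
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)
```

Raising `n` to 2 gives ROUGE-2, while ROUGE-L (used in the study alongside ROUGE-1 and ROUGE-2) is based on the longest common subsequence rather than fixed-length n-grams, so it rewards in-order matches without requiring contiguity.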

Keywords

Large Language Model, Retrieval-Augmented Generation, ROUGE, BERTScore
