Abstract
In an age shaped by the Internet, companies use e-commerce to reach diverse groups of customers. To track customer satisfaction and maintain a competitive edge, e-commerce businesses need to scrutinize their customer reviews. Analyzing reviews manually is time-consuming and labor-intensive. Automated product review analysis exists, but resource-poor languages such as Roman Urdu lack such resources. To address this gap, this research applies Topic Modeling to Roman Urdu product reviews. A dataset of 8K Roman Urdu product reviews was curated from an online shopping platform. Several language-specific data cleaning steps were applied during pre-processing, before experimentation. Several Topic Modeling algorithms were implemented; of these, BERTopic produced the best results. The results were also evaluated on an open-source dataset to check model generalization and reliability. By drawing on machine learning and recent approaches, this study is a step toward automated review analysis in the Roman Urdu language.
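The abstract does not list the individual cleaning steps, so the following is only a minimal sketch of what language-specific pre-processing for Roman Urdu reviews might look like. The function name `clean_review`, the tiny `STOPWORDS` set, and the choice of steps (lowercasing, dropping non-letters, squeezing elongated letters, removing stopwords) are all assumptions made for illustration, not the study's actual pipeline.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword set for illustration only;
# a real Roman Urdu stopword list would be much larger.
STOPWORDS = {"hai", "ka", "ki", "ke", "ko", "se", "mein", "aur", "ye", "tha"}


def clean_review(text: str) -> List[str]:
    """Return cleaned tokens for one review (assumed steps, see lead-in)."""
    text = text.lower()
    # Keep only ASCII letters and whitespace (drops digits, punctuation, emoji).
    text = re.sub(r"[^a-z\s]", " ", text)
    # Squeeze runs of three or more identical letters, e.g. "bohttt" -> "boht".
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Split on whitespace and drop stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(clean_review("Bohttt acha product hai!!! 5/5"))
```

The token lists produced this way could then be joined back into strings and passed to whichever Topic Modeling algorithm is being compared. Squeezing elongated letters is only a crude stand-in for spelling normalization, since Roman Urdu has no standardized spelling.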