Model based on Machine Learning for the classification of banking transactions carried out through PSE
DOI:
https://doi.org/10.56294/saludcyt2024.1358
Keywords:
machine learning, natural language, bank transactions, natural language processing
Abstract
The financial sector, and banking entities in particular, has changed in recent years thanks to technology: transactions have been digitized and applications such as digital wallets and personal finance managers (PFM) have emerged, generating gigabytes of information. Managing this knowledge has become essential for facing new competitors, providing better services, and understanding clients' financial behavior, yet processing and analyzing the available volume of information is a major challenge that in most cases requires complex preprocessing and data-quality work. Banking transactions are a case in point: their observation fields contain free text, which makes analysis and classification difficult and prevents the bank and its clients from analyzing financial behavior over a period of time. To solve this problem, Machine Learning techniques were proposed to automate the classification of transactions from text written in natural language and to provide the information needed to analyze each user's financial behavior and personal expenses. After training, evaluating, and comparing different models, with the CRISP-DM methodology as the development framework, an optimized solution was reached that solves the classification problem using the KNN algorithm, with an accuracy close to 96%. The results showed a high level of confidence when classifying a transaction into a category based on its description.
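As a rough illustration of the approach the abstract describes, classifying a transaction from its free-text description with KNN, the sketch below uses only the Python standard library. It is not the authors' implementation: the abstract names only the KNN algorithm, so the TF-IDF weighting, the cosine similarity measure, the value k = 3, the category names, and the sample descriptions are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical labeled descriptions, invented for illustration only.
TRAIN = [
    ("electricity bill payment", "utilities"),
    ("water service bill", "utilities"),
    ("internet and phone bill", "utilities"),
    ("supermarket purchase fruit vegetables", "groceries"),
    ("weekly supermarket shopping", "groceries"),
    ("bakery bread and milk", "groceries"),
    ("university tuition semester", "education"),
    ("english course university fee", "education"),
    ("textbooks purchase semester", "education"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every training token."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    return {
        tok: math.log((1 + n_docs) / (1 + count)) + 1.0
        for tok, count in doc_freq.items()
    }


def vectorize(tokens, idf):
    """L2-normalized TF-IDF vector as a dict; unseen tokens are ignored."""
    term_freq = Counter(tok for tok in tokens if tok in idf)
    weights = {tok: count * idf[tok] for tok, count in term_freq.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {tok: w / norm for tok, w in weights.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


TOKENS = [tokenize(text) for text, _ in TRAIN]
IDF = fit_idf(TOKENS)
VECTORS = [vectorize(toks, IDF) for toks in TOKENS]
LABELS = [label for _, label in TRAIN]


def classify(description, k=3):
    """Majority vote among the k training descriptions most similar to the input."""
    query = vectorize(tokenize(description), IDF)
    ranked = sorted(
        zip(VECTORS, LABELS),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify("electricity bill march"))
```

With real transaction data, the descriptions would first need the preprocessing and data-quality steps the abstract mentions, and the choice of k and of the text representation would be tuned during model comparison.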
License
Copyright (c) 2024 Fabio Alberto Vargas Agudelo, Dario Enrique Soto Duran, Mauricio Urrego Álvarez, Edison Javier Yepes Sanchez, Iván Andrés Delgado González (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.
The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.