Application of Machine Learning Models in Water Quality Classification in Lake Maninjau: Random Forest as the Optimal Solution

Abdurrahman Niarman(1), Reni Kurnia(2), Iswandi Iswandi(3)
(1)   Indonesia
(2) Universitas Negeri Padang  Indonesia
(3) Universitas Islam Negeri Mahmud Yunus Batusangkar  Indonesia

DOI : https://doi.org/10.24036/et.v12i1.128941

Abstract


This study develops a machine learning model to classify water quality in Lake Maninjau using data from the Ministry of Environment and Forestry's Onlimo application. The dataset covers temperature, pH, dissolved oxygen (DO), conductivity, total dissolved solids (TDS), salinity, turbidity, nitrate, and ammonium. Four algorithms were compared: Logistic Regression, Support Vector Machine (SVM), Gradient Boosting, and Random Forest. Random Forest performed best, with an average accuracy of 87.33% (standard deviation 6.97%) and a test accuracy of 90.63%. The resulting model can support authorities in monitoring water quality and in making water resource management decisions. The study also illustrates how combining machine learning with IoT-based monitoring can provide practical solutions for environmental monitoring.
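The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code. It assumes that the reported average accuracy and standard deviation come from k-fold cross-validation (the abstract does not say so), uses synthetic data in place of the Onlimo measurements, and picks three classes, five folds, an 80/20 split, and default hyperparameters purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the nine parameters named in the abstract
# (temperature, pH, DO, conductivity, TDS, salinity, turbidity, nitrate,
# ammonium). This is NOT the Onlimo data; the class count is an assumption.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=6, n_redundant=1,
    n_classes=3, random_state=42,
)

# Hold out a test set (80/20 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The four algorithm families named in the abstract, with default settings.
# Scaling is added for the two models that are sensitive to feature scale.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    # Mean and standard deviation of accuracy over 5 folds (assumed k).
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    # Refit on the full training split and score on the held-out test set.
    model.fit(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (cv_scores.mean(), cv_scores.std(), test_acc)
    print(f"{name}: CV mean={cv_scores.mean():.4f}, "
          f"CV std={cv_scores.std():.4f}, test={test_acc:.4f}")
```

The numbers printed by this sketch describe the synthetic data only and are not comparable to the figures reported in the abstract.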

