Identification of Bengawan Solo River Water Quality Patterns Using K-Means Clustering Based on Physicochemical and Environmental Parameters

Identifikasi Pola Kualitas Air Sungai Bengawan Solo Menggunakan Klasterisasi K-Means Berdasarkan Parameter Fisik-Kimia dan Lingkungan

Authors

  • Widya Cholid Wahyudin, Universitas Muhammadiyah Kudus
  • Tole Sutikno, Universitas Ahmad Dahlan
  • Rusydi Umar, Universitas Ahmad Dahlan
  • Widya Cholid Wahyudin, Universitas Ahmad Dahlan

DOI:

https://doi.org/10.21070/joincs.v9i1.1710

Keywords:

Bengawan Solo, Clustering, K-Means, TDS, Water Quality

Abstract

River water quality needs continuous monitoring because changes in physicochemical and environmental parameters can give early warning of changes in aquatic conditions. This study identifies water quality patterns in the Bengawan Solo River using K-Means clustering on physicochemical and environmental parameters. The dataset consists of 1,753 field observations with attributes including temperature, pH, electrical conductivity, total dissolved solids (TDS), water color, odor, and weather condition. The research stages are feature selection, data preprocessing, categorical encoding, Z-score standardization, K-Means clustering, and evaluation of the number of clusters. The number of clusters was tested from K=2 to K=5, and cluster quality was evaluated using the Silhouette Score, Davies-Bouldin Index, Calinski-Harabasz Score, and inertia. After data cleaning, 1,751 observations were used in the clustering process. The evaluation shows that K=2 is the best number of clusters, with a Silhouette Score of 0.187638 and a Calinski-Harabasz Score of 456.873808. The clustering produced two main patterns: Cluster 0 with 840 observations (47.97%) and Cluster 1 with 911 observations (52.03%). Based on average parameter values, Cluster 0 has higher electrical conductivity and TDS than Cluster 1 and is therefore interpreted as the higher water quality risk pattern. These results indicate that K-Means can identify initial water quality patterns in an unlabeled Bengawan Solo River dataset.
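The stages named in the abstract (encoding, Z-score standardization, K-Means, and scoring each K from 2 to 5) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the column choice and the binary "rainy" encoding are assumptions, and the helper names `zscore_columns`, `kmeans`, and `silhouette` are hypothetical. Only the standard library is used, and only the Silhouette Score and inertia are computed, to keep the example short.

```python
import math
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    inertia = sum(
        math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)
    )
    return labels, centroids, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (O(n^2) distances)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            mean(math.dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Synthetic stand-in data (NOT the study's dataset). Columns, all assumed:
# temperature, pH, electrical conductivity, TDS, rainy (0/1 encoded weather).
data_rng = random.Random(42)
rows = []
for i in range(60):
    ec = data_rng.gauss(600 if i % 2 == 0 else 300, 20)
    rows.append([
        data_rng.gauss(28, 1),
        data_rng.gauss(7.2, 0.2),
        ec,
        0.5 * ec + data_rng.gauss(0, 5),
        float(data_rng.random() < 0.5),
    ])

X = zscore_columns(rows)
results = {}
for k in range(2, 6):  # K = 2 .. 5, as in the abstract
    labels, _, inertia = kmeans(X, k, seed=1)
    results[k] = (silhouette(X, labels), inertia)
    print(f"K={k}  silhouette={results[k][0]:.4f}  inertia={inertia:.2f}")

# Pick the K with the highest silhouette score.
best_k = max(results, key=lambda k: results[k][0])
print("best K by silhouette:", best_k)
```

Standardizing first matters because electrical conductivity and TDS are measured on much larger numeric scales than pH, so without it they would dominate the Euclidean distances that K-Means minimizes.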

Published

2026-04-30

How to Cite

Wahyudin, W. C., Tole Sutikno, Rusydi Umar, & Widya Cholid Wahyudin. (2026). Identification of Bengawan Solo River Water Quality Patterns Using K-Means Clustering Based on Physicochemical and Environmental Parameters: Identifikasi Pola Kualitas Air Sungai Bengawan Solo Menggunakan Klasterisasi K-Means Berdasarkan Parameter Fisik-Kimia dan Lingkungan. JOINCS (Journal of Informatics, Network, and Computer Science), 9(1), 43–48. https://doi.org/10.21070/joincs.v9i1.1710