Implementation of The Sequential Biclustering Method and Centroid-Based Imputation (Fuzzy C-Means) for Missing Values Imputation
Keywords:
Cheng and Church, Mean Squared Residue, Missing Completely At Random, Missing Rate, Root Mean Squared Error
Abstract
Missing values are a common issue in gene expression data and may significantly affect the accuracy of subsequent analyses if not properly handled. This study implements and evaluates a missing value imputation method based on Sequential Biclustering and Centroid-Based Imputation with Fuzzy C-Means (FCM), applied to gene expression data from Type 2 Diabetes Mellitus patients. Missing values were generated under the Missing Completely At Random (MCAR) mechanism with missing rates ranging from 5% to 55%, and five replications were conducted for each missing rate. Biclustering was performed using the Cheng and Church algorithm, which identifies biclusters based on the Mean Squared Residue (MSR), followed by missing value estimation using FCM with 2 to 5 biclusters. Imputation performance was evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), computed by comparing the imputed values with the original data. The results indicate that the configuration with 2 biclusters consistently produced the lowest and most stable MSE, RMSE, and MAE across all missing rates.
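The evaluation protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the synthetic data, and the random seed are hypothetical, and a simple column-mean imputer stands in for the Sequential Biclustering and FCM step, which is not reproduced here. The sketch also assumes that the error metrics are computed only over the artificially removed entries. The MSR function follows the standard Cheng and Church definition of the residue of a fully observed bicluster.

```python
import numpy as np


def mask_mcar(data, missing_rate, rng):
    """Return a copy of `data` with a fraction of entries set to NaN.

    Positions are drawn uniformly at random, independent of the values,
    which is what the MCAR mechanism requires.
    """
    masked = data.astype(float).copy()
    n_missing = int(round(missing_rate * data.size))
    idx = rng.choice(data.size, size=n_missing, replace=False)
    masked.flat[idx] = np.nan
    return masked


def mean_squared_residue(block):
    """Mean Squared Residue of a fully observed bicluster (rows x columns)."""
    row_mean = block.mean(axis=1, keepdims=True)
    col_mean = block.mean(axis=0, keepdims=True)
    residue = block - row_mean - col_mean + block.mean()
    return float(np.mean(residue ** 2))


def imputation_errors(original, imputed, missing_mask):
    """MSE, RMSE, and MAE over the masked positions only (an assumption)."""
    diff = imputed[missing_mask] - original[missing_mask]
    mse = float(np.mean(diff ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(diff))),
    }


def column_mean_impute(masked):
    """Placeholder imputer: NOT the biclustering + FCM method of the paper."""
    col_means = np.nanmean(masked, axis=0)
    filled = masked.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


# Synthetic stand-in for a gene expression matrix (genes x samples).
rng = np.random.default_rng(0)
expression = rng.normal(size=(40, 10))

masked = mask_mcar(expression, missing_rate=0.05, rng=rng)
missing_mask = np.isnan(masked)
imputed = column_mean_impute(masked)
errors = imputation_errors(expression, imputed, missing_mask)
print(errors)
```

In a replication study along the lines of the abstract, the masking and scoring steps would be repeated five times for each missing rate from 5% to 55%, with the placeholder imputer replaced by the method under evaluation.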
License
Copyright (c) 2026 Yolanda Azzahra, Titin Siswantining, Setia Pramana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
