Optimization of Data Search Process Using a Combination of Merge Sort and Binary Search

Main Article Content

Dimas Rizky Aulia
Faris Andisa
Hafid Nasrullah
Firdiawan Hermawan

Abstract

The growth in data volume in the digital age makes speed in processing and searching for information an increasingly important need. This research focuses on optimizing the data search process by comparing two algorithm combination strategies. The first strategy combines recursive Merge Sort with manual Binary Search, while the second strategy uses iterative (bottom-up) Merge Sort paired with the lower bound function from the C++ library. This research employs a quantitative experimental approach by testing execution times on random datasets of 1,000, 10,000, and 100,000 elements. The test results show that bottom-up Merge Sort has faster execution times compared to the recursive version, for example, 71.2 ms versus 107.2 ms for a dataset of 100,000 elements. This speed is achieved because the iterative approach does not require calling recursive functions, which can add to the memory load. For the search process, manual Binary Search proved to be very fast for a one-time search. However, in a repeated search scenario, the lower bound function is more efficient and consistent in execution time. The combination of bottom-up Merge Sort and lower bound becomes the optimal choice for large-scale systems that require a one-time sorting process and repeated searches, such as search engines and massive data filters. This research concludes that selecting an algorithm strategy appropriate for the frequency and needs of the search can improve system efficiency in processing large-scale data.

Article Details

Section
Articles

References

[1] P. Zhang et al., “NetSHa: In-Network Acceleration of LSH-Based Distributed Search,” IEEE Transactions on Parallel and Dis-tributed Systems, vol. 33, no. 9, pp. 2213–2229, Sep. 2022, doi: 10.1109/TPDS.2021.3135842.

[2] R. Perego, G. E. Pibiri, and R. Venturini, “Compressed Indexes for Fast Search of Semantic Data,” IEEE Trans Knowl Data Eng, vol. 33, no. 9, pp. 3187–3198, Sep. 2021, doi: 10.1109/TKDE.2020.2966609.

[3] P. Bose, K. Douïeb, J. Iacono, and S. Langerman, “The Power and Limitations of Static Binary Search Trees with Lazy Finger,” Algorithmica, vol. 76, no. 4, pp. 1264–1275, Dec. 2016, doi: 10.1007/S00453-016-0224-X/METRICS.

[4] T. Brown, A. Prokopec, and D. Alistarh, “Non-blocking interpolation search trees with doubly-logarithmic running time,” Pro-ceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, pp. 276–291, Feb. 2020, doi: 10.1145/3332466.3374542.

[5] W. Qiao, L. Guo, Z. Fang, M. C. F. Chang, and J. Cong, “TopSort: A High-Performance Two-Phase Sorting Accelerator Opti-mized on HBM-Based FPGAs,” IEEE Trans Emerg Top Comput, vol. 11, no. 2, pp. 404–419, Apr. 2023, doi: 10.1109/TETC.2022.3228575.

[6] L. Wu, H. Huang, K. Su, S. Cai, and X. Zhang, “An I/O efficient model checking algorithm for large-scale systems,” IEEE Trans Very Large Scale Integr VLSI Syst, vol. 23, no. 5, pp. 905–915, May 2015, doi: 10.1109/TVLSI.2014.2330061.

[7] P. Michiardi, D. Carra, and S. Migliorini, “Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks,” Information Systems Frontiers, vol. 23, no. 1, pp. 35–51, Feb. 2021, doi: 10.1007/S10796-020-09995-2/METRICS.

[8] M. AbdelNaby, M. E. Khalefa, Y. Taha, and A. Hassan, “Towards efficient top-k fuzzy auto-completion queries,” Alexandria Engineering Journal, vol. 61, no. 7, pp. 5783–5791, Jul. 2022, doi: 10.1016/J.AEJ.2020.06.012.

[9] R. T. Rimi, K. M. A. Hasan, and T. Tsuji, “Multidimensional query processing algorithm by dimension transformation,” Sci Rep, vol. 13, no. 1, pp. 1–11, Dec. 2023, doi: 10.1038/S41598-023-31758-7;SUBJMETA.

[10] M. Yu, C. Chai, and G. Yu, “A Tree-Based Indexing Approach for Diverse Textual Similarity Search,” IEEE Access, vol. 9, pp. 8866–8876, 2021, doi: 10.1109/ACCESS.2020.3022057.

[11] N. Sultana, S. Paira, S. Chandra, and S. K. S. Alam, “A brief study and analysis of different searching algorithms,” Proceedings of the 2017 2nd IEEE International Conference on Electrical, Computer and Communication Technologies, ICECCT 2017, no. March, 2017, doi: 10.1109/ICECCT.2017.8117821.

[12] J. Clarkson, K. Y. Lin, and K. D. Glazebrook, “A Classical Search Game in Discrete Locations,” https://doi.org/10.1287/moor.2022.1279, vol. 48, no. 2, pp. 687–707, Jul. 2022, doi: 10.1287/MOOR.2022.1279.

[13] A. Lin, “Binary search algorithm,” WikiJournal of Science, vol. 2, no. 1, pp. 1–13, 2019, doi: 10.15347/wjs/2019.005.

[14] M. N. Saeed et al., “Empirical Analysis of Quaternary and Binary Search,” pp. 1–6, 2024.

[15] P. Bose, J. Cardinal, J. Iacono, G. Koumoutsos, and S. Langerman, “Competitive online search trees on trees,” Proc West Mark Ed Assoc Conf, vol. 2020-January, pp. 1878–1891, 2020, doi: 10.1137/1.9781611975994.115.

[16] A. Mehmood, “ASH Search: Binary Search Optimization,” International Journal of Computer Applications, vol. 178, no. 15, pp. 10–17, 2019, doi: 10.5120/ijca2019918788.

[17] Q. Liu, Y. Ren, Z. Zhu, D. Li, X. Ma, and Q. Li, “RankAxis: Towards a Systematic Combination of Projection and Ranking in Multi-Attribute Data Exploration,” IEEE Trans Vis Comput Graph, vol. 29, no. 1, pp. 701–711, Jan. 2023, doi: 10.1109/TVCG.2022.3209463.

[18] Z. Chen and A. Zhang, “A Survey of Approximate Quantile Computation on Large-Scale Data,” IEEE Access, vol. 8, pp. 34585–34597, 2020, doi: 10.1109/ACCESS.2020.2974919.

[19] M. Markina and M. Buzdalov, “Hybridizing non-dominated sorting algorithms: Divide-and-conquer meets best order sort,” GEC-CO 2017 - Proceedings of the Genetic and Evolutionary Computation Conference Companion, vol. 2017-January, pp. 153–154, Jul. 2017, doi: 10.1145/3067695.3076074.