
Title of Thesis

Architecture Recovery of Legacy Software Systems Using Unsupervised Machine Learning Techniques

Author(s)

Onaiza Maqbool

Institute/University/Department Details
Department of Computer Science / Lahore University of Management Sciences, Lahore
Session
2003
Subject
Computer Science
Number of Pages
260
Keywords (Extracted from title, table of contents and abstract of thesis)
Techniques, Hierarchical, Algorithms, Software, Modularization, Strengths, Metarule, Recovery, Architecture, Machine, Learning, Characteristics

Abstract
Perhaps the most important aspect in maintaining software legacy systems is understanding their architecture. Architectural documentation is often unavailable. Thus efforts need to be made to recover the architectural design from the source code. This thesis addresses the problem of recovering the architecture of software systems for greater understanding, and modularizing them for greater maintainability, using machine learning techniques.
We use clustering to obtain a high-level view of a software system's architecture by identifying major sub-systems within it. For this purpose, we analyze the behaviour of existing similarity and distance measures when applied to software artifacts, keeping software characteristics in view, which yields explanations for some previously unanswered questions. We develop two new hierarchical clustering algorithms that address the problem of arbitrary decisions taken by existing hierarchical algorithms. We also propose a similarity measure suitable for software clustering. The performance of the proposed algorithms and similarity measure is evaluated using internal and external assessment. Instead of using only one expert decomposition for external assessment, as is commonly done, we use decompositions prepared by 4-5 experts for each test system. Such an approach allows us to validate the idea of multiple views of a software system. Experiments carried out on five open source legacy software systems show that the performance of our proposed algorithms is better than that of previously used algorithms.
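As a rough illustration of what "arbitrary decisions" can mean in hierarchical clustering, the sketch below scores pairs of entities with the Jaccard coefficient and reports every pair that ties for the highest similarity. The Jaccard coefficient, the entity names, and the feature names are all assumptions chosen for illustration; the abstract does not name the measures studied, and this is not the thesis's proposed algorithm.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared features over all features in either entity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical entities (functions) described by the features they use.
features = {
    "f1": {"g_count", "T_list"},
    "f2": {"g_count", "T_list"},
    "f3": {"g_count", "T_list"},
    "f4": {"g_log"},
}

def most_similar_pairs(feats):
    """Return the highest similarity and every pair that attains it."""
    scores = {
        pair: jaccard(feats[pair[0]], feats[pair[1]])
        for pair in combinations(sorted(feats), 2)
    }
    best = max(scores.values())
    return best, [pair for pair, score in scores.items() if score == best]

best, tied = most_similar_pairs(features)
print(best, tied)
```

When several pairs tie, a conventional agglomerative algorithm must pick one of them to merge first, and that choice can change the resulting hierarchy.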
Interpreting the results of clustering algorithms is often difficult. To make clusters easier to understand, we propose a labeling scheme for clusters and compare two alternative ranking schemes that can be utilized for this purpose. We demonstrate how the labels assigned by our scheme aid understanding of the clustering process of clustering algorithms. We also provide a comparison between cluster analysis and concept analysis as modularization techniques, and give examples of their application to different software structures, thus indicating the strengths and limitations of the two techniques.
Finally, we use association rule mining to gain insight into the low-level structure of software systems by examining relationships between architectural quarks, i.e. functions, global variables and user defined types. Metarule-guided association rule mining is used to identify problems within structured legacy systems. Re-engineering patterns that present solutions to these problems are proposed. Results for the test systems reveal interesting characteristics which allow us to understand legacy systems and their evolution.
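To make the idea of metarule-guided association rule mining concrete, here is a minimal sketch that computes support and confidence for rules restricted to one template, "function accesses global variable X implies function uses type Y". The transactions, the `g:`/`t:` prefixes, the thresholds, and the template itself are hypothetical and not taken from the thesis.

```python
from itertools import permutations

# Hypothetical transactions: each function mapped to the items it accesses.
# Prefix "g:" marks a global variable and "t:" a user defined type.
transactions = {
    "parse": {"g:buf", "t:Token"},
    "lex":   {"g:buf", "t:Token"},
    "emit":  {"g:buf", "g:out"},
    "flush": {"g:out"},
}

def support(itemset: set) -> float:
    """Fraction of functions whose accessed items include the whole itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent: set, consequent: set) -> float:
    """Estimated probability of the consequent given the antecedent."""
    base = support(antecedent)
    return support(antecedent | consequent) / base if base else 0.0

def mine(min_support: float = 0.5, min_confidence: float = 0.6):
    """Keep only rules that match the assumed metarule: global => type."""
    items = sorted(set().union(*transactions.values()))
    rules = []
    for a, c in permutations(items, 2):
        if not (a.startswith("g:") and c.startswith("t:")):
            continue  # rule does not fit the metarule template
        s = support({a, c})
        conf = confidence({a}, {c})
        if s >= min_support and conf >= min_confidence:
            rules.append((a, c, s, conf))
    return rules

rules = mine()
print(rules)
```

The template acts as a filter on the rule space, so only rules of a shape the analyst cares about are scored and reported.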

Download Full Thesis
2,215 KB
S. No. Chapter Title of the Chapters Page Size (KB)
1 0 CONTENTS
 

 

viii
98 KB
2 1 INTRODUCTION

1.1 Legacy Systems
1.2 Re-engineering and Reverse Engineering
1.3 Architecture Recovery
1.4 Automated Techniques based on Unsupervised Machine Learning
1.5 Research Contributions
1.6 Organization of this Dissertation

1
91 KB
3 2 BACKGROUND

2.1 Techniques for Extracting Procedural Design
2.2 Techniques for Extracting Architectural Design
2.3 Related Work
2.4 Summary

14
142 KB
4 3 CLUSTERING TECHNIQUES: APPLICATION IN THE SOFTWARE DOMAIN

3.1 An Overview of Clustering
3.2 Steps in Clustering
3.3 Summary and Conclusions

38
175 KB
5 4 THE COMBINED AND WEIGHTED COMBINED CLUSTERING ALGORITHMS

4.1 The Combined Algorithm
4.2 The Weighted Combined Algorithm
4.3 Experimental Setup
4.4 Experimental Results
4.5 Summary and Conclusions

56
770 KB
6 5 A COMPARISON OF CLUSTERING ALGORITHMS BASED ON EXTERNAL ASSESSMENT

5.1 Relative Assessment of Expert Decompositions
5.2 External Assessment
5.3 Summary and Conclusions

104
413 KB
7 6 CLUSTER LABELING

6.1 Frequency Based Approach to Cluster Labeling
6.2 Experimental Results
6.3 Summary and Conclusions

126
300 KB
8 7 CONCEPT ANALYSIS AND CLUSTERING FOR LEGACY CODE MODULARIZATION

7.1 Concept Analysis
7.2 The Modularization Process
7.3 Concept Analysis and Clustering for Modularization: A Comparison
7.4 Summary and Conclusions

155
468 KB
9 8 METARULE-GUIDED ASSOCIATION RULE MINING FOR PROGRAM UNDERSTANDING

8.1 An Overview of Association Rule Mining
8.2 Re-engineering Patterns for Structured Legacy Systems
8.3 Metarule-Guided Association Rule Mining for Program Understanding
8.4 Experiments and Results
8.5 Discussion of Results
8.6 Related Work
8.7 Summary and Conclusions
Appendix: Re-engineering Patterns

180
311 KB
10 9 CONCLUSIONS

9.1 Summary of Research Contributions
9.2 Future Work
9.3 Concluding Remarks

212
62 KB
11 10 BIBLIOGRAPHY AND APPENDIX

 

220
113 KB