Pakistan Research Repository

Title of Thesis

Ghausia Masood Gilani
Institute/University/Department Details
Institute of Statistics, University of the Punjab
Keywords (Extracted from title, table of contents and abstract of thesis)
epidemiology, breast cancer, cancer, cohort studies, case-control studies

Cancer incidence and mortality rates are increasing rapidly. In particular, breast cancer is the most common malignancy among women in every country where it has been studied. Trends in breast cancer occurrence may be related to social, cultural, environmental and lifestyle factors, among many others. Because the density and diversity of these factors vary from area to area, each geographical area requires a separate study. Research on the medical aspects of breast cancer has been carried out in Pakistan, but epidemiological studies of the disease are lacking. This thesis presents an epidemiological study of breast cancer using advanced statistical methodology.

There are two basic designs for an epidemiological study: prospective and retrospective, or, more technically, cohort and case-control studies. For both designs, the statistical method used for analysis is logistic regression. Case-control studies are of two kinds: unmatched and matched. In a matched study, the controls are matched to the cases on one or more confounding variables; in studies of cancer, age is such a confounder. Unconditional logistic regression is applied to unmatched case-control studies, and conditional logistic regression to matched case-control studies. The latter technique is computationally costly and therefore more limited in application.
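The conditional approach can be illustrated with a minimal sketch, assuming a single covariate and 1:M matched sets (the study itself used M = 2). Each matched set contributes exp(beta * x_case) / sum_j exp(beta * x_j) to the conditional likelihood, so set-specific terms such as age strata cancel, and beta is found by Newton-Raphson. The function name `conditional_logit_1cov` and the toy data are hypothetical and are not taken from the thesis.

```python
import math


def conditional_logit_1cov(matched_sets, iters=50, tol=1e-10):
    """One-covariate conditional logistic regression fitted by Newton-Raphson.

    matched_sets: list of (x_case, [x_control, ...]), one tuple per matched set.
    Returns beta, the log odds ratio per unit of x.
    """
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case, *x_controls]
            # Weights exp(beta*x) normalized within the set; subtracting the
            # maximum avoids overflow and cancels in the ratio.
            top = max(beta * x for x in xs)
            w = [math.exp(beta * x - top) for x in xs]
            total = sum(w)
            mean_x = sum(wi * x for wi, x in zip(w, xs)) / total
            mean_x2 = sum(wi * x * x for wi, x in zip(w, xs)) / total
            score += x_case - mean_x        # first derivative of log L
            info += mean_x2 - mean_x ** 2   # minus the second derivative
        if info == 0.0:
            break  # only concordant sets: beta is not estimable
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical 1:1 pairs with a binary exposure (1 = exposed).
pairs = ([(1, [0])] * 6      # case exposed, control not
         + [(0, [1])] * 2    # control exposed, case not
         + [(1, [1])] * 10   # concordant pairs carry no information
         + [(0, [0])] * 5)
beta_hat = conditional_logit_1cov(pairs)
print(f"conditional OR = {math.exp(beta_hat):.3f}")  # 3.000, i.e. 6/2
```

With 1:1 pairs and a binary exposure the estimate reduces to the ratio of discordant pairs (6/2 = 3 here), which makes a convenient check; 1:2 sets such as `(1, [0, 1])` are handled by the same function unchanged.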

A matched case-control study was carried out to determine the risk factors for breast cancer in Punjab, Pakistan. Data on breast cancer patients were collected from two leading cancer hospitals: Shaukat Khanum Memorial Cancer Hospital (SKMCH) and the Institute of Nuclear Medicine and Oncology, Lahore (INMOL). Population-based controls were matched to the patients on age at diagnosis (within two years) in a 1:2 case-to-control ratio. The interview schedule designed for the study covered socio-economic status, monthly income, history of smoking, family marriage, family history of cancer and of breast cancer, menstrual and reproductive history, and anthropometric variables. In all, the data set comprised 1166 breast cancer patients and 2506 controls.

The controls were selected as follows. Three villages (Shah De KIlUi, Manga Mandi and Ghandran) were chosen to represent the rural population, and two cities were chosen to represent the urban population: Lahore, a metropolitan city, and Gujranwala, an industrial city. These areas were selected at random, while individual households were selected by convenience, and only one control was interviewed per household.

The continuous variables were examined against the linearity assumption. Variables showing a non-linear trend were transformed appropriately so that they could be included in linear logistic models, although all continuous variables were ultimately modeled as categorical variables. Univariate analysis, with a screening threshold of p < 0.25, was used to identify the variables to be included in the multivariate analysis. All variables of statistical or biological importance were included in the conditional multiple logistic regression models.
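The age-matching rule (two controls per case, each within two years of the case's age at diagnosis) can be expressed as a greedy matching sketch. This assumes a pre-listed pool of candidate controls with known ages, which differs from the field procedure described above, where controls were interviewed household by household; `match_controls` and its arguments are hypothetical.

```python
def match_controls(cases, pool, ratio=2, caliper=2):
    """Greedy 1:ratio matching on age, without replacement.

    cases: list of (case_id, age_at_diagnosis); pool: list of (control_id, age).
    Returns {case_id: [control_id, ...]} for cases that received `ratio` controls.
    Illustrative only: assumes candidate controls are listed in advance.
    """
    used = set()
    matches = {}
    for case_id, case_age in cases:
        eligible = [(cid, age) for cid, age in pool
                    if cid not in used and abs(age - case_age) <= caliper]
        eligible.sort(key=lambda c: abs(c[1] - case_age))  # closest ages first
        if len(eligible) >= ratio:
            chosen = [cid for cid, _ in eligible[:ratio]]
            matches[case_id] = chosen
            used.update(chosen)  # each control is used for one case only
    return matches


cases = [("case1", 45), ("case2", 60), ("case3", 30)]
pool = [("p1", 44), ("p2", 47), ("p3", 50), ("p4", 59), ("p5", 61), ("p6", 62)]
print(match_controls(cases, pool))
# {'case1': ['p1', 'p2'], 'case2': ['p4', 'p5']}; case3 has no eligible controls
```

Greedy matching processes cases in the given order, so an early case can take controls that a later case would also have accepted; an optimal-matching algorithm would avoid that, at extra cost.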

As a first step, a main-effects model was developed. Various multivariate models were then developed by including potential confounders. Some of these models (odds ratios and significance levels) are presented in separate tables. Decisions on including or excluding confounders and interaction terms were guided by the significance of the change in likelihood. Of the statistically significant interaction terms, only those of biological importance were retained in the final model.
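The likelihood ratio test behind these inclusion decisions compares the log-likelihoods of nested models: G = 2(l_full - l_reduced) is referred to a chi-square distribution whose degrees of freedom equal the number of added terms. The sketch below computes the upper-tail p-value in pure Python for integer degrees of freedom, using the standard recurrence for the chi-square tail; the same test with a p < 0.25 cut-off could serve for univariate screening. The function names and the example log-likelihoods are illustrative assumptions.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1   # exact tail for 1 df
    else:
        q, nu = math.exp(-half), 2               # exact tail for 2 df
    # Q(x; nu + 2) = Q(x; nu) + (x/2)**(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return G = 2 * (l_full - l_reduced) and its chi-square p-value."""
    g = 2.0 * (loglik_full - loglik_reduced)
    return g, chi2_sf(g, df)


# Hypothetical log-likelihoods before and after adding one interaction term.
g, p = likelihood_ratio_test(-812.4, -809.1, df=1)
print(f"G = {g:.2f}, p = {p:.4f}")
```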

Multiple logistic regression models were also developed for this data set using the unconditional approach. Various multivariate models were fitted by including different potential confounders, and some of them are presented in tables. These models were then compared to arrive at a final statistical model, and the two approaches to analysis were also compared. Because the risk of breast cancer increases after menopause, and the risk factors before menopause may differ from those after it, a separate analysis was carried out for postmenopausal women. Women with a late age at menopause were found to be at significantly higher risk of breast cancer.
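For comparison, a minimal sketch of an unconditional fit with one covariate is shown below, using Newton-Raphson on the ordinary logistic likelihood. With a single binary exposure the model is saturated, so exp(b1) equals the cross-product ratio ad/bc of the 2x2 table, which gives an easy check. The table counts and the function name are hypothetical, and the sketch ignores the matching; in practice an unconditional analysis of matched data would usually include the matching variable (here age) as a covariate.

```python
import math


def fit_logistic_1cov(x, y, iters=50, tol=1e-10):
    """Unconditional logistic regression logit(p) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            i00 += w                  # entries of the information matrix
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det   # solve the 2x2 system I * d = g
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical 2x2 table: 30 exposed / 20 unexposed cases,
# 25 exposed / 50 unexposed controls; cross-product ratio = (30*50)/(20*25) = 3.
x = [1] * 30 + [0] * 20 + [1] * 25 + [0] * 50
y = [1] * 50 + [0] * 75
b0, b1 = fit_logistic_1cov(x, y)
print(f"unconditional OR = {math.exp(b1):.3f}")  # 3.000
```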

During the analysis, some of the results (odds ratios or their confidence intervals) were observed to be inconsistent. An attempt was therefore made to base the analysis on cases with complete information on all variables; this is termed complete case analysis. All cases with missing information for any covariate were deleted along with their matched controls. Logistic regression was applied to the revised data set using both the conditional and the unconditional approach, and the models developed for the earlier and the revised data sets were compared. The same analysis was also carried out for the postmenopausal women in the study.
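The deletion rule for the complete case analysis can be sketched as a filter over matched sets. As an assumption beyond the text, the sketch also drops individual controls with missing values (switch this off with `drop_incomplete_controls=False`) and discards any set left with no controls, since such a set carries no information in a conditional analysis. Subjects are represented as plain dictionaries with `None` marking a missing value; this representation and the function name are hypothetical.

```python
def complete_case_sets(matched_sets, covariates, drop_incomplete_controls=True):
    """Keep only matched sets whose case has no missing covariate values.

    matched_sets: list of (case, [control, ...]) where each subject is a dict
    mapping covariate name -> value, with None marking a missing value.
    drop_incomplete_controls is an assumption beyond the source text.
    """
    def complete(subject):
        return all(subject.get(v) is not None for v in covariates)

    kept = []
    for case, controls in matched_sets:
        if not complete(case):
            continue  # delete the case together with its matched controls
        if drop_incomplete_controls:
            controls = [c for c in controls if complete(c)]
        if controls:  # a set without controls adds nothing to a conditional fit
            kept.append((case, controls))
    return kept


sets = [
    ({"smoking": 1, "bmi": 30}, [{"smoking": 0, "bmi": 25}, {"smoking": 0, "bmi": None}]),
    ({"smoking": None, "bmi": 27}, [{"smoking": 0, "bmi": 22}, {"smoking": 1, "bmi": 24}]),
]
revised = complete_case_sets(sets, ["smoking", "bmi"])
print(len(revised), len(revised[0][1]))  # 1 1
```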

The models considered above were based purely on statistical results. An appropriate model should combine both statistical and biological criteria, so a final model was developed on that basis; its results are presented in Table 24. In this study, a significant increase in the risk of breast cancer was associated with a history of smoking, family marriage, a family history of breast cancer, late age at first full-term pregnancy (above 25 years) and higher body mass index (28 or above). High parity (more than three children) was a significant protective factor, but no protective effect of late menarche or breastfeeding was observed. Late age at menopause was a strong determinant of postmenopausal breast cancer risk. Socio-economic status, whether higher or lower, was not an indicator of breast cancer risk in this study: it was not an independent risk factor in the model for all women. Instead, lower-class women with a late age at first full-term pregnancy were at three times higher risk. For premenopausal women, socio-economic status was a moderate risk factor. The risk of breast cancer was higher for postmenopausal women of the lower class; however, these women were protected by a higher number of full-term pregnancies.
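Odds ratios such as those in Table 24 are commonly reported as exp(beta) with a 95% interval; assuming Wald intervals exp(beta ± 1.96 SE) (the abstract does not state which interval was used), a factor reads as a significant risk factor when the whole interval lies above 1 and as a significant protective factor when it lies below 1. The sketch encodes that reading; the coefficients in the usage lines are invented for illustration and are not the thesis's estimates.

```python
import math


def odds_ratio_ci(beta, se, z=1.959963984540054):
    """Odds ratio exp(beta) with Wald interval exp(beta -/+ z * se); z gives 95%."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def classify(beta, se):
    """Label a coefficient by whether its interval lies above or below 1."""
    _, lower, upper = odds_ratio_ci(beta, se)
    if lower > 1.0:
        return "significant risk factor"
    if upper < 1.0:
        return "significant protective factor"
    return "not significant"


# Invented coefficients and standard errors, for illustration only.
for name, beta, se in [("factor A", 0.8, 0.2), ("factor B", -0.7, 0.2), ("factor C", 0.1, 0.3)]:
    or_, lo, hi = odds_ratio_ci(beta, se)
    print(f"{name}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), {classify(beta, se)}")
```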

The software packages used for the data analysis were SPSS, Epi Info 2000 and S-Plus. Both types of logistic regression analysis, unconditional and conditional, were applied to this case-control study. The two methods gave similar results, which suggests that when the study is large, either unconditional or conditional logistic regression can be used for the analysis. Another statistical approach, complementary log-log regression, can also be applied to case-control studies. Crude odds ratios and crude prevalence ratios can be estimated with logistic regression and complementary log-log models, respectively.
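How closely the two links agree can be checked numerically. The inverse logit 1/(1 + exp(-eta)) and the inverse complementary log-log 1 - exp(-exp(eta)) both behave like exp(eta) when the fitted probability is small, and they diverge as the probability grows. A short sketch (function names hypothetical):

```python
import math


def inv_logit(eta):
    """Logistic: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    """Complementary log-log: log(-log(1 - p)) = eta, so p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-5.0, -3.0, -2.0, -1.5, -1.0, 0.0, 1.0):
    print(f"eta = {eta:5.1f}  logistic = {inv_logit(eta):.4f}  cloglog = {inv_cloglog(eta):.4f}")
```

In the printed table the two columns agree closely for strongly negative eta and separate as eta increases, which is why the choice of link matters little for rare outcomes.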

When modeling case-control studies, the choice lies between two links: the logistic and the complementary log-log. Both links produce similar results when applied to case-control data. The complementary log-log transformation is similar to the logistic function: the two are almost indistinguishable for $\pi < 0.2$, and for rare events odds ratios and prevalence ratios approximately coincide. Moreover, for case-control studies the prevalence ratio is estimated using the relationship

$$\mathrm{PR} = \frac{1 - (1 - \pi_0)^{\exp(\beta)}}{\pi_0},$$

where $\pi_0$ is the predicted prevalence in the reference group (unexposed to the factor), $\beta$ is the associated regression coefficient from the complementary log-log model, and $\exp(\beta)$ is the rate ratio.
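A minimal sketch of the prevalence-ratio relationship PR = [1 - (1 - pi0)^exp(beta)] / pi0, assuming pi0 and beta have already been estimated from a complementary log-log fit. It also computes the odds ratio implied by the same two group probabilities, to show that both approach exp(beta) when pi0 is small; the function names and the values of pi0 and beta are hypothetical.

```python
import math


def prevalence_ratio(pi0, beta):
    """PR = [1 - (1 - pi0) ** exp(beta)] / pi0 from a complementary log-log coefficient."""
    return (1.0 - (1.0 - pi0) ** math.exp(beta)) / pi0


def implied_odds_ratio(pi0, beta):
    """Odds ratio between the same two groups, with pi1 = 1 - (1 - pi0) ** exp(beta)."""
    pi1 = 1.0 - (1.0 - pi0) ** math.exp(beta)
    return (pi1 / (1.0 - pi1)) / (pi0 / (1.0 - pi0))


beta = math.log(2.0)  # hypothetical coefficient: rate ratio exp(beta) = 2
for pi0 in (0.01, 0.05, 0.2, 0.5):
    print(f"pi0 = {pi0:.2f}  PR = {prevalence_ratio(pi0, beta):.3f}  "
          f"OR = {implied_odds_ratio(pi0, beta):.3f}")
```

For small pi0 both quantities sit close to exp(beta) = 2; as pi0 increases the PR falls below 2 and the implied OR rises above it, which is the rare-event approximation the text relies on.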

Table of Contents

0 Contents
1 Introduction
  1.1 Cancer
  1.2 Breast Cancer
  1.3 Risk Factors for Breast Cancer
  1.4 Trends in Breast Cancer Occurrence
  1.5 Breast Cancer in Pakistan
  1.6 Objectives of the Study
2 Literature Review
  2.1 Epidemiology of Breast Cancer
  2.2 Models on Breast Cancer Occurrence
3 Data Collection and Methodology
  3.1 Data Collection
  3.2 Methodology
  3.3 Models for Binary Data
  3.4 Logistic Regression Models
  3.5 Complementary Log-Log Regression Models
4 Logistic Regression Analysis
  4.1 Conditional Logistic Regression Analysis of Breast Cancer Data
  4.2 Unconditional Logistic Regression Analysis of Breast Cancer Data
  4.3 Comparison Between Conditional and Unconditional Logistic Regression Models
  4.4 Complete Case Analysis
  4.5 Comparison Between Conditional Logistic Regression Models for 'All Data' and 'Revised Data'
  4.6 Combining Statistical and Biological Results (Final Model)
  4.7 Results of Final Multivariate Logistic Regression Model
5 Complementary Log-Log Regression Models
  5.1 Comparison Between Logistic and Complementary Log-Log Models as Applied to Case-Control Study
6 Conclusion
  6.1 General Finding
  6.2 References
  6.3 Appendices