Method of flexible co-variates in the proportional hazards models
D. Vovoras, C. P. Tsokos
The problem of survival data analysis has been a very active research field for
many decades. Much of this research centers on the important Cox regression
model, its extensions, and its alternative forms. The present study uses
generalized additive regression modeling to extend the scope of the
classical Cox-PH model in order to provide flexible models for studying the
effect of the prognostic factors on the hazard function. Smoothing splines are
used to estimate the covariate effects and form non-linear additive proportional
hazards models, in order to test the effectiveness of alternative treatments in
a breast cancer clinical trial.
For introduction we note.
Each year governments and organizations around
the world fund thousands of clinical trials which follow the history of disease
and evaluate alternative treatments. Accurate analysis of the provided
information is critical, not only because the nature of the care for
individuals is directly affected by the findings, but also in terms of time and
money. While the determination of a treatment's efficacy is an important goal for
a clinical trial, the identification of prognostic factors is an equally
important component of the analysis. This article describes methods to
identify and characterize the effect of potential prognostic factors on disease
endpoints, as well as to define differing risk groups. We note that valid
comparisons between treatments are possible only after correctly accounting for
factors that may affect the course of the disease.
Survival analysis, or failure time data analysis,
is concerned with the time until an endpoint of interest occurs.
The intention of this paper is to expose the
reader to some of the important issues dealt with in these kinds of studies. If the focus is non-parametric, the cumulative
hazard function or the survival function can be estimated using the
Nelson-Aalen and the Kaplan-Meier estimators, respectively. The proportional
hazards model is a popular semi-parametric tool for analyzing censored failure
time data; here failure time is used as a generic term to refer to the
time up to the endpoint of interest. It is semi-parametric in the sense that it
does not make any distributional assumptions about the failure times, but on
the other hand does specify the form in which covariates, or prognostic
factors, affect the hazard rate of failure. The model easily accommodates
right-censored data, which are typical of clinical trials where patients enter
at random times but follow-up ends at a fixed time point.
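The two non-parametric estimators mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `km_and_na` and the toy data are hypothetical and are not taken from the trial analyzed in the paper.

```python
from collections import Counter

def km_and_na(times, events):
    """Return [(t, S_KM(t), H_NA(t))] at each distinct observed failure time.

    times  : follow-up times
    events : 1 for an observed failure, 0 for a right-censored observation
    """
    # Number of observed failures at each distinct failure time
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, cumhaz, out = 1.0, 0.0, []
    for t in sorted(deaths):
        # Risk set: subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        surv *= 1.0 - d / at_risk   # Kaplan-Meier product-limit step
        cumhaz += d / at_risk       # Nelson-Aalen increment
        out.append((t, surv, cumhaz))
    return out

# Hypothetical toy data: five subjects, two of them right-censored
for row in km_and_na([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(row)
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a step, which is how right censoring enters both estimators.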
The Cox or proportional hazards regression model
is often used to simultaneously model prognostic factors and
treatment effects for the failure times involved. We briefly outline the model
in the next section and give a way to incorporate tools that can investigate
whether a prognostic factor is important, and whether it has a linear or
nonlinear relationship to the failure time. The identified model automatically
provides estimates of treatment effects related nonlinearly to the prognostic
factors. Although the focus here is on censored survival data, the same
techniques have been applied to other problems, for example regression modeling
for time-dependent distributional parameters. While some prognostic factors
(usually the categorical ones) are linearly related to survival, the influence
of others (clinical laboratory values or clinical characteristics) may well be
more accurately described by a non-linear relationship.
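For orientation, the proportional hazards model is conventionally written as follows. The notation is an assumption introduced here and is not taken from the paper: $h_0$ is the unspecified baseline hazard, $\beta$ the vector of regression coefficients, $\delta_i$ the failure indicator of subject $i$, and $R(t_i)$ the set of subjects still at risk at time $t_i$. The partial likelihood is shown for the case of no tied failure times.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\beta\right),
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
\frac{\exp\!\left(x_i^{\top}\beta\right)}
     {\sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)}
```

The baseline hazard cancels from the partial likelihood, which is why no distributional assumption about the failure times is required.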
The method used, namely an additive proportional hazards model,
relaxes the linearity assumptions and allows smooth non-linear functions of the
covariates to be included in the log hazard ratio. The advantage is that the
transformations involved are not chosen a priori by the analyst, but rather are
estimated flexibly from the data at hand. Another attractive feature is that
the need to categorize a continuous covariate in order to discover the nature
of its effect is alleviated. Though there are several methods to accommodate
the non-linear terms, the methods used here are splines, that is, piecewise
polynomials satisfying continuity constraints at the knots joining the pieces.
General reviews of splines are given by De Boor, Eubank, and Wahba. Hastie et al.,
as well as Hastie and Tibshirani, and Gray discuss the effect of the number of
knots on the sensitivity of the proposed models, as well as other alternatives,
namely regression splines.
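One common way to write an additive proportional hazards model with smoothing splines is sketched below. The symbols are assumptions made for illustration: $f_j$ is the smooth function of the $j$-th covariate, $\ell$ is the partial log-likelihood, and $\lambda_j \ge 0$ is a smoothing parameter. The exact penalty and scaling used in the paper may differ.

```latex
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} f_j(x_j)\Big),
\qquad
\ell_{\lambda}(f_1,\dots,f_p) = \ell(f_1,\dots,f_p)
 - \tfrac{1}{2}\sum_{j=1}^{p} \lambda_j \int f_j''(u)^2\,du
```

Large values of $\lambda_j$ push $f_j$ toward a straight line, so the usual linear model is recovered as a limiting case, while small values allow more flexible shapes.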
As mentioned, we aim mainly at studying
nonlinear effects of the covariates involved, as well as important interactions
between prognostic factors, since the interactive contribution may turn out to
be more significant than the individual ones, again with emphasis on examining
the shape of the effect. To look for possible interactions between continuous and
categorical covariates, separate spline functions are fit for the continuous
covariate within the levels of the categorical covariate. A test of the
hypothesis that the shape of the function is the same within the levels was
proposed by Gray; the test, though, cannot be regarded as reliable.
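The interaction structure just described can be written, in one possible notation, as a separate smooth function $f_k$ of the continuous covariate $x$ for each level $k = 1,\dots,K$ of the categorical covariate $z$. The symbols $\eta$, $f_k$, and $c_k$ are assumptions introduced here; the details of Gray's test statistic are not reproduced.

```latex
\eta(x, z) = \sum_{k=1}^{K} \mathbf{1}\{z = k\}\, f_k(x),
\qquad
H_0:\; f_k(x) = f_1(x) + c_k \quad (k = 2,\dots,K)
```

Under $H_0$ the curves differ only by a vertical shift, which is what "the same shape within the levels" means in this formulation.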
The subject of examining the proportional
hazards model assumptions has generated an extensive literature. Cox, in his
original paper, suggested fitting time-varying covariates, and Zucker and Karr
proposed using non-parametric penalized likelihood estimation to estimate
flexible functions of time for the covariates.
For discussion we note.
In this work we described different modeling approaches.
The procedures described are useful for a variety of
reasons. Firstly, they are very helpful in identifying a proper model for the
data at hand, since they give a valuable insight into the behavior of the
hazard function for various prognostic factors, thus preventing model misspecification
which can lead to incorrect conclusions about treatment efficacy. Secondly,
they provide information about the relationship between the covariates and
disease risk which goes beyond the standard techniques. For example, we found
that the effect of hemoglobin concentration is U-shaped, so the best fitting
line for this relationship would have zero slope, which is why it was not
identified as a significant predictor in the linear analysis. We should stress
that linearity remains a special case, and thus linear relationships can be
easily confirmed with flexible modeling, as was the case for the effect of
tumor size. Lastly, these methods can be used beyond modeling of prognostic
factors for patient care or planning for future trials, as an investigative
tool in a variety of settings.
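The point about the U-shaped effect can be illustrated with a small numerical sketch. It uses ordinary least squares on made-up data rather than a hazard model, and the helper `ols_slope` and the variable names are hypothetical; it only shows why a symmetric U-shaped relationship yields a fitted line with zero slope.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical covariate centered at zero and a perfectly U-shaped effect
x = [i - 5 for i in range(11)]       # -5, -4, ..., 5
effect = [v * v for v in x]          # U-shaped in x

slope_linear = ols_slope(x, effect)  # straight line in x misses the effect
slope_squared = ols_slope([v * v for v in x], effect)  # curved term recovers it

print(slope_linear, slope_squared)
```

A linear term sees no association at all, while a term that is allowed to bend captures the effect completely; a flexible smooth term plays the same role without the analyst having to guess the quadratic form in advance.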
Finally, a different approach to flexible
survival analysis, the regression tree approach of Segal, focuses on identifying
subgroups of individuals with similar survival experience. The two approaches
are somewhat complementary in that the additive model seeks smooth main effects,
while the tree-structured regression technique is designed to detect
interactions between variables. The authors have investigated decision tree
issues for relapse times in the same trial; a comparison between these two
methods would be an interesting topic for further study.
Acknowledgement. The authors wish to thank N.A. Ibrahim for supplying the
data.
© 1995-2008 Kazan State University