Science

Theory and application of decision tree analysis with statistical software

Ch.Cong, C.P.Tsokos

University of South Florida

Tampa, FL, 33620, USA

In recent years, decision tree analysis has played a significant role in the analysis and modeling of various types of medical data, especially in cancer research. It has also been used extensively in the financial world, for example in loan approval, portfolio management, health and risk assessment, insurance claim evaluation, and supply chain management, and it is widely applied in fields such as engineering, forensic examination, and biotechnology. The objective of the present study is to review the theory behind decision tree analysis and to illustrate its usefulness through various applications [1-26]. In addition, information on statistical software is given to assist scientists in applying decision tree analysis.

 

Finally, we give some conclusions. In the present study, we reviewed the theory of decision tree analysis as a machine learning method and emphasized its usefulness in analyzing complicated data. Decision trees have several advantages over other data mining or machine learning methods: they perform well on large data sets in a short time, they are white-box models that are easy to interpret, and other statistical or mathematical techniques can be incorporated into them. Real-world examples of decision tree analysis in breast cancer, supply chain management, and power systems are given to illustrate its extensive applications. We also provide references on how these analyses can be carried out with statistical software, so that individuals interested in performing decision tree analysis can use it.
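
As a minimal sketch of why a decision tree can be read as a white-box model, the following Python fragment chooses a single split by information gain (Shannon entropy, the criterion used in ID3 [4, 5]) and prints the resulting rules. The toy loan-approval records, the feature names, and the helper functions entropy, information_gain and best_split are hypothetical illustrations, not data or code from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Return the feature with the largest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy loan-approval records (assumed for illustration only).
rows = [
    {"income": "high", "owns_home": "yes"},
    {"income": "high", "owns_home": "no"},
    {"income": "low", "owns_home": "yes"},
    {"income": "low", "owns_home": "no"},
]
labels = ["approve", "approve", "reject", "reject"]

# Build a one-level tree: split on the best feature, then predict the
# majority class within each branch.
root = best_split(rows, labels)
rules = {}
for row, label in zip(rows, labels):
    rules.setdefault(row[root], Counter())[label] += 1
for value, counts in sorted(rules.items()):
    print(f"if {root} == {value!r}: predict {counts.most_common(1)[0][0]!r}")
```

In this toy data set the split on income separates the two classes completely, so the whole model reduces to two if-then rules that can be inspected directly. A full tree would apply the same selection step recursively within each branch.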

 

1.          L.Breiman, J.Friedman, R.Olshen, C.Stone. Classification and Regression Trees. New York: Chapman and Hall, 1984.

2.          G.Kass. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 1980, 119-127.

3.          J.Magidson. The use of the new ordinal algorithm in CHAID to target profitable segments. The Journal of Database Marketing, 1, 1993, 29-48.

4.          J.R.Quinlan. Induction of decision trees. Machine Learning, 1, 1986, 81-106.

5.          C.Shannon, W.Weaver. Model of communication, 1949.

6.          J.R.Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, California, ISBN 1-55860-238-0, 1993.

7.          J.R.Quinlan. C5.0: An Informal Tutorial, http://www.rulequest.com/see5-unix.html

8.          W.Y.Loh, Y.S.Shih. Split selection methods for classification trees. Statistica Sinica, 7, 1997, 815-840.

9.          L.Breiman. Random forests. Machine Learning, 45 (1), 2001, 5-32.

10.       L.Gordon, R.A.Olshen. Tree-structured survival analysis. Cancer Treatment Reports 69, 1985, 1065-1069.

11.       E.L.Kaplan, P.Meier. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc., 53, 1958, 457-481.

12.       M.R.Segal. Regression trees for censored data. Biometrics, 44, 1988, 35-47.

13.       R.E.Tarone, J.Ware. On distribution-free tests for equality of survival distributions. Biometrika, 64, 1977, 156-160.

14.       D.P.Harrington, T.R.Fleming. A class of rank test procedures for censored survival data. Biometrika, 69, 1982, 553-566.

15.       P.Bacchetti, M.R.Segal. Survival trees with time-dependent covariates: application to estimating changes in the incubation period of AIDS. Lifetime Data Analysis, Vol.1, number 1, 1995.

16.       R.Davis, J.Anderson. Exponential survival trees. Statistics in Medicine, 8, 1989, 947-962.

17.       M.LeBlanc, J.Crowley. Relative risk trees for censored survival data. Biometrics, 48, 1992, 411-425.

18.       M.LeBlanc, J.Crowley. Survival trees by goodness of split. Journal of the American Statistical Association, 88, 1993, 457-467.

19.       X.G.Su, J.J.Fan. Multivariate survival trees: a maximum likelihood approach based on frailty models. Biometrics, 60, 2004, 93-99.

20.       N.A.Ibrahim, et al. Decision tree for competing risks survival probability in breast cancer study. International Journal of Biomedical Sciences, Volume 3, Number 1, 2008.

21.       C.P.Tsokos. Modelling of environmental engineering and health problems. Int.J. Problems of nonlinear analysis in engineering systems, No.1(35), v.17, 2011, 1-5  (in English and in Russian).

22.       S.H.Lim. The design of controls in supply chain management sustainable collaboration using decision tree algorithm. International Journal of Computer Science and Network Security, Vol. 6, No. 5A, 2006.

23.       L.Wehenkel, T.Van Cutsem, M.Ribbens-Pavella. An artificial intelligence framework for online transient stability assessment of power systems. IEEE Trans. Power Syst., vol. 4, No. 2, 1989, 789-800.

24.       L.Wehenkel, T.Van Cutsem, M.Ribbens-Pavella. Artificial intelligence applied to on-line transient stability assessment of electric power systems. Proc. of the 10th IFAC World Congress, 1987, 308-313.

25.       www.r-project.org.

26.       www.spss.com.

 





© 1995-2008 Kazan State University