Theory and application of decision tree analysis with statistical software
Ch.Cong, C.P.Tsokos
Decision tree analysis has recently come to play a significant role in the
analysis and modeling of various types of medical data, especially in cancer
research. It has also been used extensively in the financial world, for example
in loan approval, portfolio management, health and risk assessment, insurance
claim evaluation, and supply chain management. It is also widely applied in
fields such as engineering, forensic examination, and biotechnology. The
objective of the present study is to review the theory behind decision tree
analysis and to illustrate its usefulness through a range of applications
[1-26]. Furthermore, information on statistical software is given to assist
scientists in applying decision tree analysis.
Finally, we give some conclusions. In the present study, we reviewed the theory
of decision tree analysis as a machine learning method and emphasized its
usefulness in analyzing complicated data. Decision trees have several
advantages over other data mining or machine learning methods: they perform
well on large data sets in a short time, they are white-box models that are
easy to interpret, and other statistical or mathematical techniques can be
incorporated into them. Real-world examples of decision tree analysis in breast
cancer, supply chain management, and power systems are given to illustrate its
extensive applications. We also provide references on how these analyses can be
carried out with statistical software, so that individuals interested in
performing decision tree analysis can use it.
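To make the white-box claim concrete, the following is a minimal sketch of how a tree can choose its first split with an entropy-based criterion, in the spirit of the information-gain approach of [4]. It is an illustration only, not the procedure of any particular software package; the function names and the toy loan-approval data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Return the feature with the largest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: the decision depends only on credit history.
rows = [
    {"income": "high", "history": "good"},
    {"income": "high", "history": "bad"},
    {"income": "low", "history": "good"},
    {"income": "low", "history": "bad"},
]
labels = ["approve", "reject", "approve", "reject"]

print(best_split(rows, labels))
```

Because each split is a readable rule on a single variable, the resulting tree can be inspected directly, which is what makes the model easy to interpret.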
1. L.Breiman, J.Friedman, R.Olshen, C.Stone. Classification and Regression Trees. New York: Chapman and Hall, 1984.
2. G.Kass. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 1980, 119-127.
3. J.Magidson. The use of the new ordinal algorithm in CHAID to target profitable segments. The Journal of Database Marketing, 1, 1993, 29-48.
4. J.R.Quinlan. Induction of Decision Trees. Machine Learning, 1, 1986, 81-106.
5. C.Shannon, W.Weaver. Model of communication, 1949.
6. J.R.Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers,
7. J.R.Quinlan. C5.0: An Informal Tutorial, http://www.rulequest.com/see5-unix.html
8. W.Y.Loh, Y.S.Shih. Split selection methods for classification trees. Statistica Sinica, 7, 1997, 815-840.
9. L.Breiman. Random Forests. Machine Learning, 45 (1), 2001, 5-32.
12. M.R.Segal. Regression trees for censored data. Biometrics, 44, 1988, 35-47.
16. R.Davis, J.Anderson. Exponential survival trees. Statistics in Medicine, 8, 1989, 947-962.
© 1995-2008 Kazan State University