Title: | Improper Bagging Survival Tree |
---|---|
Description: | Fit a full or subsampling bagging survival tree on a mixture of populations (susceptible and nonsusceptible) using either a pseudo R2 criterion or an adjusted Logrank criterion. The predictor is evaluated using the Out Of Bag Integrated Brier Score (IBS), and several importance scores are computed for variable selection. The threshold values for variable selection are computed using a nonparametric permutation test. See 'Cyprien Mbogning' and 'Philippe Broet' (2016) <doi:10.1186/s12859-016-1090-x> for an overview of the methods implemented in this package. |
Authors: | Cyprien Mbogning and Philippe Broet |
Maintainer: | Cyprien Mbogning <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2 |
Built: | 2024-11-22 04:36:31 UTC |
Source: | https://github.com/cran/iBST |
Fit a bagging survival tree on a mixture of populations (susceptible and nonsusceptible) using either a pseudo R2 criterion or an adjusted Logrank criterion. The predictor is evaluated using the Out Of Bag Integrated Brier Score (IBS), and several importance scores are computed for variable selection. The threshold values for variable selection are computed using a nonparametric permutation test. See Cyprien Mbogning and Philippe Broet (2016) <doi:10.1186/s12859-016-1090-x> for an overview of the methods implemented in this package.
Package: | iBST |
Type: | Package |
Version: | 1.2 |
Date: | 2023-01-12 |
License: | GPL(>=2.0) |
Cyprien Mbogning and Philippe Broet
Maintainer: Cyprien Mbogning <[email protected]>
Mbogning, C. and Broet, P. (2016). Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC bioinformatics, 17(1), 1.
Duhaze Julianne et al. (2020). A Machine Learning Approach for High-Dimensional Time-to-Event Prediction With Application to Immunogenicity of Biotherapies in the ABIRISK Cohort. Frontiers in Immunology, 11.
Bagg_Surv
Bagg_pred_Surv
improper_tree
## Not run:
data(burn)
myarg = list(cp = 0, maxcompete = 0, maxsurrogate = 0, maxdepth = 2)
Y.names = c("T3", "D3")
P.names = 'Z2'
T.names = c("Z1", paste("Z", 3:11, sep = ''))
mybag = 40
feat_samp = length(T.names)
set.seed(5000)

## fit an improper survival tree
burn.tree <- suppressWarnings(improper_tree(burn, Y.names, P.names, T.names,
    method = "R2", args.rpart = myarg))
plot(burn.tree)
text(burn.tree, cex = .7, xpd = TRUE)

## fit an improper Bagging survival tree with the adjusted Logrank criterion
burn.BagEssai0 <- suppressWarnings(Bagg_Surv(burn, Y.names, P.names, T.names,
    method = "LR", args.rpart = myarg, args.parallel = list(numWorkers = 1),
    Bag = mybag, feat = feat_samp))

## fit an improper Bagging survival tree with the pseudo R2 criterion
burn.BagEssai1 <- suppressWarnings(Bagg_Surv(burn, Y.names, P.names, T.names,
    method = "R2", args.rpart = myarg, args.parallel = list(numWorkers = 1),
    Bag = mybag, feat = feat_samp))

## Plot the variable importance scores
par(mfrow = c(1, 3))
barplot(burn.BagEssai1$IIS, main = 'IIS', horiz = TRUE, las = 1,
    cex.names = .8, col = 'lightblue')
barplot(burn.BagEssai1$DIIS, main = 'DIIS', horiz = TRUE, las = 1,
    cex.names = .8, col = 'grey')
barplot(burn.BagEssai1$DEPTH, main = 'MinDepth', horiz = TRUE, las = 1,
    cex.names = .8, col = 'purple')

## evaluation of the Bagging predictors
pred0 <- suppressWarnings(Bagg_pred_Surv(burn, Y.names, P.names, burn.BagEssai0,
    args.parallel = list(numWorkers = 1), OOB = TRUE))
pred1 <- suppressWarnings(Bagg_pred_Surv(burn, Y.names, P.names, burn.BagEssai1,
    args.parallel = list(numWorkers = 1), OOB = TRUE))
## End(Not run)
Use the Bagging improper survival tree to predict on new features and to evaluate the predictor using Out Of Bag Integrated Brier Scores with either the Nelson Aalen estimator or the Breslow estimator. A permutation importance score is also computed using OOB observations.
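The idea behind an Integrated Brier Score can be illustrated with a deliberately simplified base R sketch. It assumes fully observed event times (no censoring) and a single hypothetical predicted survival curve; a real IBS for censored data also needs censoring weights, which are omitted here, so this is not the computation performed by the package.

```r
# Toy, fully observed event times (no censoring): an assumption for simplicity.
event_time <- c(2, 5, 7, 11, 13)

# Hypothetical predicted survival curve shared by all individuals:
# here simply the empirical survival function of the toy sample.
surv_hat <- function(t) mean(event_time > t)

# Brier score at time t: mean squared gap between the observed status
# 1(T > t) and the predicted survival probability.
brier_at <- function(t) mean((as.numeric(event_time > t) - surv_hat(t))^2)

# Integrate over a time grid with the trapezoidal rule and rescale by the
# length of the interval.
grid <- seq(0, 14, by = 0.5)
bs <- vapply(grid, brier_at, numeric(1))
ibs <- sum(diff(grid) * (bs[-length(bs)] + bs[-1]) / 2) / diff(range(grid))
ibs
```

Lower values indicate predictions closer to the observed status over the whole time range.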
Bagg_pred_Surv(xdata, Y.names, P.names, resBag, args.parallel = list(numWorkers = 1), new_data = data.frame(), OOB = FALSE)
xdata |
The learning data frame |
Y.names |
A vector of the names of the two variables of interest (the time-to-event is followed by the event indicator) |
P.names |
The names of independent variables acting on the non-susceptible population (the plateau) |
resBag |
The result of the |
args.parallel |
A list of parallel computing arguments: the number of workers, the type of parallelization to achieve, ... see |
new_data |
An optional data frame to validate the bagging procedure (the test dataset) |
OOB |
A value of |
PREDNA |
A matrix with Nelson Aalen predictions on all individuals of the learning sample |
PREDBRE |
A matrix with Breslow predictions on all individuals of the learning sample |
tabhazNAa |
A list of matrices with the Nelson Aalen predictions of each tree of the bagging sequence, with the leaf node prediction in each column |
tabhazBRe |
A list of matrices with the Breslow predictions of each tree of the bagging sequence, with the leaf node prediction in each column |
OOB |
A value of |
Timediff |
The execution time of the prediction procedure |
TEST |
A value of |
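For readers unfamiliar with the Nelson Aalen estimator behind PREDNA, the following base R sketch computes a cumulative hazard on a toy right-censored sample. The data and the helper name nelson_aalen are hypothetical, and the sketch makes no claim about how the package computes its per-leaf predictions.

```r
# Toy right-censored sample (hypothetical values, not the burn data).
time   <- c(1, 2, 2, 3, 4, 5)
status <- c(1, 1, 0, 1, 0, 1)   # 1 = event, 0 = censored

nelson_aalen <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  # d: number of events at each distinct event time.
  d <- vapply(event_times, function(t) sum(time == t & status == 1), numeric(1))
  # n: number of individuals still at risk just before each event time.
  n <- vapply(event_times, function(t) sum(time >= t), numeric(1))
  data.frame(time = event_times, cumhaz = cumsum(d / n))
}

na_fit <- nelson_aalen(time, status)
na_fit
```

The cumulative hazard is a step function that increases by d / n at each distinct event time.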
Cyprien Mbogning and Philippe Broet
Mbogning, C. and Broet, P. (2016). Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC bioinformatics, 17(1), 1.
Duhaze Julianne et al. (2020). A Machine Learning Approach for High-Dimensional Time-to-Event Prediction With Application to Immunogenicity of Biotherapies in the ABIRISK Cohort. Frontiers in Immunology, 11.
## Not run:
data(burn)
myarg = list(cp = 0, maxcompete = 0, maxsurrogate = 0, maxdepth = 2)
Y.names = c("T3", "D3")
P.names = 'Z2'
T.names = c("Z1", paste("Z", 3:11, sep = ''))
mybag = 40
feat_samp = length(T.names)
set.seed(5000)

burn.BagEssai0 <- suppressWarnings(Bagg_Surv(burn, Y.names, P.names, T.names,
    method = "LR", args.rpart = myarg, args.parallel = list(numWorkers = 1),
    Bag = mybag, feat = feat_samp))
burn.BagEssai1 <- suppressWarnings(Bagg_Surv(burn, Y.names, P.names, T.names,
    method = "R2", args.rpart = myarg, args.parallel = list(numWorkers = 1),
    Bag = mybag, feat = feat_samp))

pred0 <- Bagg_pred_Surv(burn, Y.names, P.names, burn.BagEssai0,
    args.parallel = list(numWorkers = 1), OOB = TRUE)
pred1 <- Bagg_pred_Surv(burn, Y.names, P.names, burn.BagEssai1,
    args.parallel = list(numWorkers = 1), OOB = TRUE)
## End(Not run)
Bagging subsampling procedure to aggregate several improper trees using either the pseudo-R2 procedure or the adjusted Logrank procedure. Several variable importance scores are computed.
Bagg_Surv(xdata, Y.names, P.names, T.names, method = "R2", args.rpart, args.parallel = list(numWorkers = 1), Bag = 100, feat = 5)
xdata |
The learning data frame |
Y.names |
A vector of the names of the two variables of interest (the time-to-event is followed by the event indicator) |
P.names |
The names of independent variables acting on the non-susceptible population (the plateau) |
T.names |
The names of independent variables acting on the survival of the susceptible population |
method |
The chosen method (either |
args.rpart |
The improper survival tree parameters: a list of options that control details of the rpart algorithm.
|
args.parallel |
A list of parallel computing arguments: the number of workers, the type of parallelization to achieve, ... see |
Bag |
The number of Bagging samples to consider |
feat |
The size of the feature subsample. Full bagging is obtained when feat equals the total number of features. |
For the Bagging procedure, it is mandatory to set maxcompete = 0 and maxsurrogate = 0 within the args.rpart arguments. This ensures the correct calculation of the variable importance scores and also reduces computation time.
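As a small illustration of this requirement, the control list used in the package examples can be checked against rpart.control() before it is passed as args.rpart. The validation step is an addition of this sketch, not something the package is documented to require.

```r
library(rpart)

# Control list in the form used by the package examples.
myarg <- list(cp = 0, maxcompete = 0, maxsurrogate = 0, maxdepth = 2)

# Sanity check (an assumption of this sketch, not a package requirement):
# every entry should be a valid rpart.control() argument.
stopifnot(all(names(myarg) %in% names(formals(rpart.control))))

# Build the full control object and confirm the two mandatory settings.
ctrl <- do.call(rpart.control, myarg)
stopifnot(ctrl$maxcompete == 0, ctrl$maxsurrogate == 0)
```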
A list of ten elements
MaxTreeList |
The list of improper survival trees computed during the bagging procedure |
IIS |
The Index Importance Score |
DIIS |
The Depth Index Importance Score |
DEPTH |
The minimum depth importance Score |
IND_OOB |
A list of length |
IIND_SAMP |
The final list of length |
IIND_SAMP |
The initial list of sample individuals used for each improper survival tree at the beginning |
Bag |
The number of bagging samples retained at the end of the procedure after removing the trees without leaves |
indrpart |
a vector of |
Timediff |
The elapsed time of the Bagging procedure |
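The relationship between in-bag samples, out-of-bag observations, and feature subsamples can be sketched in base R as follows. This is only an illustration: it assumes rows are drawn by bootstrap with replacement, which the documentation above does not state, and the object names are hypothetical rather than the package's internal structures.

```r
set.seed(1)
n_obs   <- 154                          # number of rows in the burn data
T.names <- c("Z1", paste0("Z", 3:11))   # candidate features
Bag     <- 5                            # kept small for illustration
feat    <- 4                            # size of the feature subsample

# For each bagging iteration: draw the in-bag rows (with replacement, an
# assumption), record the out-of-bag rows, and draw a feature subsample
# without replacement.
draws <- lapply(seq_len(Bag), function(b) {
  in_bag <- sample(n_obs, n_obs, replace = TRUE)
  list(in_bag   = in_bag,
       oob      = setdiff(seq_len(n_obs), in_bag),
       features = sample(T.names, feat))
})

# Number of out-of-bag observations for each iteration.
lengths(lapply(draws, `[[`, "oob"))
```

Setting feat to length(T.names) in this sketch corresponds to using every feature in each iteration.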
This version of the code currently allows only one variable to have an impact on the cured population. The next version will allow more than one variable.
Cyprien Mbogning and Philippe Broet
Mbogning, C. and Broet, P. (2016). Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC bioinformatics, 17(1), 1.
Duhaze Julianne et al. (2020). A Machine Learning Approach for High-Dimensional Time-to-Event Prediction With Application to Immunogenicity of Biotherapies in the ABIRISK Cohort. Frontiers in Immunology, 11.
## Not run:
data(burn)
myarg = list(cp = 0, maxcompete = 0, maxsurrogate = 0, maxdepth = 2)
Y.names = c("T3", "D3")
P.names = 'Z2'
T.names = c("Z1", paste("Z", 3:11, sep = ''))
mybag = 40
feat_samp = length(T.names)
set.seed(5000)

burn.BagEssai0 <- suppressWarnings(Bagg_Surv(burn, Y.names, P.names, T.names,
    method = "LR", args.rpart = myarg, args.parallel = list(numWorkers = 1),
    Bag = mybag, feat = feat_samp))
burn.BagEssai1 <- suppressWarnings(Bagg_Surv(burn, Y.names, P.names, T.names,
    method = "R2", args.rpart = myarg, args.parallel = list(numWorkers = 1),
    Bag = mybag, feat = feat_samp))
## End(Not run)
The burn data frame has 154 rows and 17 columns.
data(burn)
A data frame with 154 observations on the following 17 variables.
Obs
Observation number
Z1
Treatment: 0-routine bathing 1-Body cleansing
Z2
Gender (0=male 1=female)
Z3
Race: 0=nonwhite 1=white
Z4
Percentage of total surface area burned
Z5
Burn site indicator: head 1=yes, 0=no
Z6
Burn site indicator: buttock 1=yes, 0=no
Z7
Burn site indicator: trunk 1=yes, 0=no
Z8
Burn site indicator: upper leg 1=yes, 0=no
Z9
Burn site indicator: lower leg 1=yes, 0=no
Z10
Burn site indicator: respiratory tract 1=yes, 0=no
Z11
Type of burn: 1=chemical, 2=scald, 3=electric, 4=flame
T1
Time to excision or on study time
D1
Excision indicator: 1=yes 0=no
T2
Time to prophylactic antibiotic treatment or on study time
D2
Prophylactic antibiotic treatment: 1=yes 0=no
T3
Time to Staphylococcus aureus infection or on study time
D3
Staphylococcus aureus infection: 1=yes 0=no
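The following short sketch shows how the column names listed above map onto the Y.names, P.names and T.names arguments used in the examples of this manual. The vector burn_cols is a hypothetical helper that simply retypes the names from the list; it is not read from the data set.

```r
# Response: time to infection (T3) followed by its event indicator (D3).
Y.names <- c("T3", "D3")
# Covariate acting on the nonsusceptible fraction: gender (Z2).
P.names <- "Z2"
# Covariates acting on the survival of susceptibles: Z1 and Z3 to Z11.
T.names <- c("Z1", paste0("Z", 3:11))

# Column names as listed in the variable description above (retyped by hand).
burn_cols <- c("Obs", paste0("Z", 1:11), "T1", "D1", "T2", "D2", "T3", "D3")

# Every name passed to the fitting functions should be an existing column,
# and no covariate should be used in both roles.
stopifnot(all(c(Y.names, P.names, T.names) %in% burn_cols))
stopifnot(length(intersect(P.names, T.names)) == 0)
```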
Klein and Moeschberger (1997) Survival Analysis Techniques for Censored and truncated data, Springer.
Ichida et al. Stat. Med. 12 (1993): 301-310.
data(burn)
## maybe str(burn)
Fit an improper survival tree for the mixed population (susceptible and nonsusceptible) using either the proposed pseudo R2 criterion or an adjusted Logrank criterion
improper_tree(xdata, Y.names, P.names, T.names, method = "R2", args.rpart)
xdata |
The learning data frame |
Y.names |
A vector of the names of the two variables of interest (the time-to-event is followed by the event indicator) |
P.names |
The names of independent variables acting on the non-susceptible population (the plateau) |
T.names |
The names of independent variables acting on the survival of the susceptible population |
method |
The chosen method (either |
args.rpart |
The improper survival tree parameters: a list of options that control details of the rpart algorithm.
|
An unpruned improper survival tree
Cyprien Mbogning and Philippe Broet
Mbogning, C. and Broet, P. (2016). Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC bioinformatics, 17(1), 1.
## Not run:
data(burn)
myarg = list(cp = 0, maxcompete = 0, maxsurrogate = 0, maxdepth = 3)
Y.names = c("T3", "D3")
P.names = 'Z2'
T.names = c("Z1", paste("Z", 3:11, sep = ''))

burn.tree <- suppressWarnings(improper_tree(burn, Y.names, P.names, T.names,
    method = "R2", args.rpart = myarg))
plot(burn.tree)
text(burn.tree, cex = .7, xpd = TRUE)
## End(Not run)
Variable selection using the permutation test on several scores of importance: IIS, DIIS and DEPTH.
permute_select_surv(xdata, Y.names, P.names, T.names, importance = "IIS", method = "R2", Bag, args.rpart, args.parallel = list(numWorkers = 1), nperm = 50)
xdata |
The learning data frame |
Y.names |
A vector of the names of the two variables of interest (the time-to-event is followed by the event indicator) |
P.names |
The names of independent variables acting on the non-susceptible population (the plateau) |
T.names |
The names of independent variables acting on the survival of the susceptible population |
importance |
The importance score to consider: either |
method |
The splitting method: either |
Bag |
The number of Bagging samples to consider |
args.rpart |
The improper survival tree parameters: a list of options that control details of the rpart algorithm.
|
args.parallel |
A list of parallel computing arguments: the number of workers, the type of parallelization to achieve, ... see |
nperm |
The number of permutation samples to consider for the permutation test |
Tests whether the importance score is null or not.
A list of five elements:
pvalperm1 |
The permutation test P-values ranked in decreasing order |
pvalperm2 |
The permutation test P-values ranked in decreasing order, assuming an approximately Gaussian distribution under the null hypothesis |
pvalKS |
The Kolmogorov-Smirnov P-values of the comparisons between the observed importance under the null hypothesis and a theoretical Gaussian distribution |
IMPH1 |
The observed importance score |
PERMH0 |
A matrix with the importance scores for each permutation sample in each column |
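To make the three kinds of P-values more concrete, here is a base R sketch of one common way such quantities can be derived from an observed score and a matrix of permutation scores. The inputs imp_obs and perm_h0 are simulated stand-ins, and the formulas are assumptions for illustration; the package's own computation may differ.

```r
set.seed(42)
vars  <- c("Z1", "Z3", "Z4")
nperm <- 200

# Hypothetical inputs: observed importance per variable, and a matrix of
# importance scores under the null with one permutation sample per column.
imp_obs <- c(Z1 = 0.1, Z3 = 0.2, Z4 = 3.0)
perm_h0 <- matrix(rnorm(length(vars) * nperm), nrow = length(vars),
                  dimnames = list(vars, NULL))

# Empirical permutation P-value with the usual +1 correction. The comparison
# recycles imp_obs down the rows, so row i is compared with imp_obs[i].
p_emp <- (1 + rowSums(perm_h0 >= imp_obs)) / (1 + nperm)

# Gaussian approximation of the null distribution of each score.
p_gauss <- pnorm(imp_obs, mean = rowMeans(perm_h0),
                 sd = apply(perm_h0, 1, sd), lower.tail = FALSE)

# Kolmogorov-Smirnov check that each null sample looks Gaussian. The
# parameters are estimated from the same sample, so this is only approximate.
p_ks <- apply(perm_h0, 1,
              function(x) ks.test(x, "pnorm", mean(x), sd(x))$p.value)

sort(p_emp, decreasing = TRUE)
```

A small empirical or Gaussian P-value suggests that the observed importance is unlikely under the null, while the Kolmogorov-Smirnov P-value indicates whether the Gaussian approximation is reasonable for that variable.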
Cyprien Mbogning and Philippe Broet
Mbogning, C. and Broet, P. (2016). Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC bioinformatics, 17(1), 1.
## Not run:
myarg = list(cp = 0, maxcompete = 0, maxsurrogate = 0, maxdepth = 2)
Y.names = c("T3", "D3")
P.names = 'Z2'
T.names = c("Z1", paste("Z", 3:11, sep = ''))
mybag = 40
set.seed(5000)
data(burn)

resperm0 <- suppressWarnings(permute_select_surv(xdata = burn, Y.names, P.names,
    T.names, method = "LR", Bag = mybag, args.rpart = myarg,
    args.parallel = list(numWorkers = 1), nperm = 150))
## End(Not run)
Pseudo R2 criterion for a mixture of population (susceptible and nonsusceptible populations)
PseudoR2.Cure(ygene, ydelai, yetat, strate, ordered = FALSE)
ygene |
The main variable of interest |
ydelai |
The right censored delay until the event |
yetat |
The censoring indicator |
strate |
The variables acting on the nonsusceptible or cured population |
ordered |
A value of |
A pseudo R2 value lying between 0 and 1.
Cyprien Mbogning and Philippe Broet
Mbogning, C. and Broet, P. (2016). Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC bioinformatics, 17(1), 1.
Bagg_Surv
Bagg_pred_Surv
improper_tree
data(burn)
PseudoR2.Cure(ygene = burn$Z3, ydelai = burn$T3, yetat = burn$D3, strate = burn$Z2)
PseudoR2.Cure(ygene = burn$Z2, ydelai = burn$T3, yetat = burn$D3, strate = burn$Z2)
Simple function using Rcpp
rcpp_hello_world()
## Not run:
rcpp_hello_world()
## End(Not run)
Coerces a given tree structure inheriting from rpart to binary covariates.
tree2indicators(fit)
fit |
a tree structure inheriting from the rpart class |
a list of indicators defining the leaf nodes of the fitted tree from left to right
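A related view of leaf membership can be obtained directly from an rpart fit, as in the sketch below. It builds a 0/1 matrix from the where component of the fitted object; this is not the implementation of tree2indicators, and the shape of its result may differ from what that function returns.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where gives, for each observation, the row of fit$frame holding its leaf.
leaf_rows <- sort(unique(fit$where))

# One 0/1 column per leaf, named by the rpart node number of that leaf.
leaf_ind <- sapply(leaf_rows, function(k) as.integer(fit$where == k))
colnames(leaf_ind) <- rownames(fit$frame)[leaf_rows]

# Each observation falls in exactly one leaf.
stopifnot(all(rowSums(leaf_ind) == 1))
head(leaf_ind)
```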
Cyprien Mbogning
library(rpart)  # provides rpart() and the kyphosis data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
tree2indicators(fit)