Package 'multipleOutcomes'

Title: Asymptotic Covariance Matrix of Regression Models for Multiple Outcomes
Description: Regression models can be fitted for multiple outcomes simultaneously. This package computes estimates of parameters across fitted models and returns the matrix of asymptotic covariance. Various applications of this package, including PATED (Prognostic Variables Assisted Treatment Effect Detection) and multiple comparison adjustment, are illustrated.
Authors: Han Zhang [aut, cre]
Maintainer: Han Zhang <[email protected]>
License: MIT + file LICENSE
Version: 0.8
Built: 2025-02-18 05:39:57 UTC
Source: https://github.com/zhangh12/multipleoutcomes


ACTG 320 Clinical Trial Dataset

Description

actg dataset from Hosmer et al.

Format

A data frame

id

Identification Code

time

Time to AIDS diagnosis or death (days).

censor

Event indicator. 1 = AIDS defining diagnosis or death, 0 = Otherwise.

time_d

Time to death (days)

censor_d

Event indicator for death (only). 1 = Death, 0 = Otherwise.

tx

Treatment indicator. 1 = Treatment includes IDV, 0 = Control group.

txgrp

Treatment group indicator. 1 = ZDV + 3TC. 2 = ZDV + 3TC + IDV. 3 = d4T + 3TC. 4 = d4T + 3TC + IDV.

strat2

CD4 stratum at screening. 0 = CD4 <= 50. 1 = CD4 > 50.

sex

0 = Male. 1 = Female.

raceth

Race/Ethnicity. 1 = White Non-Hispanic. 2 = Black Non-Hispanic. 3 = Hispanic. 4 = Asian, Pacific Islander. 5 = American Indian, Alaskan Native. 6 = Other/unknown.

ivdrug

IV drug use history. 1 = Never. 2 = Currently. 3 = Previously.

hemophil

Hemophiliac. 1 = Yes. 0 = No.

karnof

Karnofsky Performance Scale. 100 = Normal; no complaint, no evidence of disease. 90 = Normal activity possible; minor signs/symptoms of disease. 80 = Normal activity with effort; some signs/symptoms of disease. 70 = Cares for self; normal activity/active work not possible.

cd4

Baseline CD4 count (Cells/Milliliter).

priorzdv

Prior ZDV use (months).

age

Age at Enrollment (years).

Source

ftp://ftp.wiley.com/public/sci_tech_med/survival

References

Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY

Examples

data(actg)

Compute asymptotic variance-covariance matrix of parameters in given models. It is used for models where bootstrap is not needed.

Description

Compute asymptotic variance-covariance matrix of parameters in given models. It is used for models where bootstrap is not needed.

Usage

asymptoticMultipleOutcomes(
  ...,
  data,
  family,
  data_index = NULL,
  score_epsilon = 1e-06
)
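
As a rough illustration of what a joint asymptotic covariance across models can look like, the sketch below stacks per-subject influence functions from two linear models fitted on the same subjects and sums their outer products (an HC0 sandwich form). This is an assumption about the general technique, not a description of how asymptoticMultipleOutcomes is implemented; influence_rows is a hypothetical helper.

```r
# Sketch only: joint covariance of coefficients from two linear models
# fitted on the same subjects, via stacked influence functions.
set.seed(1)
n  <- 200
x  <- rnorm(n)
e1 <- rnorm(n)
y1 <- 1 + 0.5 * x + e1
y2 <- -1 + 0.3 * x + 0.5 * e1 + rnorm(n)  # errors correlated with those of y1

fit1 <- lm(y1 ~ x)
fit2 <- lm(y2 ~ x)

# Hypothetical helper: row i holds (X'X)^{-1} x_i e_i for subject i
influence_rows <- function(fit) {
  X <- model.matrix(fit)
  (X * residuals(fit)) %*% solve(crossprod(X))
}

# Columns 1-2 belong to fit1, columns 3-4 to fit2
infl <- cbind(influence_rows(fit1), influence_rows(fit2))

# 4 x 4 joint covariance; off-diagonal blocks link the two models
joint_cov <- crossprod(infl)
joint_cov
```

The diagonal 2 x 2 blocks are the usual heteroskedasticity-robust covariances of each model, and the off-diagonal block estimates the covariance between the two sets of coefficients.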

Compute bootstrapped variance-covariance matrix of parameters in given models. It is used when at least one of the specified models needs bootstrap, for example, when a Kaplan-Meier estimate of the probability of survival or quantiles are used for prognostic variables.

Description

Compute bootstrapped variance-covariance matrix of parameters in given models. It is used when at least one of the specified models needs bootstrap, for example, when a Kaplan-Meier estimate of the probability of survival or quantiles are used for prognostic variables.

Usage

bootstrapMultipleOutcomes(
  ...,
  data,
  data_index = NULL,
  nboot = 10,
  compute_cov = TRUE,
  seed = NULL
)
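
The general idea of a bootstrapped covariance across several estimators can be sketched in base R as follows: resample subjects with replacement, recompute every estimate on each resample, and take the empirical covariance of the replicates. The helper estimate_all and the simulated data are hypothetical, and the package's own resampling scheme may differ.

```r
# Sketch only: bootstrap covariance of two estimates computed on the same data.
set.seed(2024)
n   <- 150
dat <- data.frame(trt = rbinom(n, 1, 0.5), x = rnorm(n))
dat$y <- 0.4 * dat$trt + 0.6 * dat$x + rnorm(n)

# Hypothetical helper returning all estimates of interest for one dataset
estimate_all <- function(d) {
  c(
    trt_effect  = unname(coef(lm(y ~ trt, data = d))["trt"]),
    median_diff = median(d$x[d$trt == 1]) - median(d$x[d$trt == 0])
  )
}

nboot <- 200
boot_est <- t(replicate(nboot, {
  idx <- sample.int(n, replace = TRUE)  # resample subjects with replacement
  estimate_all(dat[idx, ])
}))

# 2 x 2 covariance matrix of the two estimates across bootstrap replicates
boot_cov <- cov(boot_est)
boot_cov
```

A median has no simple score-based variance, which is the kind of situation where a bootstrap is typically preferred over an asymptotic formula.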

Process inputs of multipleOutcomes when bootstrap will be used to estimate variance-covariance matrix

Description

Process inputs of multipleOutcomes when bootstrap will be used to estimate variance-covariance matrix

Usage

checkBootstrapInput(..., data, data_index)

Process inputs of multipleOutcomes when asymptotic properties are used to estimate variance-covariance matrix

Description

Process inputs of multipleOutcomes when asymptotic properties are used to estimate variance-covariance matrix

Usage

checkDefaultInput(..., family, data, data_index)

Extract Model Coefficients

Description

coef is a generic function.

Usage

## S3 method for class 'multipleOutcomes'
coef(object, model_index = NULL, ...)

Arguments

object

an object returned by multipleOutcomes().

model_index

NULL if displaying coefficients of all fitted models; otherwise, an integer indicating the fitted model.

...

for debugging only

Value

a vector of coefficient estimates


Generate two curves of survival probability with pointwise 95% confidence interval:
  1. PATED adjusted KM curve

  2. Conventional KM curve

Description

Generate two curves of survival probability with pointwise 95% confidence interval:

  1. PATED adjusted KM curve

  2. Conventional KM curve

Usage

comparePointwiseConfidenceIntervalWidth(
  pated_res,
  km_res,
  transform = "identity"
)

Return the estimate of the log hazard ratio (HR) from a coxph model, for use when bootstrap is needed.

Description

Return the estimate of the log hazard ratio (HR) from a coxph model, for use when bootstrap is needed.

Usage

coxphMO(formula, data = NULL, ties = c("efron", "breslow", "exact"))
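
For orientation, a log hazard ratio can be obtained directly from survival::coxph as sketched below. The lung dataset ships with the survival package and is used here purely for illustration; whether coxphMO wraps coxph in exactly this way is an assumption.

```r
# Sketch only: log hazard ratio from a Cox model using the survival package.
library(survival)

# lung is an example dataset bundled with survival
fit <- coxph(Surv(time, status) ~ sex, data = lung, ties = "efron")

# coef() of a coxph fit is on the log hazard ratio scale
log_hr <- coef(fit)
log_hr
```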

Create a curve of survival probability based on the conventional KM method or on PATED adjusted KM estimates. Refer to km_res or pated_res in pated for the format of the input. Transformations are supported when computing confidence intervals. Refer to https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_lifetest_details08.htm or the LIFETEST proc in SAS for more details of transformation.

Description

Create a curve of survival probability based on the conventional KM method or on PATED adjusted KM estimates. Refer to km_res or pated_res in pated for the format of the input. Transformations are supported when computing confidence intervals. Refer to https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_lifetest_details08.htm or the LIFETEST proc in SAS for more details of transformation.

Usage

createKaplanMeierCurve(input, title = "", transform = "identity")

Figure out the time points with at least one event. This is used to generate KM-like curves.

Description

Figure out the time points with at least one event. This is used to generate KM-like curves.

Usage

extractKaplanMeierTimes(coef, sort = TRUE)
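
The notion of "time points with at least one event" can be illustrated on raw data as below. Note that extractKaplanMeierTimes takes a coef argument whose format is not described here, so this sketch works on hypothetical time and event vectors instead and is not a substitute for the function.

```r
# Sketch only: distinct time points at which at least one event occurred.
time  <- c(5, 3, 8, 3, 10, 8)   # hypothetical follow-up times
event <- c(1, 0, 1, 1, 0, 0)    # 1 = event, 0 = censored

# Keep times of observed events, drop duplicates, sort ascending
event_times <- sort(unique(time[event == 1]))
event_times
```

A KM-like curve only changes at these time points, so they define the steps of the curve.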

Return estimates from a GEE model, for use when bootstrap is needed.

Description

Return estimates from a GEE model, for use when bootstrap is needed.

Usage

geeMO(formula, id, data = NULL, family, corstr)

g(S(t)), a transformed estimate of survival probability. See https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_lifetest_details08.htm

Description

g(S(t)), a transformed estimate of survival probability. See https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_lifetest_details08.htm

Usage

gFunction(conf.type = c("log", "log-log", "plain", "logit"))

Inverse function of g(S(t)). See https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_lifetest_details08.htm

Description

Inverse function of g(S(t)). See https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_lifetest_details08.htm

Usage

gInverseFunction(conf.type = c("log", "log-log", "plain", "logit"))
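
The four conf.type values suggest the standard transformations of a survival probability and their inverses. The sketch below assumes the usual definitions (log, log-log, identity for "plain", and logit) and assumes each helper returns a function; g_sketch and g_inverse_sketch are hypothetical names, and the package's gFunction and gInverseFunction may behave differently.

```r
# Sketch only: common transformations g(S) of a survival probability S
g_sketch <- function(conf.type = c("log", "log-log", "plain", "logit")) {
  conf.type <- match.arg(conf.type)
  switch(conf.type,
    "log"     = function(s) log(s),
    "log-log" = function(s) log(-log(s)),
    "plain"   = function(s) s,
    "logit"   = function(s) log(s / (1 - s))
  )
}

# Matching inverse transformations, mapping g(S) back to S
g_inverse_sketch <- function(conf.type = c("log", "log-log", "plain", "logit")) {
  conf.type <- match.arg(conf.type)
  switch(conf.type,
    "log"     = function(u) exp(u),
    "log-log" = function(u) exp(-exp(u)),
    "plain"   = function(u) u,
    "logit"   = function(u) 1 / (1 + exp(-u))
  )
}

# Round trip: transform a survival probability and map it back
s <- 0.7
back <- g_inverse_sketch("log-log")(g_sketch("log-log")(s))
back
```

A confidence interval is typically built on the transformed scale and mapped back with the inverse, which keeps the limits inside (0, 1) for the log-log and logit choices.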

Return estimates from a GLM, for use when bootstrap is needed.

Description

Return estimates from a GLM, for use when bootstrap is needed.

Usage

glmMO(formula, family, data = NULL)

Return the indices of the parameters of model i within the vector of parameters of all models. df has length equal to the number of specified models, and the i-th element of df is the number of parameters in model i.

Description

Return the indices of the parameters of model i within the vector of parameters of all models. df has length equal to the number of specified models, and the i-th element of df is the number of parameters in model i.

Usage

IDMapping(df, i, var_name = NULL)
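
The index arithmetic described above can be sketched as follows, assuming the parameters of the models are stored one after another in a single vector. id_range_sketch is a hypothetical stand-in that ignores var_name; it is not the package's IDMapping.

```r
# Sketch only: positions of model i's parameters in the stacked vector.
id_range_sketch <- function(df, i) {
  stopifnot(i >= 1, i <= length(df))
  offset <- sum(df[seq_len(i - 1)])  # parameters belonging to models 1..(i-1)
  offset + seq_len(df[i])
}

# Three models with 2, 3 and 2 parameters: model 2 occupies positions 3 to 5
idx <- id_range_sketch(c(2, 3, 2), 2)
idx
```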

Fitting Regression Models for Multiple Outcomes and Returning the Matrix of Covariance

Description

multipleOutcomes can fit different types of models for multiple outcomes simultaneously and return model parameters and variance-covariance matrix for further analysis.

Usage

multipleOutcomes(
  ...,
  data,
  family = NULL,
  data_index = NULL,
  nboot = 0,
  compute_cov = TRUE,
  seed = NULL,
  score_epsilon = 1e-06
)

Arguments

...

formulas of models to be fitted, or moment functions for gmm.

data

a data frame if all models are fitted on the same dataset; otherwise a list of data frames for fitting models in .... Note that a dataset can be used to fit multiple models, so length(data) need not equal the number of models in .... The row names in a data frame are treated as sample IDs. Consequently, any two records in different data frames that correspond to the same sample should have the same row name.

family

a character vector of families to be used in the models. Currently only gaussian, binomial, coxph, logrank, gmm and gee are supported. To analyze longitudinal data and repeated measurements, family should be something like gee+id+family+corstr, where id is the column name defining cluster (i.e. one patient per cluster). The covariance estimate may be inaccurate if number of clusters is too small. The supported families for gee include gaussian, binomial, poisson, Gamma, and quasi. The supported correlation structures include independence, stat_M_dep, non_stat_M_dep, exchangeable, AR-M and unstructured. So a valid family string is like gee+user_id+binomial+exchangeable. family can be of length 1 if all models are fitted in the same family; otherwise family should be specified for each of the models.

data_index

NULL if data is a data frame; otherwise, an integer vector mapping each model in ... to a data frame in data (a list).

nboot

number of bootstrap replicates; a positive integer if bootstrap is adopted. By default 0 (no bootstrap).

score_epsilon

whatever.
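
The family strings described above are plain character values, so they can be assembled and inspected with base R. The sketch below builds the gee example string from its four parts and shows how a length-1 family could be recycled across models; the variable names are hypothetical and the parsing is only a guess at how such a string might be split.

```r
# Sketch only: composing and splitting a gee family string.
gee_family <- paste("gee", "user_id", "binomial", "exchangeable", sep = "+")

# Split on the literal "+" to recover the four components
parts <- strsplit(gee_family, "+", fixed = TRUE)[[1]]
names(parts) <- c("type", "id", "family", "corstr")

# A single family can be recycled when all models share it
n_models <- 3
family   <- rep_len("gaussian", n_models)

gee_family
parts
family
```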

Value

It returns an object of class "multipleOutcomes", which is a list containing the following components:

coefficients

an unnamed vector of coefficients of all fitted models. Use id_map for variable mapping.

mcov

an unnamed matrix of covariance of coefficients. Use id_map for variable mapping.

id_map

a list mapping the elements in coefficients and mcov to variable names.

n_shared_sample_sizes

a matrix of shared sample sizes between datasets being used to fit the models.

call

the matched call.
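
Because row names are treated as sample IDs, the number of samples shared by two datasets can be counted by intersecting their row names. The sketch below shows one way to build such a matrix for a hypothetical pair of data frames; it is an assumption about what n_shared_sample_sizes represents, not the package's code.

```r
# Sketch only: counting shared samples between datasets via row names.
dat_a <- data.frame(y = 1:5, row.names = paste0("id", 1:5))
dat_b <- data.frame(z = 1:4, row.names = paste0("id", 3:6))
datasets <- list(dat_a, dat_b)

# Entry (i, j) is the number of row names common to datasets i and j
count_shared <- function(i, j) {
  length(intersect(rownames(datasets[[i]]), rownames(datasets[[j]])))
}
k <- seq_along(datasets)
shared <- outer(k, k, Vectorize(count_shared))
shared
```

Here the diagonal holds the size of each dataset (5 and 4) and the off-diagonal entries hold the overlap (id3, id4 and id5, so 3).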

Examples

## More examples can be found in the vignettes.
library(mvtnorm)
genData <- function(seed = NULL){

  set.seed(seed)
  n <- 400
  sigma <- matrix(c(1, .6, .6, 1), 2)
  x <- rmvnorm(n, sigma = sigma)
  gam <- c(.1, -.2)
  z <- rbinom(n, 1, plogis(1-1/(1+exp(-.5+x%*%gam+.1*rnorm(n)))))

  bet <- c(-.2,.2)
  #y <- rbinom(n, 1, plogis(1-1/(1+exp(-.5+x%*%bet + .2*z-.3*rnorm(n)))))
  y <- -.5+x%*%bet + .2*z-.3*rnorm(n)

  data.frame(y = y, z = z, x1 = x[, 1], x2 = x[, 2])

}

dat <- genData(123456)
dat1 <- head(dat,200)
dat2 <- tail(dat,200)
## fitting four models simultaneously.
fit <-
  multipleOutcomes(
    y ~ z + x1 - 1,
    z ~ x1 + x2,
    z ~ x1 - 1,
    y ~ x2,
    ## z can be fitted with a linear or logistic regression
    family = c('gaussian', 'binomial', 'gaussian','gaussian'),
    data = list(dat1, dat2),
    ## each dataset is used to fit two models
    data_index = c(1, 1, 2, 2)
  )

## unnamed coefficients of all model parameters
coef(fit)

## named coefficients of a specific model
coef(fit, 2)

## unnamed covariance matrix of all model parameters
vcov(fit)

## named covariance matrix of a specific model
vcov(fit, 1)

## summary of all parameter estimates
summary(fit)

## summary of parameters in a specific model
summary(fit, 4)

Prognostic Variables Assisted Treatment Effect Detection

Description

pated is a wrapper function of multipleOutcomes for testing treatment effect in randomized clinical trials. It assumes that prognostic variables are fully randomized. This assumption can help enhance the statistical power of conventional approaches in detecting the treatment effect. Specifically, the sensitivity of the conventional models specified in ... is improved by pated.

Usage

pated(
  ...,
  data,
  family = NULL,
  data_index = NULL,
  nboot = 0,
  compute_cov = TRUE,
  seed = NULL,
  transform = "identity"
)

Arguments

...

formulas of models to be fitted, or moment functions for gmm.

data

a data frame if all models are fitted on the same dataset; otherwise a list of data frames for fitting models in .... Note that a dataset can be used to fit multiple models, so length(data) need not equal the number of models in .... The row names in a data frame are treated as sample IDs. Consequently, any two records in different data frames that correspond to the same sample should have the same row name.

family

a character vector of families to be used in the models. All families supported by multipleOutcomes are also supported by pated. family can be of length 1 if all models are fitted in the same family; otherwise family should be specified for each of the models in ....

data_index

NULL if data is a data frame; otherwise, an integer vector mapping each model in ... to a data frame in data (a list).

Value

a data frame of testing results.
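
One common way to exploit randomized prognostic variables is to note that their estimated "treatment effects" have mean zero, and to use their covariance with the primary estimate to reduce its variance. The sketch below illustrates that idea with made-up numbers for the estimates and covariances. It is an assumption about the general principle, not a confirmed description of the computation inside pated, and all values and names are hypothetical.

```r
# Sketch only: variance reduction using estimates known to have mean zero.
beta_hat  <- 0.25                         # hypothetical treatment effect estimate
var_b     <- 0.010                        # its variance
gamma_hat <- c(0.05, -0.02)               # "effects" of trt on prognostic variables
V_g       <- diag(c(0.010, 0.010))        # covariance of gamma_hat
C_bg      <- matrix(c(0.003, 0.002), 1)   # covariance between beta_hat and gamma_hat

# Adjusted estimate: subtract the part of beta_hat predicted by gamma_hat
beta_adj <- beta_hat - drop(C_bg %*% solve(V_g, gamma_hat))

# Its variance is never larger than the unadjusted variance
var_adj <- var_b - drop(C_bg %*% solve(V_g, t(C_bg)))

c(estimate = beta_adj, variance = var_adj)
```

The stronger the correlation between the primary estimate and the prognostic estimates, the larger the reduction in variance, which is how such an adjustment can increase power.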

Examples

## More examples can be found in the vignettes.
library(survival)
library(mvtnorm)
library(tidyr)
genData <- function(seed = NULL){

  set.seed(seed)
  n <- 200
  sigma <- matrix(c(1, .6, .6, 1), 2)
  x <- rmvnorm(n, sigma = sigma)
  z1 <- rbinom(n, 1, .6)
  z2 <- rnorm(n)
  gam <- c(.1, -.2)
  trt <- rbinom(n, 1, .5)

  bet <- c(-.2,.2)
  y <- -.5+x %*% bet + z1 * .3 - z2 * .1 + .1 * trt-.1 * rnorm(n)
  death <- rbinom(n, 1, .8)
  id <- 1:n
  data.frame(
    y = y, trt = trt, 
    z1 = z1, z2 = z2, 
    x1 = x[, 1], x2 = x[, 2], 
    death, id)

}

dat1 <- genData(seed = 31415926)

## create a dataset with repeated measurements x
dat2 <- dat1 %>%
  pivot_longer(c(x1, x2), names_to = 'tmp', values_to = 'x') %>%
  dplyr::select(x, trt, id) %>%
  as.data.frame()

fit <- 
  pated(
    Surv(time=y, event=death) ~ trt,
    z1 ~ trt, 
    z2 ~ trt, 
    x ~ trt, 
    family=c('logrank', 'binomial', 'gaussian', 'gee+id+gaussian'), 
    data=list(dat1, dat2), data_index = c(1, 1, 1, 2))

fit

Summarize an Analysis of Multiple Outcomes

Description

Summarize an analysis of multiple outcomes.

Usage

## S3 method for class 'summary.multipleOutcomes'
print(x, ...)

Arguments

x

an object of class summary.multipleOutcomes, as returned by summary() applied to a multipleOutcomes() fit.

...

for debugging only.

Value

an invisible object.


Return the difference in quantiles between arms. formula can be endpoint ~ trt.

Description

Return the difference in quantiles between arms. formula can be endpoint ~ trt.

Usage

quantileMO(formula, data = NULL, probs = c(0.25, 0.5, 0.75))
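
The quantity described above can be illustrated with base R's quantile on two arms. quantile_diff_sketch is a hypothetical helper that takes vectors rather than a formula, the direction of the difference (arm 1 minus arm 0) is an assumption, and the data are constructed so that arm 1 is arm 0 shifted by 2.

```r
# Sketch only: difference in quantiles between two arms.
quantile_diff_sketch <- function(y, trt, probs = c(0.25, 0.5, 0.75)) {
  quantile(y[trt == 1], probs = probs, names = FALSE) -
    quantile(y[trt == 0], probs = probs, names = FALSE)
}

y   <- c(1:9, 1:9 + 2)        # arm 0 values, then arm 1 values shifted by 2
trt <- rep(c(0, 1), each = 9)

q_diff <- quantile_diff_sketch(y, trt)
q_diff
```

Because arm 1 is an exact shift of arm 0, every quantile difference equals 2.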

Generate a bootstrap dataset. When missing data are present, the dataset is split into groups. Patients in the same group have missing values on the same covariates. Bootstrap is carried out within each of the groups.

Description

Generate a bootstrap dataset. When missing data are present, the dataset is split into groups. Patients in the same group have missing values on the same covariates. Bootstrap is carried out within each of the groups.

Usage

sampleWithReplacement(data)
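
Resampling within missingness patterns, as described above, can be sketched in base R as follows. sample_by_pattern_sketch is a hypothetical helper written for illustration; the grouping key and the ordering of the output are assumptions and may not match sampleWithReplacement.

```r
# Sketch only: bootstrap rows separately within each missingness pattern.
sample_by_pattern_sketch <- function(data) {
  # One string per row encoding which columns are missing
  pattern <- apply(is.na(data), 1, paste, collapse = "")
  groups  <- split(seq_len(nrow(data)), pattern)
  idx <- unlist(lapply(groups, function(rows) {
    rows[sample.int(length(rows), replace = TRUE)]
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

set.seed(7)
dat <- data.frame(
  a = c(1, NA, 3, NA, 5, 6),
  b = c(1, 2, NA, 4, 5, NA)
)
boot <- sample_by_pattern_sketch(dat)
boot
```

Each pattern keeps its original group size, so the bootstrap sample has the same number of complete and incomplete records as the input.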

Generating Data for Simulation and Testing

Description

simulateMoData generates data for simulation and testing purposes.

Usage

simulateMoData(n = 500, hr = 0.8, seed = NULL)

Arguments

n

an integer for the total sample size of a two-arm randomized controlled trial.

hr

hazard ratio of treatment.

seed

random seed. By default NULL for no seed being specified.


Object Summaries

Description

summary method for class multipleOutcomes.

Usage

## S3 method for class 'multipleOutcomes'
summary(object, model_index = NULL, ...)

Arguments

object

an object returned by multipleOutcomes().

model_index

NULL if displaying summary of all fitted models; otherwise, an integer indicating the fitted model.

...

for debugging only

Value

a list


Calculate Variance-Covariance Matrix for a Fitted Model Object

Description

Returns the variance-covariance matrix of the main parameters of fitted model objects. The "main" parameters of models correspond to those returned by coef.

Usage

## S3 method for class 'multipleOutcomes'
vcov(object, model_index = NULL, ...)

Arguments

object

an object returned by multipleOutcomes().

model_index

NULL if displaying covariance matrix of all fitted models; otherwise, an integer indicating the fitted model.

...

for debugging only

Value

a matrix of covariance of all estimates
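
A joint covariance matrix makes it possible to test contrasts between parameters from different models. The sketch below applies a Wald test to the difference of two estimates using made-up values in place of coef(fit) and vcov(fit); theta and V are hypothetical, not output from the package.

```r
# Sketch only: Wald test for a contrast of two jointly estimated parameters.
theta <- c(0.30, 0.10)                        # hypothetical estimates
V     <- matrix(c(0.010, 0.004,
                  0.004, 0.020), 2)           # hypothetical joint covariance

contrast <- c(1, -1)                          # difference theta[1] - theta[2]
est <- sum(contrast * theta)
se  <- sqrt(drop(t(contrast) %*% V %*% contrast))

z <- est / se
p_value <- 2 * pnorm(-abs(z))                 # two-sided normal p-value
c(estimate = est, se = se, z = z, p = p_value)
```

Ignoring the off-diagonal covariance (0.004 here) would overstate the variance of the difference, which is why the full matrix is needed for comparisons across models.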