Title: | Variable Selection for Binary Data Using the EM Algorithm |
---|---|
Description: | Implements variable selection for high-dimensional datasets with a binary response variable using the EM algorithm. Both probit and logit models are supported. Also included is a function to generate high-dimensional data with correlated variables. |
Authors: | John Snyder [aut, cre] |
Maintainer: | John Snyder |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2024-11-16 02:47:09 UTC |
Source: | https://github.com/cran/BinaryEMVS |

BinomialEMVS: Conducts an EMVS (EM variable selection) analysis for a binary response
```r
BinomialEMVS(y, x, type = "probit", epsilon = 5e-04,
             v0s = ifelse(type == "probit", 0.025, 5),
             nu.1 = ifelse(type == "probit", 100, 1000),
             nu.gam = 1, lambda.var = 0.001, a = 1, b = ncol(x),
             beta.initial = NULL, sigma.initial = 1, theta.inital = 0.5,
             temp = 1, p = ncol(x), n = nrow(x), SDCD.length = 50)
```
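| Argument | Description |
|---|---|
| `y` | responses in 0-1 coding |
| `x` | design matrix X (observations in rows, variables in columns) |
| `type` | model type: `"probit"` or `"logit"` |
| `epsilon` | tuning parameter (default `5e-04`) |
| `v0s` | tuning parameter; may be a vector (default 0.025 for probit, 5 for logit) |
| `nu.1` | tuning parameter (default 100 for probit, 1000 for logit) |
| `nu.gam` | tuning parameter (default 1) |
| `lambda.var` | tuning parameter (default 0.001) |
| `a` | tuning parameter (default 1) |
| `b` | tuning parameter (default `ncol(x)`) |
| `beta.initial` | starting values for the regression coefficients (default `NULL`) |
| `sigma.initial` | starting value for sigma (default 1) |
| `theta.inital` | starting value for theta (default 0.5); the argument name is spelled `inital` |
| `temp` | not documented (default 1) |
| `p` | number of variables (default `ncol(x)`) |
| `n` | number of observations (default `nrow(x)`) |
| `SDCD.length` | not documented (default 50) |
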
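
The `type` argument selects the link that maps the linear predictor eta = x'beta to P(y = 1). The base-R snippet below does not call BinaryEMVS; it only compares the two links: probit uses the standard normal CDF (`pnorm`), logit the logistic CDF (`plogis`).

```r
# Compare the probit and logit links on a few linear-predictor values
eta <- c(-2, 0, 2)
probit <- pnorm(eta)   # type = "probit": standard normal CDF
logit  <- plogis(eta)  # type = "logit": logistic CDF, 1 / (1 + exp(-eta))
round(rbind(eta, probit, logit), 3)
# probit: 0.023 0.500 0.977
# logit:  0.119 0.500 0.881
```

Both curves pass through 0.5 at eta = 0, but the logistic curve is flatter with heavier tails, so coefficients from the two models are on different scales and are not directly comparable.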
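Returns a list containing the posterior inclusion probabilities, one per column of `x`. The manual names this element `probs`, while the example accesses it as `posts`.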
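
In the EMVS framework of Ročková and George (2014), each coefficient receives a continuous spike-and-slab prior, beta_j | gamma_j ~ N(0, sigma^2 v_{gamma_j}), with a small spike variance v0 and a large slab variance v1. The E-step computes each variable's posterior inclusion probability, and theta (the prior inclusion probability) is updated under a Beta(a, b) prior. The sketch below shows those two standard formulas in base R. The function names are hypothetical, and reading `v0s` as v0, `nu.1` as v1, and `theta.inital`, `a`, `b` as the theta quantities is an assumption that this manual does not confirm.

```r
# Hypothetical sketch of the standard EMVS E-step (not BinaryEMVS code).
# ASSUMPTION: v0 / v1 / theta correspond to v0s / nu.1 / theta.inital.
inclusion_prob <- function(beta, v0, v1, theta, sigma = 1) {
  # Prior density of beta_j under the slab (gamma_j = 1) and spike (gamma_j = 0)
  slab  <- theta       * dnorm(beta, mean = 0, sd = sigma * sqrt(v1))
  spike <- (1 - theta) * dnorm(beta, mean = 0, sd = sigma * sqrt(v0))
  slab / (slab + spike)
}

# Theta update: mode of its Beta posterior given the inclusion probabilities
update_theta <- function(pstar, a, b) {
  p <- length(pstar)
  (sum(pstar) + a - 1) / (a + b + p - 2)
}

beta <- c(0, 0.05, 0.5, 1.5)
pstar <- inclusion_prob(beta, v0 = 0.025, v1 = 100, theta = 0.5)
which(pstar > 0.5)  # variables 3 and 4
```

Thresholding at 0.5, as the example does with `which(result$posts > .5)`, keeps the variables whose coefficients are more plausible under the slab than under the spike.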
```r
library(BinaryEMVS)

# Generate data
set.seed(1)
n <- 25; p <- 500; pr <- 10; cor <- .6
X <- data.sim(n, p, pr, cor)

# Randomly generate related beta coefficients from U(-1, 1)
beta.Vec <- rep(0, times = p)
beta.Vec[1:pr] <- runif(pr, -1, 1)
y <- scale(X %*% beta.Vec + rnorm(n, 0, sd = sqrt(3)), center = TRUE, scale = FALSE)

# Binary response drawn through a logistic link
prob <- 1 / (1 + exp(-y))
y.bin <- t(t(ifelse(rbinom(n, 1, prob) > 0, 1, 0)))

result.probit <- BinomialEMVS(y = y.bin, x = X, type = "probit")
result.logit <- BinomialEMVS(y = y.bin, x = X, type = "logit")

# Variables with posterior inclusion probability above 0.5
which(result.probit$posts > .5)
which(result.logit$posts > .5)
```
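
A common way to apply EM to probit regression is to introduce latent variables z_i ~ N(eta_i, 1) with y_i = 1 exactly when z_i > 0. The E-step then replaces each z_i by its truncated-normal mean. The manual does not say that `BinomialEMVS(type = "probit")` uses this construction, so the sketch below, with the hypothetical function `latent_mean`, is only an illustration of the standard technique.

```r
# Hypothetical sketch: probit data-augmentation E-step (not BinaryEMVS code).
# z_i ~ N(eta_i, 1) truncated to z > 0 when y = 1 and to z <= 0 when y = 0.
latent_mean <- function(y, eta) {
  # Inverse Mills ratios, computed on the log scale for numerical stability
  upper <- exp(dnorm(eta, log = TRUE) - pnorm(eta, log.p = TRUE))   # y = 1
  lower <- exp(dnorm(eta, log = TRUE) - pnorm(-eta, log.p = TRUE))  # y = 0
  ifelse(y == 1, eta + upper, eta - lower)
}

# At eta = 0 the conditional means are +/- dnorm(0) / 0.5, about +/- 0.798
latent_mean(y = c(1, 0), eta = c(0, 0))
```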
data.sim: Generates a high-dimensional dataset in which a subset of columns is related to the response, while controlling the maximum correlation between related and unrelated variables.
```r
data.sim(n = 100, p = 1000, pr = 3, cor = 0.6)
```
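| Argument | Description |
|---|---|
| `n` | sample size (default 100) |
| `p` | total number of variables (default 1000) |
| `pr` | number of variables related to the response (default 3) |
| `cor` | maximum correlation between related and unrelated variables (default 0.6) |
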
Returns an n x p matrix whose first pr columns (the related variables) have maximum correlation cor with the remaining p - pr columns.
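
The manual does not describe how `data.sim` builds this structure. The hypothetical `sim_correlated` below is one simple way to get it, not the package's algorithm: each unrelated column is a mixture of one randomly chosen related column and independent noise, with a mixing correlation drawn from U(-cor, cor). This bounds the population correlation by `cor`; sample correlations can still exceed it by chance, especially when n is small and p is large.

```r
# Hypothetical sketch: NOT the algorithm used by BinaryEMVS::data.sim.
sim_correlated <- function(n, p, pr, cor) {
  stopifnot(pr >= 1, pr < p, cor >= 0, cor < 1)
  # Related variables: independent standard normal columns
  related <- matrix(rnorm(n * pr), nrow = n, ncol = pr)
  # Each unrelated column is tied to one randomly chosen related column
  parent <- sample.int(pr, p - pr, replace = TRUE)
  # Population correlation with its parent, drawn uniformly in [-cor, cor]
  rho <- runif(p - pr, -cor, cor)
  noise <- matrix(rnorm(n * (p - pr)), nrow = n, ncol = p - pr)
  # Unit-variance mixture: rho * parent + sqrt(1 - rho^2) * noise
  unrelated <- sweep(related[, parent, drop = FALSE], 2, rho, `*`) +
    sweep(noise, 2, sqrt(1 - rho^2), `*`)
  cbind(related, unrelated)
}

set.seed(1)
X <- sim_correlated(n = 200, p = 50, pr = 5, cor = 0.6)
dim(X)
```

Under this construction two unrelated columns that share a parent have population correlation rho_i * rho_j, which is at most cor^2.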
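```r
library(BinaryEMVS)

data <- data.sim(n = 100, p = 1000, pr = 10, cor = .6)
# Largest absolute off-diagonal correlation (diagonal entries equal 1)
max(abs(cor(data))[abs(cor(data)) < 1])
```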