R2-D2

Description

The R2-D2 prior (Zhang et al. 2022) is a global-local shrinkage prior. This has been expanded for generalised linear models (Yanchenko, Bondell, and Reich 2024).

Definition

For a linear model:

\[ \begin{align} y_i &\sim \text{normal} (\mu_i,\sigma^2) \\ \mu_i &= \alpha + \beta \cdot x_i \end{align} \]

The prior is:

\[ \begin{align} \beta_i &\sim \text{normal}\left(0, \frac{\sigma^2}{\sigma_{y_{t}\mid \theta}^2}\tau^2\psi_i\right) \\ \tau^2 &= \frac{R^2}{1 - R^2} \\ R^2 &\sim \text{beta}(\mu_{R^2},\phi_{R^2}) \\ \psi &\sim \text{Dirichlet}(\xi_1,\dotsc,\xi_p) \\ \sigma^2 &\sim p(\sigma^2) \\ \alpha &\sim p(\alpha) \end{align} \]

Things to specify

Priors on \(\alpha\) and \(\sigma\). Hyperparameters \(\xi\), \(\mu_{R^2}\), \(\phi_{R^2}\).

Stan code

functions {
  /* Efficient computation of the R2D2 prior
   * Args:
   *   z: standardized population-level coefficients
   *   phi: local weight parameters
   *   tau2: global scale parameter
   * Returns:
   *   population-level coefficients following the R2D2 prior
   */
  vector R2D2(vector z, vector phi, real tau2) {
    return z .* sqrt(phi * tau2);
  }
}

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  vector[N] Y;
  int prior_only;  // should the likelihood be ignored?
  // concentration vector of the D2 prior
  vector<lower=0>[K-1] R2D2_cons_D2;
  // data for the R2D2 prior
  real<lower=0> R2D2_mean_R2;  // mean of the R2 prior
  real<lower=0> R2D2_prec_R2;  // precision of the R2 prior
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc; // centered version of X without an intercept
  vector[Kc] means_X; // column means of X before centering
  real sd_Y = sd(Y);
  for (i in 2 : K) {
    means_X[i - 1] = mean(X[ : , i]);
    Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
  }
}
parameters {
  // local parameters for the R2D2 prior
  vector[Kc] zb;
  simplex[Kc] R2D2_phi;
  // R2D2 shrinkage parameters
  real<lower=0,upper=1> R2D2_R2;  // R2 parameter
  real<lower=0> sigma;  // dispersion parameter
  real a_c;
}

transformed parameters {
  vector[Kc] b;  // population-level effects
  real R2D2_tau2;  // global R2D2 scale parameter
  array[Kc+4] real lprior;
  R2D2_tau2 = sigma^2 * R2D2_R2 / (1 - R2D2_R2);
  // compute actual regression coefficients
  b = R2D2(zb, R2D2_phi, R2D2_tau2);
  lprior[1] = student_t_lpdf(a_c | 4, 0, sd_Y);
  for (k in 1:Kc) {
    lprior[k+1] = std_normal_lpdf(zb[k]);
  }
  lprior[Kc+2] = beta_lpdf(R2D2_R2 | R2D2_mean_R2 * R2D2_prec_R2, (1 - R2D2_mean_R2) * R2D2_prec_R2);
  lprior[Kc+3] = dirichlet_lpdf(R2D2_phi | R2D2_cons_D2);
  lprior[Kc+4] = student_t_lpdf(sigma | 3, 0, sd_Y);
  
}

model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, a_c, b, sigma);
  }
  // priors including constants
  target += sum(lprior);
}

References

Yanchenko, Eric, Howard D. Bondell, and Brian J. Reich. 2024. “The R2D2 Prior for Generalized Linear Mixed Models.” arXiv. https://doi.org/10.48550/arXiv.2111.10718.
Zhang, Yan Dora, Brian P. Naughton, Howard D. Bondell, and Brian J. Reich. 2022. “Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior.” Journal of the American Statistical Association 117 (538): 862–74. https://doi.org/10.1080/01621459.2020.1825449.