Debiased / Desparsified Lasso¶
At a glance
Family: high-dim-inference · Regime: high-dim (\(p\gg n\)) · Penalty: lasso (base) · Output: point+inference · Links: identity, logit, log · Status: draft · Refs: vandegeer2014, Zhang_delasso_2014, javanmard2014confidence, hdi2015, 10.1111/biom.13587
Setting & assumptions¶
- High-dimensional regime: \(p\) comparable to or \(\gg n\), with sparsity \(s=\lVert\beta^\star\rVert_0\ll p\). The motivating goal is inference on individual coordinates \(\beta_j^\star\) — confidence intervals and tests — not just point estimation.
- Linear/Gaussian case: \(y=X\beta^\star+\varepsilon\), identity link, \(\varepsilon\) mean-zero with variance \(\sigma^2\) (unknown, estimated). GLM extension covers logit/log links via the Fisher-information geometry (below).
- The lasso \(\hat\beta\) is consistent in \(\ell_1\) but biased and non-Gaussian (shrinkage toward \(0\), no tractable limit law). Debiasing removes the leading bias so a Gaussian limit is recovered.
- Design conditions: a restricted-eigenvalue / compatibility condition on \(\hat\Sigma=X^\top X/n\), and a row-sparsity condition on the precision matrix \(\Theta^\star=\Sigma^{-1}\) so that \(\hat\Theta\) can be estimated. Sparsity must satisfy \(s=o\!\big(\sqrt{n}/\log p\big)\) for the remainder term to be asymptotically negligible.
Estimator / objective¶
Start from the lasso base estimator \(\hat\beta=\arg\min_\beta \tfrac1{2n}\lVert y-X\beta\rVert_2^2+\lambda\lVert\beta\rVert_1\). Its KKT/score residual is biased; the debiased (desparsified) estimator adds a single score-correction step using an approximate inverse \(\hat\Theta\) of \(\hat\Sigma=X^\top X/n\):
This is an estimating-equation correction rather than a new penalized objective: it inverts the lasso KKT conditions so that the \(\ell_1\) shrinkage on the target coordinate is removed ("desparsified"). Plugging \(y=X\beta^\star+\varepsilon\) gives the bias decomposition
The Gaussian term satisfies \(Z_j\mid X\sim N\!\big(0,\sigma^2[\hat\Theta\hat\Sigma\hat\Theta^\top]_{jj}\big)\). The remainder is bounded coordinatewise by \(\lVert\Delta\rVert_\infty\le \sqrt{n}\,\lambda_{\text{node}}\,\lVert\hat\beta-\beta^\star\rVert_1 =O_P\!\big(s\log p/\sqrt n\big)\), which \(\to 0\) under \(s=o(\sqrt n/\log p)\). Hence each coordinate is asymptotically normal and \(\sqrt n\)-consistent:
yielding the \((1-\alpha)\) confidence interval \(\hat b_j\pm z_{1-\alpha/2}\,\hat\sigma\sqrt{[\hat\Theta\hat\Sigma\hat\Theta^\top]_{jj}/n}\).
Constructing \(\hat\Theta\) — nodewise lasso¶
van de Geer et al. (2014) build \(\hat\Theta\) row by row via nodewise lasso: regress each column \(X_{\cdot j}\) on the remaining columns \(X_{\cdot\text{-}j}\),
with residual \(r_j=X_{\cdot j}-X_{\cdot\text{-}j}\hat\gamma_j\) and \(\hat\tau_j^2=r_j^\top X_{\cdot j}/n\). Row \(j\) of \(\hat\Theta\) is \(\hat\Theta_{j\cdot}=\hat\tau_j^{-2}\,(-\hat\gamma_{j,1},\dots,1,\dots,-\hat\gamma_{j,p-1})\) (the \(1\) in position \(j\)). Equivalently (Javanmard–Montanari) \(\hat\Theta\)'s rows solve a constrained optimization minimizing \(m^\top\hat\Sigma m\) subject to \(\lVert\hat\Sigma m-e_j\rVert_\infty\le\mu\), directly controlling the bias term \(\lVert I-\hat\Theta\hat\Sigma\rVert_\infty\le\mu\).
Variance estimation¶
The noise level \(\sigma^2\) is estimated from the lasso residuals, e.g. the scaled-lasso /
degrees-of-freedom corrected estimator
\(\hat\sigma^2=\lVert y-X\hat\beta\rVert_2^2/\big(n-\lVert\hat\beta\rVert_0\big)\) or via
cross-validation; hdi (hdi2015) implements these defaults.
GLM extension (logit / log links)¶
For a GLM, replace the Gaussian residual by the score and \(\hat\Sigma\) by the weighted (Fisher-information) Gram matrix \(\hat\Sigma_W=X^\top \hat W X/n\) with \(\hat W=\operatorname{diag}\big(V(\hat\mu_i)\big)\) evaluated at the penalized MLE \(\hat\beta\). The debiased estimator becomes
with asymptotic variance \([\hat\Theta\,\hat\Sigma_W\,\hat\Theta^\top]_{jj}/n\); the nodewise
regressions are run on the \(\sqrt{\hat W}\)-weighted design. Xia, Nan & Li (10.1111/biom.13587)
develop this weighted/Fisher-information correction for GLMs; guo2021inference and
ma2021global give related corrections and (simultaneous) tests for high-dimensional logistic
regression.
Algorithm¶
Input: X (n×p), y (n), penalties λ (lasso), {λ_j} (nodewise), link g
1. Fit base lasso: β̂ = argmin (1/2n)‖y - Xβ‖² + λ‖β‖₁ # GLM: penalized IRLS
2. Build Θ̂ (nodewise lasso), for j = 1..p:
γ̂_j = argmin (1/2n)‖X[:,j] - X[:,-j]γ‖² + λ_j‖γ‖₁ # GLM: √Ŵ-weighted
r_j = X[:,j] - X[:,-j] γ̂_j
τ̂_j² = r_jᵀ X[:,j] / n
Θ̂[j,:] = (1/τ̂_j²) · (−γ̂_j with 1 inserted at position j)
3. Debias: b̂ = β̂ + (1/n) Θ̂ Xᵀ (y - Xβ̂) # GLM: (y - μ̂), Ŵ-weighted Gram
4. Variance: σ̂² from lasso residuals; Ω̂ = Θ̂ Σ̂ Θ̂ᵀ
5. Inference: for each j
se_j = σ̂ · sqrt( Ω̂_jj / n )
CI_j = b̂_j ± z_{1-α/2} · se_j
z-stat = b̂_j / se_j → N(0,1) under H₀: β_j = 0
Return b̂, {se_j}, {CI_j}
The Javanmard–Montanari variant replaces step 2 with the per-row constrained QP \(\min_m m^\top\hat\Sigma m\) s.t. \(\lVert\hat\Sigma m-e_j\rVert_\infty\le\mu\).
Hyperparameters & configuration¶
| Knob | Default | Notes |
|---|---|---|
| \(\lambda\) (base lasso) | CV / \(\sqrt{\log p/n}\) | controls \(\hat\beta\); bias of \(\hat b\) is first-order insensitive to it |
| \(\lambda_j\) (nodewise) | CV per column | governs \(\hat\Theta\) quality and the remainder \(\lVert I-\hat\Theta\hat\Sigma\rVert_\infty\) |
| \(\mu\) (JM variant) | \(\asymp\sqrt{\log p/n}\) | bias bound in the constrained-QP construction |
| \(\hat\sigma^2\) | scaled lasso / df-corrected | noise level for Gaussian SEs |
| link / weights | identity | logit, log via Fisher weights \(\hat W\) (GLM extension) |
| \(\alpha\) | \(0.05\) | CI level; Bonferroni/holm or ma2021global for simultaneity |
Mapping to framework¶
- Input: \(X\in\mathbb{R}^{n\times p}\), \(y\in\mathbb{R}^n\), link \(g\), penalties \(\lambda,\{\lambda_j\}\).
- Output: debiased point estimate \(\hat b\) plus inference — standard errors \(\{se_j\}\),
\(z\)-statistics/\(p\)-values, and CIs \(\{CI_j\}\) (hence
output: point+inference). - Links: identity (Gaussian); logit, log via the weighted Fisher-information correction.
- Preprocessing: standardize columns of \(X\); center \(y\) (Gaussian) or fit unpenalized intercept (GLM). Coefficients/SEs returned on the requested scale.
Complexity¶
- Base lasso: as in Lasso via Coordinate Descent, \(O(npL)\) over a path.
- Nodewise \(\hat\Theta\): \(p\) separate lasso regressions, each \(O(np)\) per cycle ⇒ dominant cost \(O(np^2)\) (embarrassingly parallel over columns).
- Debias + variance: \(O(np)\) for the score correction; forming the needed diagonal of \(\hat\Theta\hat\Sigma\hat\Theta^\top\) is \(O(np)\) per coordinate.
- Memory \(O(np)\) (rows of \(\hat\Theta\) can be streamed/discarded after extracting \(se_j\)).
Statistical guarantees¶
- Asymptotic normality: under compatibility + precision-row-sparsity and
\(s=o(\sqrt n/\log p)\), \(\sqrt n(\hat b_j-\beta_j^\star)\xrightarrow{d}N(0,\sigma^2\Omega_{jj})\)
with \(\Omega=\Theta\Sigma\Theta^\top\) (
vandegeer2014,Zhang_delasso_2014,javanmard2014confidence). - Coverage / honesty: the resulting CIs achieve asymptotically nominal coverage uniformly over the sparse parameter space; the remainder term is uniformly negligible.
- Efficiency: the asymptotic variance attains the semiparametric efficiency bound under the
stated conditions (
jankova2018semiparametric). - GLM: analogous normality with the Fisher-information variance for logit/log links
(
10.1111/biom.13587,guo2021inference);ma2021globalprovides simultaneous/global tests.
Variants & related¶
- Lasso via Coordinate Descent — the base estimator that is debiased here.
- Decorrelated Score (Ning–Liu) — score-based inference framework with the same orthogonalization idea, stated directly for general GLMs.
- Nodewise lasso (Meinshausen–Bühlmann) — the precision-matrix construction reused for \(\hat\Theta\).
- Scaled / square-root lasso — pivotal noise-level estimation feeding the SEs.
References¶
- van de Geer, Bühlmann, Ritov & Dezeure (2014), On asymptotically optimal confidence regions
and tests for high-dimensional models (
vandegeer2014) — desparsified lasso, nodewise \(\hat\Theta\). - Zhang & Zhang (2014), Confidence intervals for low-dimensional parameters in high-dimensional
linear models (
Zhang_delasso_2014) — the low-dimensional projection / debiasing idea. - Javanmard & Montanari (2014), Confidence intervals and hypothesis testing for high-dimensional
regression (
javanmard2014confidence) — constrained-QP construction of \(\hat\Theta\). - Dezeure, Bühlmann, Meier & Meinshausen (2015), High-dimensional inference: confidence
intervals, p-values and R-software hdi (
hdi2015) — implementation and variance defaults. - Xia, Nan & Li (2021), Debiased lasso for generalized linear models (
10.1111/biom.13587) — Fisher-information correction for GLMs. - Guo, Rakshit, Herman & Chen (2021), Inference for the case probability in high-dimensional
logistic regression (
guo2021inference). - Ma, Cai & Li (2021), Global and simultaneous hypothesis testing for high-dimensional logistic
regression models (
ma2021global).