Decorrelated Score (Ning–Liu)¶
At a glance
Family: high-dim-inference · Regime: high-dim (\(p\gg n\)) · Penalty: lasso (base) · Output: point+inference · Links: any · Status: draft · Refs: 10.1214/16-AOS1448, vandegeer2014, javanmard2014confidence
Setting & assumptions¶
- High-dimensional model with a low-dimensional target: partition the coefficient \(\beta=(\theta,\gamma)\) where \(\theta\in\mathbb{R}\) (or low-dimensional) is the parameter of interest and \(\gamma\in\mathbb{R}^{p-1}\) is a high-dimensional nuisance. The aim is a valid test of \(H_0:\theta=\theta_0\) and a CI for \(\theta\).
- Any twice-differentiable likelihood in the exponential family:
the framework is stated generically through the loss \(\mathcal L(\beta)\), its score
\(\nabla\mathcal L\) and Fisher information \(I\), so it applies across GLMs (linear, logistic,
Poisson, …) — hence
link_support: [any]. - Sparsity \(\lVert\gamma^\star\rVert_0\ll p\) and a compatibility/restricted-eigenvalue condition, so a penalized (lasso) estimator \(\hat\gamma\) is \(\ell_1\)-consistent. The nuisance score block \(I_{\gamma\gamma}\) has a sparse inverse direction, enabling the decorrelation step.
- Key difficulty: the ordinary score \(\nabla_\theta\mathcal L\) is sensitive to the estimation error in \(\hat\gamma\) (its bias does not vanish at \(\sqrt n\) rate). The decorrelated score removes this first-order sensitivity.
Estimator / objective¶
Write the partitioned information at \(\beta^\star\), \(I=\begin{psmallmatrix}I_{\theta\theta}&I_{\theta\gamma}\\ I_{\gamma\theta}&I_{\gamma\gamma}\end{psmallmatrix}=\nabla^2\mathcal L\). The decorrelated (orthogonalized) score projects the target score orthogonal to the nuisance directions:
The weight \(w\) is exactly the projection coefficient of the target score onto the nuisance score space, so that \(\mathbb{E}\big[S\cdot\nabla_\gamma\mathcal L\big]=0\): \(S\) is orthogonal to the nuisance, making it insensitive to estimation error in \(\gamma\). The information for \(\theta\) after removing the nuisance is the partial (efficient) information
In practice \(w\) is estimated by a sparse / Dantzig-type regression of the target-score components on the nuisance-score components (equivalently a penalized solve of \(I_{\gamma\gamma}\,w=I_{\gamma\theta}\)):
with \(\hat I=\nabla^2\mathcal L(\hat\theta_0,\hat\gamma)\) evaluated at the penalized estimator. The plug-in decorrelated score statistic for \(H_0:\theta=\theta_0\) is then
and \(U_n^2\xrightarrow{d}\chi^2_1\) (more generally \(\chi^2_d\) for a \(d\)-dimensional target). A one-step / point estimator is recovered by a Newton correction along the decorrelated direction,
Inverting for CIs. Collect all \(\theta_0\) not rejected at level \(\alpha\):
Algorithm¶
Input: X, y, loss L (GLM neg-loglik), target index θ, null θ₀, penalties λ, μ
1. Penalized nuisance fit:
γ̂ = argmin_γ L(θ₀, γ) + λ‖γ‖₁ # lasso/penalized MLE with θ fixed at θ₀
2. Plug-in information at (θ₀, γ̂):
Î = ∇²L(θ₀, γ̂) → blocks Î_θθ, Î_θγ, Î_γγ
3. Decorrelation weight (sparse / Dantzig solve):
ŵ = argmin ‖w‖₁ s.t. ‖Î_γγ w − Î_γθ‖_∞ ≤ μ
4. Decorrelated score and partial information:
Ŝ = ∇_θ L(θ₀, γ̂) − ŵᵀ ∇_γ L(θ₀, γ̂)
Î_{θ|γ} = Î_θθ − ŵᵀ Î_γθ
5. Test statistic:
U = √n · Ŝ / sqrt(Î_{θ|γ}) # N(0,1) under H₀; U² ~ χ²₁
6. One-step estimate & CI:
θ̂ = θ₀ − Ŝ / Î_{θ|γ}
CI = θ̂ ± z_{1-α/2} / sqrt(n · Î_{θ|γ})
Return θ̂, U (p-value), CI
Hyperparameters & configuration¶
| Knob | Default | Notes |
|---|---|---|
| \(\lambda\) (nuisance lasso) | CV / \(\sqrt{\log p/n}\) | regularizes \(\hat\gamma\); the score is first-order insensitive to its error |
| \(\mu\) (Dantzig tol) | \(\asymp\sqrt{\log p/n}\) | controls bias of the decorrelation weight \(\hat w\) |
| loss / link | any GLM | identity, logit, log, … via \(\nabla\mathcal L,\nabla^2\mathcal L\) |
| target dim \(d\) | \(1\) | \(\chi^2_d\) null for multi-dimensional \(\theta\) |
| \(\alpha\) | \(0.05\) | test level / CI coverage |
Mapping to framework¶
- Input: \(X\in\mathbb{R}^{n\times p}\), \(y\in\mathbb{R}^n\), link \(g\), target coordinate(s) \(\theta\), null value \(\theta_0\), penalties \(\lambda,\mu\).
- Output: point estimate \(\hat\theta\) plus inference — the statistic \(U_n\)
(\(p\)-value) and CI \(\mathcal C_{1-\alpha}\) (hence
output: point+inference). - Links: any — the construction only uses the loss gradient/Hessian \(\nabla\mathcal L,\ \nabla^2\mathcal L=I\).
- Preprocessing: standardize \(X\); unpenalized intercept as usual. The nuisance partition is fixed by the user (which coordinate is being tested).
Complexity¶
- Step 1 (penalized nuisance fit): one lasso/penalized-IRLS solve, \(O(np)\) per cycle.
- Step 3 (decorrelation weight): one sparse/Dantzig regression of dimension \(p-1\), \(O(np)\) per cycle (or an LP for the Dantzig form).
- Steps 4–6: \(O(np)\) for the score and partial information; testing a single target is cheap.
- Testing many coordinates ⇒ repeat the decorrelation per target (parallelizable), dominant cost \(O(np^2)\) overall; memory \(O(np)\).
Statistical guarantees¶
- Asymptotic null distribution: under compatibility + sparsity
(\(s\log p/\sqrt n\to 0\)) and a valid sparse \(\hat w\),
\(U_n\xrightarrow{d}N(0,1)\) under \(H_0\) and \(U_n^2\xrightarrow{d}\chi^2_d\)
(
10.1214/16-AOS1448). - Estimation / normality: the one-step estimator is \(\sqrt n\)-consistent and asymptotically
normal with variance \(I_{\theta\mid\gamma}^{-1}\), the semiparametric efficiency bound for
\(\theta\) in the presence of the nuisance (
jankova2018semiparametric). - Honest coverage: inverting the test gives CIs with asymptotically nominal coverage, uniformly over the sparse nuisance space; validity is robust to model misspecification of the nuisance and to moderate nuisance estimation error.
- The construction generalizes the desparsified-lasso normal limit (
vandegeer2014,javanmard2014confidence) from linear models to general (penalized) likelihoods.
Variants & related¶
- Debiased / Desparsified Lasso — the linear-model special case; its \(\hat\Theta X^\top\) correction coincides with decorrelation under Gaussian loss.
- Lasso via Coordinate Descent — supplies the penalized nuisance estimate \(\hat\gamma\).
- Double / orthogonalized (Neyman) score — the same orthogonality principle underlies double-ML and partialling-out estimators.
References¶
- Ning & Liu (2017), A general theory of hypothesis tests and confidence regions for sparse
high-dimensional models, Ann. Statist. (
10.1214/16-AOS1448) — the decorrelated score framework and its \(N(0,1)/\chi^2\) theory. - van de Geer, Bühlmann, Ritov & Dezeure (2014), On asymptotically optimal confidence regions
and tests for high-dimensional models (
vandegeer2014) — linear-model desparsified score. - Javanmard & Montanari (2014), Confidence intervals and hypothesis testing for high-dimensional
regression (
javanmard2014confidence) — related score-correction construction. - Janková & van de Geer (2018), Semiparametric efficiency bounds for high-dimensional models
(
jankova2018semiparametric) — efficiency of the partial-information variance.