Skip to content

Ordinary Least Squares (OLS)

At a glance

Family: classical · Regime: low-dim (\(n>p\)) · Penalty: none · Output: point · Links: identity · Status: reviewed

Setting & assumptions

  • Gaussian family with identity link: \(\;y = X\beta^\star + \varepsilon\), \(\mathbb{E}[\varepsilon\mid X]=0\).
  • Low-dimensional: \(n \ge p\) and \(X\) has full column rank (\(X^\top X \succ 0\)).
  • Homoskedastic errors \(\operatorname{Var}(\varepsilon)=\sigma^2 I_n\) for the classical Gauss–Markov optimality statement (not needed for the estimator itself).

Estimator / objective

OLS is the unpenalized template (\(P\equiv 0\)) with the Gaussian loss:

\[ \hat\beta \;=\; \arg\min_{\beta\in\mathbb{R}^p}\; \frac{1}{2n}\,\lVert y - X\beta\rVert_2^2 . \]

Setting the gradient \(-\tfrac1n X^\top(y-X\beta)\) to zero gives the normal equations \(X^\top X\,\beta = X^\top y\), whose unique solution (full rank) is

\[ \boxed{\;\hat\beta = (X^\top X)^{-1} X^\top y\;} \]

Algorithm

Closed form; computed stably via a matrix factorization rather than inverting \(X^\top X\).

Input: X (n×p, full rank), y (n)
1. Compute thin QR:  X = Q R          # Q: n×p orthonormal, R: p×p upper-triangular
2. Solve R β = Qᵀ y  by back-substitution
Return β̂

Equivalent routes: Cholesky of \(X^\top X\), or SVD \(X=U\Sigma V^\top\) giving \(\hat\beta = V\Sigma^{-1}U^\top y\) (most robust when \(X\) is ill-conditioned).

Hyperparameters & configuration

Knob Default Notes
intercept included prepend a column of ones, or center \(X,y\) first
solver QR QR / Cholesky / SVD; SVD safest for ill-conditioning
weights \(w_i\) none if supplied → WLS: minimize \(\sum w_i (y_i-x_i^\top\beta)^2\)

No regularization parameter (\(\lambda=0\)).

Mapping to framework

  • Input: \(X\in\mathbb{R}^{n\times p}\), \(y\in\mathbb{R}^n\), link identity.
  • Output: \(\hat\beta=(X^\top X)^{-1}X^\top y\).
  • Links: identity only (for other links use GLM-IRLS).
  • Preprocessing: center/scale optional; affects intercept handling only.

Complexity

  • Time: \(O(np^2)\) to form/factor, \(O(p^3)\) for the triangular solve. Memory \(O(np)\).

Statistical guarantees

  • Unbiased: \(\mathbb{E}[\hat\beta]=\beta^\star\); covariance \(\sigma^2(X^\top X)^{-1}\).
  • Gauss–Markov: BLUE among linear unbiased estimators under homoskedasticity.
  • Under Gaussian errors, \(\hat\beta\) is the MLE and is efficient.
  • Ridge Regression — add \(\lambda\lVert\beta\rVert_2^2\) for ill-conditioned/high-dim \(X\).
  • GLM via IRLS — non-identity links / non-Gaussian families.
  • WLS / GLS — weighted / generalized least squares for heteroskedastic/correlated errors.

References

  • Nocedal & Wright, Numerical Optimization (Nocedal2006) — least squares & linear-algebra solvers.