The CEF can be non-linear. Here we explore what to do to get a linear predictor.

Linear CEF

$m(\textbf{x}) = E(y|\textbf{x})$ is linear in $\textbf{x}$.

$$ m(\textbf{x}) = \textbf{x}'\beta \implies y = \textbf{x}'\beta + e $$ If the CEF really is linear, this error inherits the conditional-mean-zero property. But if we impose a linear form on a non-linear CEF, the error does not have the conditional-mean-zero property, because it absorbs the non-linear part of the CEF!
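A minimal numerical sketch of this point (my own example, not from the notes): the true CEF below is quadratic, so the error from the best linear fit is mean zero overall but not conditionally on $x$.

```python
import numpy as np

# True CEF m(x) = x^2 is nonlinear; we still fit a linear model in (1, x).
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])      # linear model with a constant
beta = np.linalg.solve(X.T @ X, X.T @ y)  # projection coefficients
e = y - X @ beta

print(e.mean())           # ~ 0: the error is mean zero overall
print(e[x > 1.5].mean())  # clearly > 0: E(e | x) is not zero
```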

Generic notation for a linear model:

$$ m(\textbf{x}) = \textbf{x}'\beta, \quad \textbf{x} = (x_1, x_2, \ldots, x_k)' $$

But typically $\textbf{x}$ includes a constant.

So it is convenient to write $y = \alpha + \textbf{x}'\beta + e$. Taking expectations on both sides gives $\mu_y = \alpha + \mu_x'\beta$; solving for $\alpha$ and substituting back:

$y - \mu_y = (\textbf{x} - \mu_x)'\beta + e \implies \overline{y} = \overline{\textbf{x}}'\beta + e$, where the bars denote de-meaned variables.

So, you can de-mean any series and write it in this form!
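A quick check with made-up numbers: the slope from regressing de-meaned $y$ on de-meaned $x$ (no constant) matches the slope from the regression with an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(2.0, 1.0, size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)   # arbitrary intercept, slope, noise

# Slope and intercept from the regression with a constant
X = np.column_stack([np.ones(n), x])
alpha, beta = np.linalg.solve(X.T @ X, X.T @ y)

# Slope from de-meaned y on de-meaned x, no constant needed
xd, yd = x - x.mean(), y - y.mean()
beta_demeaned = (xd @ yd) / (xd @ xd)

print(beta, beta_demeaned)  # the two slopes coincide (~1.5)
```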

We know that the conditional mean is the best predictor of $y$. But what is the best predictor of $y$ within the family of linear functions of $\textbf{x}$?

The assumption is that $Q_{xx} = E(\textbf{x}\textbf{x}')$ is positive definite. This is the second moment matrix of $\textbf{x}$, and positive definiteness guarantees it is invertible.
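A small illustration (my own construction) of why this matters: if one regressor is a linear combination of the others, $Q_{xx}$ is singular and no unique $\beta$ exists.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=10_000)
X_ok = np.column_stack([np.ones_like(x1), x1])
X_collinear = np.column_stack([np.ones_like(x1), x1, 2.0 * x1])  # redundant column

for X in (X_ok, X_collinear):
    Qxx = X.T @ X / len(x1)               # sample analogue of E(xx')
    print(np.linalg.eigvalsh(Qxx).min())  # > 0 iff positive definite; ~0 when collinear
```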

Best Linear Predictor

  • MSE is $S(\beta) = E[(y - \textbf{x}'\beta)^2]$

  • The best linear predictor of $y$ given $\textbf{x}$ is $P(y|\textbf{x}) = \textbf{x}'\beta$

  • The BLP coefficient is given by $\beta = \operatorname{argmin}_{b \in \mathbb{R}^k} S(b)$. The BLP coefficient is also called the linear projection coefficient.

  • $\beta$ is unique and given by $\beta = (E(\textbf{x}\textbf{x}'))^{-1} E(\textbf{x}y)$

  • The BLP is given by $P(y|\textbf{x}) = \textbf{x}'\beta = \textbf{x}' (E(\textbf{x}\textbf{x}'))^{-1} E(\textbf{x}y)$

  • The error $e = y - \textbf{x}'\beta$ exists and satisfies $\boxed{E(\textbf{x}e) = 0}$

This means that for each $i$, $E[x_i e] = 0$, i.e. $x_i$ and $e$ are orthogonal.

  • If $\textbf{x}$ contains a constant then $E(e) = 0$; for the CEF error this was always true!
  • For the BLP, what must be true is that the errors are uncorrelated with the $x_i$'s.

$\beta$ is the BLP coefficient $\iff E(\textbf{x}e) = 0$

  • This is a moment condition (as in the method of moments!). So the BLP can be written as a method-of-moments estimator!!

  • So all the properties of moment estimators hold in this case!

  • The moment condition that pins down the BLP is that the errors are orthogonal to the $x_i$ (see the sketch below).
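A sketch of the BLP as a method-of-moments estimator on simulated data (the data-generating numbers are arbitrary): solve the sample version of the moment condition $E[\textbf{x}(y - \textbf{x}'b)] = 0$ for $b$, then confirm the residuals are orthogonal to the regressors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x1 = rng.normal(size=n)
x2 = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
Qxx = X.T @ X / n                 # sample E(xx')
Qxy = X.T @ y / n                 # sample E(xy)
beta = np.linalg.solve(Qxx, Qxy)  # solves the sample moment condition

e = y - X @ beta
print(beta)         # ~ [1.0, 2.0, -0.5]
print(X.T @ e / n)  # ~ 0: the orthogonality moments E(xe) that pin down beta
```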

Linear Prediction with Constant Term

If the model has a constant term, i.e. $y = \alpha + \textbf{x}'\beta + e$, then taking expectations and solving:

$\implies \alpha = E(y) - E(\textbf{x})'\beta = \mu_y - \mu_x'\beta$

$\implies \beta = Var(\textbf{x})^{-1} Cov(\textbf{x}, y)$

Linear Predictor Error Variance: $\sigma^2 = E[(y - \textbf{x}'\beta)^2] = E(y^2) - 2E(y\textbf{x}')\beta + \beta' E(\textbf{x}\textbf{x}')\beta = Q_{yy} - Q_{yx} Q_{xx}^{-1} Q_{xy}$, using $\beta = Q_{xx}^{-1} Q_{xy}$ in the last step.

This is the variance of the errors from the BLP of y on x.
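A hypothetical numerical check of both formulas, on simulated data with arbitrary parameters: the slope from $Var(\textbf{x})^{-1} Cov(\textbf{x}, y)$, the intercept from the means, and the error variance from the second moments of the constant-augmented regressor.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.3], [0.3, 2.0]], size=n)
y = 0.5 + x @ np.array([1.0, -1.0]) + rng.normal(scale=0.7, size=n)

# Slope from Var(x)^{-1} Cov(x, y), intercept from the means
Vx = np.cov(x, rowvar=False)
Cxy = np.array([np.cov(x[:, j], y)[0, 1] for j in range(x.shape[1])])
beta = np.linalg.solve(Vx, Cxy)
alpha = y.mean() - x.mean(axis=0) @ beta
print(alpha, beta)  # ~ 0.5, [1.0, -1.0]

# Error variance via second moments of the augmented regressor (1, x)
X = np.column_stack([np.ones(n), x])
Qxx, Qxy, Qyy = X.T @ X / n, X.T @ y / n, y @ y / n
sigma2 = Qyy - Qxy @ np.linalg.solve(Qxx, Qxy)
print(sigma2)       # ~ 0.7**2 = 0.49
```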

Joint Normality

  • $(y, \textbf{x})$ are jointly normal
  • This means $(e, \textbf{x})$ are jointly normal, since $e = y - \textbf{x}'\beta$ is a linear function of $(y, \textbf{x})$
  • $E(e) = 0$ and $E(\textbf{x}e) = 0 \implies Cov(e, \textbf{x}) = 0$
  • $e$ and $\textbf{x}$ are jointly normal and uncorrelated, and thus independent! This step holds only under normality!
  • Independence gives $E(e|\textbf{x}) = E(e) = 0$
  • So this is a CEF!
  • Under joint normality, the linear projection is the CEF (see the simulation below)
  • So, therefore, the BLP is the best predictor among all (including nonlinear) predictors!
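A simulation sketch of this result (all numbers made up): for jointly normal $(y, x)$, the projection coefficients computed from sample moments match the normal-theory CEF $E(y|x) = \mu_y + \frac{Cov(y,x)}{Var(x)}(x - \mu_x)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
cov = np.array([[2.0, 0.8],   # Var(y),   Cov(y, x)
                [0.8, 1.0]])  # Cov(x, y), Var(x)
y, x = rng.multivariate_normal([1.0, 0.0], cov, size=n).T

# Linear projection coefficients from sample moments
S = np.cov(x, y)
beta = S[0, 1] / S[0, 0]            # Cov(x, y) / Var(x)
alpha = y.mean() - beta * x.mean()

# Normal-theory CEF coefficients from the population parameters
beta_cef = cov[0, 1] / cov[1, 1]
alpha_cef = 1.0 - beta_cef * 0.0    # mu_y - beta * mu_x

print(alpha, beta)          # sample projection of y on (1, x)
print(alpha_cef, beta_cef)  # normal CEF coefficients: they agree
```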