CEF - Conditional Expectation Function
Theorem Law of Iterated Expectations
If $E(|y|) < \infty$, then for any random variable $\textbf{x}$: $$E(E(y|\textbf{x})) = E(y)$$
General Law of Iterated Expectations
If $E(|y|) < \infty$, then for any random variables $\textbf{x}_1$, $\textbf{x}_2$: $$ E(E(y|\textbf{x}_1, \textbf{x}_2)|\textbf{x}_1) = E(y|\textbf{x}_1)$$
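A minimal simulation sketch of the first theorem, under an assumed toy setup (binary $x$ and $y = 2x + \text{noise}$, chosen only for illustration): averaging the within-group means $E(y|x)$ over the distribution of $x$ recovers the unconditional mean $E(y)$.

```python
# Law of Iterated Expectations check: E(E(y|x)) = E(y).
# Hypothetical setup: x is a binary group indicator, y = 2x + noise.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(0, 2, size=n)           # binary conditioning variable
y = 2 * x + rng.normal(size=n)           # y depends on x plus noise

# Sample analogue of E(y|x): the mean of y within each value of x.
cond_means = np.array([y[x == v].mean() for v in (0, 1)])
probs = np.array([(x == v).mean() for v in (0, 1)])

print(cond_means @ probs)   # E(E(y|x)): conditional means averaged over x
print(y.mean())             # E(y) -- agrees up to simulation noise
```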
Conditioning Theorem
If $E(|g(\textbf{x})y|) < \infty$,
$$ E(g(\textbf{x})y|\textbf{x}) = g(\textbf{x}) E(y|\textbf{x})$$
and
$$ E(g(\textbf{x})y) = E(g(\textbf{x}) E(y|\textbf{x}))$$
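The Conditioning Theorem can be checked the same way. A sketch under an assumed design ($x$ uniform on $\{1,2,3\}$, $y = x + \text{noise}$, $g(x) = x^2$, none of it from the notes): replacing $y$ by its group mean $E(y|x)$ inside the expectation leaves $E(g(\textbf{x})y)$ unchanged.

```python
# Conditioning Theorem check: E(g(x)*y) = E(g(x)*E(y|x)).
# Hypothetical setup: x uniform on {1,2,3}, y = x + noise, g(x) = x**2.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(1, 4, size=n)
y = x + rng.normal(size=n)
g = x ** 2

# Map each observation to its group mean E(y|x).
cond_mean = np.zeros(n)
for v in (1, 2, 3):
    cond_mean[x == v] = y[x == v].mean()

print(np.mean(g * y), np.mean(g * cond_mean))   # both approximate E(g(x)y)
```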
CEF Error
The CEF error $e$ is the difference between $y$ and the CEF $m(\textbf{x})$: $e = y - m(\textbf{x})$
$E(e|\textbf{x}) = E(y-m(\textbf{x}) | \textbf{x}) = E(y|\textbf{x}) - m(\textbf{x}) = 0$
The Law of Iterated Expectations shows more: the error also has zero unconditional mean, $E(e) = E(E(e|\textbf{x})) = E(0) = 0$
For any function $h(\textbf{x})$: $E(h(\textbf{x})e) = 0$ (by the Conditioning Theorem and iterated expectations)
A predictor $g(\textbf{x})$ is the CEF if and only if its error $e_g = y - g(\textbf{x})$ satisfies $E(e_g| \textbf{x}) = 0$
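These error properties are easy to illustrate numerically. A sketch under an assumed true CEF $m(x) = x^2$ (chosen only for this example): the recovered error has mean zero and is uncorrelated with an arbitrary function $h(x)$.

```python
# CEF error properties: E(e) = 0 and E(h(x)e) = 0.
# Hypothetical model: y = x**2 + e with E(e|x) = 0 by construction.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.normal(size=n)
y = x ** 2 + rng.normal(size=n)

e = y - x ** 2        # CEF error, since m(x) = x**2 here
h = np.exp(-x)        # an arbitrary function h(x)

print(e.mean())           # ≈ 0 : E(e) = 0
print(np.mean(h * e))     # ≈ 0 : E(h(x)e) = 0
```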
Example: Intercept Model. Here $m(\textbf{x})$ is a constant: $m(\textbf{x}) = E(y) = \mu$
Variance of the CEF Error
If we did not observe $\textbf{x}$, the best prediction of $y$ would be the constant $E(y)$. Observing $\textbf{x}$ gives us much more information: $m(\textbf{x})$ is a function of $\textbf{x}$, so we can predict how $y$ behaves for different values of $\textbf{x}$.
How can we measure how much extra information $\textbf{x}$ provides? By computing the variance of the error: low error variance means $\textbf{x}$ carries a lot of information, high error variance means it carries little. The error variance measures the variation in $y$ that is not explained by the conditional mean $E(y|\textbf{x})$.
$Var(e) = E((e- E(e))^2) = E(e^2)$. The error variance depends on which conditioning variables are used. For example, compare two models:
$y = E(y|x_1) + e_1$
$y = E(y|x_1, x_2) + e_2$
In general $\sigma_1^2 \geq \sigma_2^2$: conditioning on more variables cannot increase the error variance.
Theorem: $Var(y) \geq Var(y - E(y|x_1)) \geq Var(y - E(y|x_1, x_2))$ (more information $\implies$ weakly smaller error variance)
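A simulation sketch of this variance-reduction theorem, under an assumed linear design (not from the notes): the error variance shrinks as more conditioning variables are added.

```python
# Variance reduction: Var(y) >= Var(y - E(y|x1)) >= Var(y - E(y|x1, x2)).
# Hypothetical model: y = x1 + 0.5*x2 + noise with independent regressors.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = x1 + 0.5 * x2 + rng.normal(size=n)

e1 = y - x1                  # error after conditioning on x1 only (E(y|x1) = x1)
e2 = y - (x1 + 0.5 * x2)     # error after conditioning on (x1, x2)

print(y.var(), e1.var(), e2.var())   # decreasing sequence of variances
```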
Example
- Suppose $z = (x, y)'$ is jointly normal with zero means $\mu = (0, 0)'$ and covariance matrix $$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
The CEF of y given x is $E(y|x) = m(x) = \rho x $
The variance of CEF error is $Var(e) = 1 - \rho^2$
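This example is easy to verify by simulation. A sketch assuming unit variances and $\rho = 0.6$ (the specific value is arbitrary): the error from the CEF $\rho x$ has conditional mean zero and variance $1 - \rho^2$.

```python
# Bivariate normal example: E(y|x) = rho*x and Var(e) = 1 - rho**2.
# Assumed parameters: unit variances, correlation rho = 0.6.
import numpy as np

rng = np.random.default_rng(4)
rho, n = 0.6, 500_000
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

e = y - rho * x                          # CEF error when m(x) = rho*x
print(e[np.abs(x - 1.0) < 0.1].mean())   # ≈ 0 : E(e | x ≈ 1) = 0
print(e.var())                           # ≈ 1 - rho**2 = 0.64
```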
Best Predictor
The best predictor $g(\textbf{x})$ is the one that minimizes the mean squared error $MSE = E[(y - g(\textbf{x}))^2]$.
The CEF $m(x)$, regardless of the joint distribution of $(y, \textbf{x})$, minimizes the MSE!
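A quick sketch of this optimality claim, again using an assumed true CEF $m(x) = x^2$: the CEF attains a smaller mean squared error than a fitted linear predictor or a constant.

```python
# The CEF minimizes MSE among all predictors g(x).
# Hypothetical model: y = x**2 + noise, so m(x) = x**2.
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
x = rng.normal(size=n)
y = x ** 2 + rng.normal(size=n)

predictors = {
    "CEF m(x) = x^2":   x ** 2,
    "best linear fit":  np.polyval(np.polyfit(x, y, 1), x),
    "constant E(y)":    np.full(n, y.mean()),
}
for name, g in predictors.items():
    print(name, np.mean((y - g) ** 2))   # CEF has the smallest MSE
```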
The conditional variance of $y$ given $\textbf{x}$ is $$ \sigma^2(y|\textbf{x}) = \sigma^2(\textbf{x}) = Var(y|\textbf{x}) = E((y - E(y|\textbf{x}))^2|\textbf{x})= E(e^2|\textbf{x})$$
For the above example, if the correlation $\rho$ is 1, the conditional variance of $y$ given $\textbf{x}$ is 0!
The conditional variance measures how much variation is left in $y$ after conditioning on $\textbf{x}$.
The unconditional variance of the error is the average of the conditional variance: $\sigma^2 = E(e^2) = E(E(e^2|\textbf{x})) = E(\sigma^2(\textbf{x}))$
Any multivariate rv $z = (y, x)$ can be decomposed as: $$ y = m(x) + \sigma(x) \epsilon $$
Where $m(\textbf{x}) = E(y|\textbf{x})$, $\epsilon = \frac{e}{\sigma(\textbf{x})}$, $E(\epsilon|\textbf{x}) = 0$, $Var(\epsilon|\textbf{x}) = 1$
Often the conditional variance is ignored.
If $\sigma(\textbf{x}) = \sigma$ is constant, the error is homoskedastic. If the volatility of the error varies with $\textbf{x}$, it is heteroskedastic.
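A heteroskedasticity sketch under an assumed design where the error volatility is $\sigma(x) = x$: the raw error variance changes with $x$, while the rescaled error $\epsilon = e/\sigma(x)$ has conditional variance close to 1 everywhere.

```python
# Heteroskedasticity vs. the normalized error eps = e / sigma(x).
# Hypothetical design: x uniform on [0.5, 2], sigma(x) = x.
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
x = rng.uniform(0.5, 2.0, size=n)
e = x * rng.normal(size=n)       # heteroskedastic CEF error, sigma(x) = x
eps = e / x                      # normalized error

lo, hi = x < 1.0, x > 1.5
print(e[lo].var(), e[hi].var())      # clearly different: heteroskedastic
print(eps[lo].var(), eps[hi].var())  # both ≈ 1: Var(eps|x) = 1
```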
CEF Derivative
How does CEF vary with small change in x? The marginal effect of $x_1$ is
$$ \Delta_1 m(\textbf{x}) = \frac{\partial}{\partial x_1} m(x_1, ..., x_k)$$
Note that this derivative does not measure the change in $y$ itself, but the change in the conditional expectation of $y$.
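As a small worked sketch (the CEF $m(x_1, x_2) = x_1^2 + x_2$ is assumed purely for illustration), a finite difference recovers the marginal effect $\Delta_1 m(\textbf{x}) = 2x_1$.

```python
# Marginal effect of x1 as a derivative of the CEF (not of y itself).
# Hypothetical CEF: m(x1, x2) = x1**2 + x2, so the marginal effect is 2*x1.
def m(x1, x2):
    return x1 ** 2 + x2

x1, x2, h = 1.5, 0.3, 1e-5
marginal_effect = (m(x1 + h, x2) - m(x1 - h, x2)) / (2 * h)
print(marginal_effect)   # ≈ 2 * x1 = 3.0
```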
Summary: $m(x)$ minimizes MSE. $E(e|x) = 0; E(e) = 0; y = m(x) + \sigma(x) \epsilon$
The CEF can be non-linear. So the next step is to understand how to obtain a linear approximation: the best linear predictor (BLP).
For the CEF, the defining condition is $E(e|\textbf{x}) = 0$, which also implies $E(e|x_j) = 0$ for each individual regressor $x_j$ (by iterated expectations).
Moreover, $E(e|\textbf{x}) = 0$ implies $E(\textbf{x}e) = 0$, since $E(\textbf{x}e) = E(\textbf{x}E(e|\textbf{x})) = 0$ by the Conditioning Theorem; the converse does not hold.
So the CEF condition is stronger than the BLP orthogonality condition $E(\textbf{x}e) = 0$, and both lead to method-of-moments estimators.