Homoskedasticity refers to the condition that the variance of the error term, conditional on the independent variables X, is constant across all observations i = 1 to n: Var(ε_i | X_i) = σ².
Heteroskedasticity means that the dispersion of the error terms varies over the sample. It may take the form of conditional heteroskedasticity, in which the variance of the error term is a function of the independent variables; this creates significant problems for statistical inference.
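In symbols, the two conditions contrast as follows (the functional form f is an illustrative placeholder, not from the text above):

```latex
\text{Homoskedasticity: } \operatorname{Var}(\varepsilon_i \mid X_i) = \sigma^2 \ \text{for all } i,
\qquad
\text{Conditional heteroskedasticity: } \operatorname{Var}(\varepsilon_i \mid X_i) = \sigma_i^2 = f(X_i).
```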
Effects of Heteroskedasticity on Regression
The coefficient estimates remain consistent and unbiased.
The standard errors are usually unreliable estimates.
Because the standard errors are unreliable, hypothesis tests on the coefficients are unreliable.
A scatterplot of the residuals versus one of the independent variables can reveal patterns among the observations. Another method is a formal hypothesis test using a chi-squared statistic, such as the Breusch-Pagan test.
If conditional heteroskedasticity is detected, White-corrected (heteroskedasticity-robust) standard errors can be used in hypothesis testing instead of the standard errors from the OLS estimation procedure.
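A minimal sketch of both steps in Python, assuming simulated data (the variable names and numbers are illustrative, not from the text above): statsmodels' het_breuschpagan implements the chi-squared test, and cov_type="HC0" applies White's correction.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1, 10, n)
eps = rng.normal(0, 0.5 * x)          # error dispersion grows with x
y = 2.0 + 3.0 * x + eps               # simulated heteroskedastic data

X = sm.add_constant(x)                # add an intercept column
ols_res = sm.OLS(y, X).fit()

# Detect: Breusch-Pagan regresses squared residuals on X; n*R^2 ~ chi-squared
lm_stat, lm_pvalue, _, _ = het_breuschpagan(ols_res.resid, X)
print(f"Breusch-Pagan statistic = {lm_stat:.1f}, p-value = {lm_pvalue:.4f}")

# Correct: refit using White (heteroskedasticity-robust) standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC0")
print("OLS std errors:  ", ols_res.bse)     # unreliable here
print("White std errors:", robust_res.bse)  # valid for hypothesis testing
```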
Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. This condition distorts the standard error of the regression and the coefficient standard errors, leading to problems when conducting t-tests for statistical significance of parameters.
If one of the independent variables is a perfect linear combination of the other independent variables, then the model is said to exhibit perfect multicollinearity.
Imperfect multicollinearity arises when two or more independent variables are highly correlated, but less than perfectly correlated.
Effects of Multicollinearity on Regression Analysis
As a result of multicollinearity, there is a greater probability that we will incorrectly conclude that a variable is not statistically significant (i.e., a Type II error).
The classic symptom of multicollinearity is a situation in which t-tests indicate that none of the individual coefficients is significantly different from zero while the R² of the regression is high.
High correlation among the independent variables suggests the possibility of multicollinearity, but low pairwise correlation does not rule it out: a linear combination of several independent variables may still be highly correlated with another independent variable.
The most common method to correct for multicollinearity is to omit one or more of the correlated independent variables.
Statistical procedures may help in this effort, such as stepwise regression, which systematically removes variables from the regression until multicollinearity is minimized.
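A minimal sketch of the symptom and the fix, assuming simulated data (names and numbers here are illustrative): two nearly identical regressors produce a high R² with insignificant t-stats, and omitting one of them restores significance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)      # x2 is almost a copy of x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(0, 1, n)

# Symptom: high R^2, but individually insignificant coefficients
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(f"R^2 = {full.rsquared:.3f}")              # high
print("p-values on x1, x2:", full.pvalues[1:])   # both large

# Fix: omit one of the correlated variables and refit
reduced = sm.OLS(y, sm.add_constant(x1)).fit()
print(f"R^2 = {reduced.rsquared:.3f}")           # barely changes
print("p-value on x1:", reduced.pvalues[1])      # now highly significant
```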
Omitted variable bias is present when two conditions are met:
(1) the omitted variable is correlated with at least one of the independent variables in the model, and
(2) the omitted variable is a determinant of the dependent variable.
Omitting a relevant independent variable in a multiple regression results in regression coefficients that are biased and inconsistent, which means we would not have any confidence in our hypothesis tests of the coefficients or in the predictions of the model.
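A minimal sketch of the bias, assuming simulated data in which the omitted variable x2 satisfies both conditions above: it is correlated with x1 and it determines y.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()     # x2 omitted

print("full model b1: ", full.params[1])    # ~2.0, unbiased
print("short model b1:", short.params[1])   # ~2.0 + 3.0 * 0.8 = 4.4, biased
```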
A model with too many variables performs poorly on out-of-sample data due to overfitting. Overfit models have high variance error out of sample, while smaller models have high bias error (lower in-sample R²). There are ways to deal with this bias-variance tradeoff.
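A minimal sketch of the overfitting problem, assuming simulated data (the polynomial degrees are illustrative): a high-degree polynomial fits the training sample well but degrades out of sample, while the small model fits less well in sample but generalizes.

```python
import numpy as np

rng = np.random.default_rng(2)

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def true_f(x):
    return 1.0 + 2.0 * x                  # the true relationship is linear

x_train = rng.uniform(-3, 3, 30)
x_test = rng.uniform(-3, 3, 30)
y_train = true_f(x_train) + rng.normal(0, 2, 30)
y_test = true_f(x_test) + rng.normal(0, 2, 30)

for degree in (1, 10):                    # small model vs. overfit model
    coefs = np.polyfit(x_train, y_train, degree)
    in_r2 = r_squared(y_train, np.polyval(coefs, x_train))
    out_r2 = r_squared(y_test, np.polyval(coefs, x_test))
    print(f"degree {degree:2d}: in-sample R^2 = {in_r2:.3f}, "
          f"out-of-sample R^2 = {out_r2:.3f}")
```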
The assumption of no outliers is violated when outliers are present in the data. One metric for identifying outliers (more precisely, influential observations) is Cook's distance.
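A minimal sketch of flagging influential observations with Cook's distance, assuming simulated data with one planted outlier; the 4/n cutoff is a common rule of thumb, not from the text above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(3)
n = 50
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
x[-1], y[-1] = 5.0, -20.0                # plant an influential outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = OLSInfluence(res).cooks_distance

# Flag observations whose Cook's distance exceeds the 4/n rule of thumb
flagged = np.where(cooks_d > 4 / n)[0]
print("influential observations:", flagged)
```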
The Gauss-Markov theorem says that if the linear regression model assumptions are true and the regression errors display homoskedasticity, then the OLS estimators are the best linear unbiased estimators (BLUE): among all linear unbiased estimators, they have the smallest variance.
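A compact statement of this conclusion (the notation here is assumed for illustration, not taken from the text above):

```latex
\hat{\beta}_{\text{OLS}} = (X^{\top}X)^{-1}X^{\top}y, \qquad
\operatorname{Var}(\tilde{\beta} \mid X) - \operatorname{Var}(\hat{\beta}_{\text{OLS}} \mid X) \succeq 0
\quad \text{for every linear unbiased estimator } \tilde{\beta}.
```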