APPLIED STATISTICS #
Sampling error is the difference between a sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the true mean, variance, or standard deviation of the population).
Mean & Variance of the sample average
The sampling distribution of the sample statistic is a probability distribution of all possible sample statistics computed from a set of equal-size samples that were randomly drawn from the same population.
Mean of the sample average: E(X̄) = μ
Variance of the sum of two independent random variables drawn from the same population: Var(X1 + X2) = 2σ²
POPULATION & SAMPLE VARIANCE #
The population variance is defined as the average of the squared deviation from the mean.
σ² = ∑(xi − μ)² / N
The population standard deviation is the square root of population variance.
The sample variance, s², is the measure of dispersion that applies when we are evaluating a sample of n observations from a population.
s² = ∑(xi − x̄)² / (n − 1)
The sample standard deviation is the square root of sample variance.
The standard error of the sample mean is the standard deviation of the distribution of the sample mean.
σx̄ = σ/√n
Where: σx̄ = standard error of the sample mean
σ = standard deviation of the population
n= size of the sample
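The variance and standard error formulas above can be checked with a short Python sketch. The data values are hypothetical, and the standard error here uses the sample standard deviation s in place of σ, which is an assumption for the case where σ is unknown.

```python
import math
import statistics

# Hypothetical observations, used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treating the data as the whole population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = math.sqrt(pop_var)

# Treating the data as a sample: divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = math.sqrt(sample_var)

# Standard error of the sample mean, with s standing in for sigma.
std_error = sample_sd / math.sqrt(len(data))

print(pop_var, sample_var, std_error)
```

The sample variance is larger than the population variance for the same data because the divisor is n − 1 rather than N.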
POPULATION & SAMPLE COVARIANCE #
The covariance between two random variables is a statistical measure of the degree to which the two variables move together.
Population covxy = [∑(xi − μx)(yi − μy)] / N
Sample covxy = [∑(xi − x̄)(yi − ȳ)] / (n − 1)
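As a minimal sketch, the two covariance formulas differ only in the divisor. The function names and the data below are hypothetical.

```python
def population_cov(xs, ys):
    """Covariance with divisor N (data treated as the full population)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def sample_cov(xs, ys):
    """Covariance with divisor n - 1 (data treated as a sample)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print(population_cov(xs, ys), sample_cov(xs, ys))
```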
CONFIDENCE INTERVALS #
Confidence interval estimates give a range of values within which the actual value of a parameter is expected to lie with probability 1 − α.
Confidence intervals are usually constructed by adding or subtracting an appropriate value from the point estimate.
Point estimate ± (reliability factor × standard error)
Where: Point estimate = Value of the sample statistic used to estimate the population parameter
Reliability factor = Number that depends on the sampling distribution of the point estimate & the probability, (1 − α), that the confidence interval contains the population parameter.
Standard error= standard error of the point estimate.
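The construction above can be sketched for the simplest case: a confidence interval for a population mean when the population standard deviation is known, so the reliability factor is a z-value. The function name and the inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) interval for the mean, assuming known sigma."""
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sigma / math.sqrt(n)
    margin = reliability_factor * std_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: sample mean 50, sigma 10, n = 100, 95% confidence.
lower, upper = z_confidence_interval(50.0, 10.0, 100, alpha=0.05)
print(lower, upper)
```

With these inputs the standard error is 1, so the interval is roughly 50 ± 1.96.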
SELECTION OF THE APPROPRIATE TEST STATISTIC #
Criteria for selecting the appropriate test statistic:
TEST STATISTICS | | |
When sampling from a: | Small sample (n<30) | Large sample (n≥30) |
Normal distribution with known variance | z-statistic | z-statistic |
Normal distribution with unknown variance | t-statistic | t-statistic |
Nonnormal with known variance | Not available | z-statistic |
Nonnormal with unknown variance | Not available | t-statistic |
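The table can be read as a small lookup. The sketch below encodes it directly; the function name and its return convention ("z", "t", or None for "Not available") are assumptions for illustration.

```python
def select_test_statistic(is_normal, variance_known, n):
    """Return "z", "t", or None ("Not available") following the table above."""
    large_sample = n >= 30
    if not is_normal and not large_sample:
        # Small sample from a nonnormal population: no test statistic available.
        return None
    return "z" if variance_known else "t"


print(select_test_statistic(is_normal=True, variance_known=False, n=12))   # t
print(select_test_statistic(is_normal=False, variance_known=True, n=50))   # z
print(select_test_statistic(is_normal=False, variance_known=True, n=12))   # None
```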
HYPOTHESIS TESTING #
Hypothesis testing is the statistical assessment of a statement or idea regarding a population. Hypothesis testing procedures, based on sample statistics and probability theory, are used to determine whether a hypothesis is a reasonable statement and should not be rejected, or an unreasonable statement and should be rejected.
The null hypothesis, designated H0, is the hypothesis the researcher wants to reject. The alternative hypothesis, designated HA, is what is concluded if there is sufficient evidence to reject the null hypothesis.
Hypothesis testing involves two statistics: the test statistic calculated from the sample data and the critical value of the test statistic.
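Test statistic = (sample statistic − hypothesized value) / standard error of the sample statistic
Standard error of the sample statistic when the population standard deviation σ is known: σx̄ = σ/√n
Standard error of the sample statistic when σ is unknown and is estimated by the sample standard deviation s: sx̄ = s/√n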
ONE TAILED & TWO TAILED TESTS OF HYPOTHESIS
A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test. Most hypothesis tests are constructed as two-tailed tests.
A two-tailed test for the population mean may be structured as:
H0 : μ=μ0 versus HA : μ≠μ0
The general decision rule for a two-tailed test:
Reject H0 if : test statistic > upper critical value or
test statistic < lower critical value
For a one-tailed hypothesis test of the population mean, the null & alternative hypotheses are either:
Upper tail : H0 : μ≤μ0 versus HA : μ>μ0 or
Lower tail : H0 : μ≥μ0 versus HA : μ<μ0
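The decision rules above can be sketched as one function. It assumes a symmetric test distribution (such as z or t), so a single positive critical value serves both tails; the function name and the `tail` argument are hypothetical.

```python
def reject_null(test_statistic, critical_value, tail="two"):
    """Apply the decision rule; critical_value is given as a positive number."""
    if tail == "two":
        # Reject if above the upper or below the lower critical value.
        return test_statistic > critical_value or test_statistic < -critical_value
    if tail == "upper":
        return test_statistic > critical_value
    if tail == "lower":
        return test_statistic < -critical_value
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


print(reject_null(2.5, 1.96))             # two-tailed test
print(reject_null(-2.0, 1.645, "lower"))  # lower-tail test
```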
TYPE I & TYPE II ERRORS #
Type I : the rejection of the null hypothesis when it is actually true.
Type II : the failure to reject the null hypothesis when it is actually false.
The significance level is the probability of making a Type I error & is designated by α.
The decision for a hypothesis test is to either reject the null hypothesis or fail to reject the null hypothesis.
THE POWER OF A TEST
The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
Power of a test= 1– P(Type II error)
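Power can only be computed for a specific alternative. The sketch below assumes an upper-tail z-test with known σ and a hypothetical true mean `mu_true`; none of these inputs come from the notes themselves.

```python
import math
from statistics import NormalDist


def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of an upper-tail z-test of H0: mu <= mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu0, in standard-error units.
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # P(Type II error): the z-statistic falls below the critical value.
    p_type_ii = std_normal.cdf(z_critical - shift)
    return 1 - p_type_ii


# Hypothetical inputs.
power = power_upper_tail_z(mu0=100.0, mu_true=105.0, sigma=15.0, n=36)
print(power)
```

When the true mean equals μ0, the same function returns α, which is the probability of a Type I error.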
THE RELATIONSHIP BETWEEN CONFIDENCE INTERVAL & HYPOTHESIS TEST
The expression for the confidence interval can be stated as:
−critical value ≤ test statistic ≤ +critical value
This is the range within which we fail to reject the null for a two-tailed hypothesis test at a given level of significance.
P-VALUE #
The p-value is the probability of obtaining a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis is true. It is the smallest level of significance at which the null hypothesis can be rejected. For one-tailed tests, the p-value is the probability that lies above the computed test statistic for upper tail tests or below the computed test statistic for lower tail tests. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
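For a z-distributed test statistic, the three cases can be sketched as follows. The function name and the `tail` argument are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """p-value for a z-statistic under the standard normal distribution."""
    std_normal = NormalDist()
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "two":
        # Area above +|z| plus area below -|z|.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


print(z_p_value(1.96))            # close to 0.05
print(z_p_value(1.645, "upper"))  # close to 0.05
```

The null hypothesis is rejected when the p-value is below the chosen significance level.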
THE T-TEST #
The computed value for the test statistic based on the t-distribution is referred to as the t-statistic. For hypothesis tests of a population mean, a t-statistic with n−1 degrees of freedom is computed as:
t = (x̄ − μ0) / (s/√n)
Where: x̄ = sample mean, μ0 = hypothesized population mean (i.e. null), s = standard deviation of the sample, n = sample size.
To conduct a t-test, the t-statistic is compared to a critical t-value at the desired level of significance with the appropriate degrees of freedom.
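A minimal sketch of the computation, assuming the usual form t = (x̄ − μ0)/(s/√n). The data and hypothesized mean are hypothetical, and the critical value is taken from a standard t-table rather than computed, since the Python standard library has no t-distribution.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return the t-statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical sample and hypothesized mean.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
t_stat, df = t_statistic(sample, mu0=4.0)

# Two-tailed 5% critical value for 7 degrees of freedom, from a t-table.
t_critical = 2.365
reject = abs(t_stat) > t_critical
print(t_stat, df, reject)
```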
THE Z-TEST #
The computed test statistic used with the z-test is referred to as the z-statistic. The z-statistic for a hypothesis test of a population mean is computed as follows:
z = (x̄ − μ0) / (σ/√n)
Where: x̄ = sample mean, μ0 = hypothesized population mean, σ = standard deviation of the population, n = sample size.
To test a hypothesis, the z-statistic is compared to the critical z-value corresponding to the significance level of the test.
When the sample size is large & the population variance is unknown, the z-statistic is computed with the sample standard deviation s in place of σ:
z = (x̄ − μ0) / (s/√n)
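The z-test can be sketched end to end, assuming the usual form z = (x̄ − μ0)/(σ/√n) and a two-tailed test. The function name and the inputs are hypothetical; for a large sample with unknown variance, s would be passed in place of σ.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sd, n, alpha=0.05):
    """Two-tailed z-test; returns the z-statistic, critical value, and decision."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_critical, abs(z) > z_critical


# Hypothetical inputs: sample mean 52, hypothesized mean 50, sd 10, n = 100.
z_stat, z_crit, reject = z_test(52.0, 50.0, 10.0, 100)
print(z_stat, z_crit, reject)
```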
THE CHI-SQUARED TEST #
The chi-squared test is used for hypothesis tests concerning the variance of a normally distributed population.
Hypothesis testing of the population variance requires the use of a chi-squared distributed test statistic, denoted χ². The chi-squared distribution is asymmetrical and approaches the normal distribution in shape as the degrees of freedom increase.
The chi-squared test statistic, χ², with n−1 degrees of freedom, is computed as:
χ² = (n − 1)s² / σ0²
Where: n = sample size, s² = sample variance, σ0² = hypothesized value of the population variance.
The chi-squared test compares the test statistic, χ² with n−1 degrees of freedom, to a critical chi-squared value at a given level of significance and n−1 degrees of freedom.
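A short sketch of the chi-squared computation, assuming the usual form χ² = (n − 1)s²/σ0². The inputs are hypothetical, and the critical values are read from a chi-squared table (24 degrees of freedom, 2.5% in each tail) rather than computed.

```python
def chi_squared_statistic(n, sample_variance, hypothesized_variance):
    """Chi-squared test statistic with n - 1 degrees of freedom."""
    return (n - 1) * sample_variance / hypothesized_variance


# Hypothetical inputs: n = 25, s^2 = 16, hypothesized variance = 10.
chi_sq = chi_squared_statistic(25, 16.0, 10.0)

# Table values for 24 degrees of freedom, two-tailed test at 5% significance.
lower_critical, upper_critical = 12.401, 39.364

# Because the distribution is asymmetrical, each tail has its own critical value.
reject = chi_sq < lower_critical or chi_sq > upper_critical
print(chi_sq, reject)
```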
THE F-TEST #
Hypothesis testing using a test statistic that follows an F-distribution is referred to as the F-test. The F-test is used under the assumption that the populations from which samples are drawn are normally distributed & that the samples are independent.
The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as: F = s1²/s2²
Where: s1² = variance of the sample of n1 observations drawn from Population 1.
s2² = variance of the sample of n2 observations drawn from Population 2.
n1−1 & n2−1 are the degrees of freedom used to identify the appropriate critical value from the F-table.
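The F-statistic and its two degrees of freedom can be sketched as follows. The samples are hypothetical, and the critical value would still have to be looked up in an F-table.

```python
import statistics


def f_statistic(sample_1, sample_2):
    """Ratio of sample variances with its numerator and denominator degrees of freedom."""
    s1_squared = statistics.variance(sample_1)
    s2_squared = statistics.variance(sample_2)
    return s1_squared / s2_squared, len(sample_1) - 1, len(sample_2) - 1


# Hypothetical samples from two populations.
sample_1 = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sample_2 = [1.0, 2.0, 3.0, 4.0, 5.0]

f_stat, df_1, df_2 = f_statistic(sample_1, sample_2)
print(f_stat, df_1, df_2)
```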
CHEBYSHEV’S INEQUALITY #
For any set of observations, whether sample or population data and regardless of the shape of the distribution, the percentage of the observations that lie within k standard deviations of the mean is at least 1 − 1/k² for all k>1.
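The bound can be compared against an actual data set. In this sketch the data are hypothetical and the standard deviation is computed with divisor N, treating the observations as the full set being described.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of data strictly within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) < k * sd) / len(data)


# Hypothetical observations.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(chebyshev_lower_bound(2), fraction_within(data, 2))
```

The observed fraction should never fall below the bound, although it is often well above it.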
BACKTESTING #
Backtesting is the process of comparing losses predicted by the VaR model to those actually experienced over the sample testing period. If a model were completely accurate, we would expect VaR to be exceeded with the same frequency predicted by the confidence level used in the VaR model. In other words, the probability of observing a loss amount greater than VaR should be equal to the level of significance (one minus the confidence level).
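A minimal sketch of the comparison: count the periods in which the realized loss exceeded the VaR forecast and compare that frequency with the level of significance. The function name, the loss figures, and the constant VaR forecast are all hypothetical.

```python
def count_exceptions(losses, var_estimates):
    """Return the number and frequency of periods where the loss exceeded VaR."""
    if len(losses) != len(var_estimates):
        raise ValueError("losses and var_estimates must have the same length")
    exceptions = sum(1 for loss, var in zip(losses, var_estimates) if loss > var)
    return exceptions, exceptions / len(losses)


# Hypothetical daily losses and a constant 95% VaR forecast of 2.5.
losses = [1.2, 0.4, 3.1, 0.9, 2.7, 0.3, 1.1, 0.8, 4.0, 0.6]
var_95 = [2.5] * len(losses)

n_exceptions, exception_rate = count_exceptions(losses, var_95)
expected_rate = 1 - 0.95  # level of significance for a 95% VaR
print(n_exceptions, exception_rate, expected_rate)
```

An exception rate well above the level of significance suggests the model understates risk. A sample this small is far too short to draw a conclusion and is used here only to show the mechanics.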