
Statistics Review

Random variable – variable that takes on numerical values determined by the outcome of a random experiment. Usually denoted by X
Probability distribution – describes the probabilities of the values a random variable can take on: for a discrete variable, the probability that X = x; for a continuous variable, a density over the possible values

To compute a probability for a normal random variable, convert X ~ N(μ, σ²) to the standard normal Z ~ N(0, 1) using Z = (X − μ)/σ.
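For example, a minimal sketch of this conversion (assuming scipy is available; the μ, σ, and x values are made up):

```python
from scipy.stats import norm

# Hypothetical example: X ~ N(mu=100, sigma^2=15^2); find P(X <= 120)
mu, sigma = 100, 15
x = 120

# Standardize: Z = (X - mu) / sigma, with Z ~ N(0, 1)
z = (x - mu) / sigma

print(norm.cdf(z))             # P(Z <= 1.33) ~ 0.91
print(norm.cdf(x, mu, sigma))  # same probability without standardizing by hand
```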

Point Prediction – If the data are independent, the best prediction is the mean. If the data are not independent (runs test significant at p < 0.05), then X_{t+1} = X_t + μ_change + ε_t

Runs Test

A random walk exists if the changes from period to period are independent and normally distributed
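Below is a hand-rolled sketch of a runs test combined with the point-prediction rule above (not a particular Minitab or library routine; the series is made up):

```python
import numpy as np
from scipy.stats import norm

def runs_test(series):
    """Wald-Wolfowitz runs test: count runs above/below the median and standardize."""
    series = np.asarray(series, dtype=float)
    med = np.median(series)
    signs = series[series != med] > med          # True = above median, False = below
    n1, n2 = signs.sum(), (~signs).sum()
    runs = 1 + np.sum(signs[1:] != signs[:-1])   # a new run starts at every sign switch
    exp_runs = 2.0 * n1 * n2 / (n1 + n2) + 1
    var_runs = (2.0 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - exp_runs) / np.sqrt(var_runs)
    return z, 2 * (1 - norm.cdf(abs(z)))         # two-sided p-value

# Hypothetical series
x = np.array([10.2, 10.5, 10.1, 10.8, 11.0, 10.7, 11.3, 11.1, 11.6, 11.4])
z, p = runs_test(x)

if p < 0.05:
    # data not independent: random-walk point prediction X_{t+1} = X_t + mean change
    prediction = x[-1] + np.diff(x).mean()
else:
    # data independent: the sample mean is the best point prediction
    prediction = x.mean()
print(z, p, prediction)
```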

Estimating μ and σ {the sample mean X̄ estimates μ, the sample variance s² estimates σ²}
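A quick check of these estimators (numpy assumed; ddof=1 gives the n − 1 divisor for s²; the sample values are made up):

```python
import numpy as np

data = np.array([4.1, 3.8, 5.0, 4.6, 4.3])   # hypothetical sample
x_bar = data.mean()          # estimates mu
s2 = data.var(ddof=1)        # estimates sigma^2 (divides by n - 1)
s = data.std(ddof=1)         # estimates sigma
print(x_bar, s2, s)
```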

Linear Regression: Y = α + βX + ε_t, with ε_t iid N(0, σ_ε²)
Make the distance from each point to the estimated line as small as possible: minimize Σε_t², the sum of squared residuals
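A minimal sketch of the closed-form least-squares fit that minimizes Σε_t² (numpy assumed; the data are made up):

```python
import numpy as np

# Hypothetical data: y = alpha + beta * x + noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Closed-form OLS estimates that minimize the sum of squared residuals
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

residuals = y - (a + b * x)
sse = np.sum(residuals ** 2)      # the quantity being minimized
print(a, b, sse)
```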

4 Basic Assumptions (implied by ε_t iid N(0, σ_ε²)) and how to check them:

  1. Linearity – the relationship between X and Y is linear (plot Y vs. X, or residuals vs. fitted values)
  2. Independence – the ε's are independent (runs test on the residuals)
  3. Constant variance – σ_ε² is the same at every X (residual plot should show no fan shape)
  4. Normality – the ε's are normally distributed (histogram or normal plot of the residuals)

Assuming the observations (the data set) are independent implies the e's are independent. Must check this (e.g., with a runs test on the residuals).

Specification bias – exists when an explanatory variable that should be included in the regression is left out. {If a coefficient doesn't make sense, it could be due to specification bias}

Understanding the Regression Output:

In regression predictions, error terms do not accumulate from period to period (unlike in a random walk)
Coefficients are random variables because they depend on the sample of points
Any time you standardize using an estimate (s instead of σ), you get a t-distribution rather than a z-distribution
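For example (scipy assumed), the t critical value is wider than the z value for small samples and approaches it as the degrees of freedom grow:

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)             # ~1.96 for a 95% two-sided interval
print(z_crit)
for df in (5, 10, 30, 100):
    print(df, t.ppf(0.975, df))      # ~2.57, 2.23, 2.04, 1.98 -> approaches 1.96
```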



Confidence Intervals
2 components of prediction error:

  1. (α + βx) – (a + bx), the error from estimating the coefficients
  2. ε, the random error around the true line
    • the more uncertainty in the 2 sources of error, the more uncertain the prediction is, and therefore the wider the confidence interval
    • Minitab gives a 95% interval
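A sketch of both error components using statsmodels (made-up data): the mean_ci_* columns reflect source 1 only (a confidence interval for the line), while the obs_ci_* columns also include the ε term (a prediction interval for a new observation):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

model = sm.OLS(y, sm.add_constant(x)).fit()

# 95% intervals at a new x value
x_new = sm.add_constant(np.array([7.0]), has_constant='add')
frame = model.get_prediction(x_new).summary_frame(alpha=0.05)

# mean_ci_*: uncertainty from estimating (a, b)  -> confidence interval for the mean
# obs_ci_*:  also includes the epsilon term      -> prediction interval for a new point
print(frame[['mean', 'mean_ci_lower', 'mean_ci_upper', 'obs_ci_lower', 'obs_ci_upper']])
```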

Time Series with Trend
Pattern consists of:

To modify the model to account for non-constant variance, run the regression on the % change in the variable (e.g., sales) {this is like a non-independent time-series point prediction}
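A sketch of that transformation (pandas and statsmodels assumed; the sales figures are made up): regress the % change rather than the level:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical sales series whose variance grows with the level
sales = pd.Series([100, 112, 118, 135, 150, 171, 185, 210, 228, 260], dtype=float)

pct_change = sales.pct_change().dropna()      # period-to-period % change
time = np.arange(len(pct_change))             # time index for a trend term

model = sm.OLS(pct_change, sm.add_constant(time)).fit()
print(model.params)   # intercept ~ average growth rate; slope near 0 if growth is steady
```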

Leading indicators can improve the model's explanatory power BUT are not useful for prediction unless their future values are known in advance

It is important to know how close b is to β:
The sampling distribution for b is b ~ N(β, σ_b²)
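A small simulation illustrating this sampling distribution (numpy assumed; the true α, β, and σ are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 1.5, 1.0          # hypothetical true values
x = np.linspace(0, 10, 30)

b_estimates = []
for _ in range(5000):
    y = alpha + beta * x + rng.normal(0, sigma, size=x.size)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b_estimates.append(b)

# Mean of the b's is close to beta; their spread matches sigma_b = sigma / sqrt(sum((x - x_bar)^2))
print(np.mean(b_estimates), np.std(b_estimates), sigma / np.sqrt(np.sum((x - x.mean()) ** 2)))
```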

Hypothesis Testing:  (H0: β = 0,  HA: β ≠ 0) – useful for determining which variables have explanatory power
Type I Error – say β ≠ 0 when β = 0 (reject H0 when H0 is true)
Type II Error – say β = 0 when β ≠ 0 (accept H0 when H0 is false)

Procedure:

  1. Collect sample
  2. Compute b
  3. Compare to critical t-value
  4. Reject H0 if |b / s_b| > t_critical

We’re choosing a cutoff value such that P(Type I error) = 0.05 (= α)
Each coefficient has a t-distribution with n − k degrees of freedom, where k = total # of coefficients (t_{n−k, α})
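Putting the procedure together in a sketch (numpy/scipy assumed; the data are made up, with k = 2 coefficients: intercept and slope):

```python
import numpy as np
from scipy.stats import t

# Hypothetical data and OLS fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.1, 3.8, 3.9, 5.2, 5.0, 6.1, 6.8])
n, k = len(x), 2                                        # intercept and slope

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

s2 = np.sum(residuals ** 2) / (n - k)                   # estimate of sigma_epsilon^2
s_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))         # standard error of b

t_stat = b / s_b
t_crit = t.ppf(0.975, df=n - k)                         # alpha = 0.05, two-sided

print(t_stat, t_crit, abs(t_stat) > t_crit)             # True -> reject H0: beta = 0
```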