- MBA Boost - https://www.mbaboost.com -

Statistics Outline

Important definitions

Mean of a probability distribution – Measure of the center of a distribution

Standard deviation of a probability distribution – Measure of the spread, or dispersion, of a distribution

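Adding a constant c to a random variable – Shifts the distribution by c: the mean increases by c and the standard deviation is unchanged

Multiplying a random variable by a constant k – Rescales the distribution: the mean is multiplied by k and the standard deviation by |k|, so the dispersion changes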
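The sketch below makes these definitions concrete. It computes the mean and standard deviation of a small discrete distribution, then repeats the computation after adding a constant and after multiplying by a constant. The distribution and the constants 5 and −2 are hypothetical, chosen only for illustration.

```python
import math

def mean_sd(values, probs):
    """Mean and standard deviation of a discrete probability distribution."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mu) ** 2 for v, p in zip(values, probs))
    return mu, math.sqrt(var)

# Hypothetical distribution, for illustration only.
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

mu, sd = mean_sd(values, probs)
# Adding c = 5 shifts the mean by 5 and leaves the standard deviation unchanged.
mu_shift, sd_shift = mean_sd([v + 5 for v in values], probs)
# Multiplying by k = -2 multiplies the mean by -2 and the standard deviation by |-2| = 2.
mu_scale, sd_scale = mean_sd([-2 * v for v in values], probs)

print(mu, sd)
print(mu_shift, sd_shift)
print(mu_scale, sd_scale)
```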

Normal distribution – The normal distribution is completely characterized by two parameters, its mean μ and its variance σ²

Area under the curve represents probability

Computing probability for Z ~ N(0,1) – Use tables

Converting $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0,1)$ – Use $Z = (X - \mu)/\sigma$

If $X \sim N(\mu, \sigma^2)$, compute the probability that X is within one standard deviation (σ) of μ, i.e., compute $\Pr(\mu - \sigma < X < \mu + \sigma)$
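Instead of a printed table, the standard normal CDF can be evaluated with the error function, since Φ(z) = ½[1 + erf(z/√2)]. The sketch below uses Python's `math.erf` together with the conversion Z = (X − μ)/σ. The values μ = 10 and σ = 3 are hypothetical; the probability of landing within one standard deviation of the mean is the same for any μ and σ, about 0.6827.

```python
import math

def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0,1), computed from the error function instead of a table."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), after standardizing with Z = (X - mu) / sigma."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)

# Hypothetical parameters; the answer does not depend on them.
mu, sigma = 10.0, 3.0
p_one_sd = normal_prob_between(mu - sigma, mu + sigma, mu, sigma)
print(round(p_one_sd, 4))
```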

Formal statistical model for IBM data

Check assumptions of the model

Runs test
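The following is a minimal sketch of one common version of the runs test (the Wald-Wolfowitz test). It labels each observation as above or below the sample mean, counts runs of consecutive identical labels, and compares that count with its expected value under independence. Classifying around the mean (rather than the median or zero) is an assumption here, and both series are made-up examples of alternation and clustering.

```python
import math

def runs_test(series):
    """Runs test on above/below-mean labels; returns (runs, expected_runs, z).

    Too few runs (z strongly negative) suggests clustering; too many runs
    (z strongly positive) suggests alternation. Either way, independence is doubtful.
    """
    center = sum(series) / len(series)
    # True = above the mean, False = below; observations exactly at the mean are dropped.
    signs = [x > center for x in series if x != center]
    n1 = sum(signs)           # number above the mean
    n2 = len(signs) - n1      # number below the mean
    n = n1 + n2
    runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if cur != prev)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical monthly growth rates, for illustration only.
alternating = [5, -5, 5, -5, 5, -5, 5, -5, 5, -5]
clustered = [5, 5, 5, 5, 5, -5, -5, -5, -5, -5]
print(runs_test(alternating))
print(runs_test(clustered))
```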

Model: $\text{Change}_t \sim \text{iid } N(-1.4,\ (33.39)^2)$, where $\text{Change}_t$ is the growth rate from month $t-1$ to month $t$

Estimating the population mean (μ) of a normal distribution

Measuring the quality of an estimator

Sampling distributions

Usefulness of the sampling distribution of $\bar{X}$ in determining the quality of $\bar{X}$ as an estimator of μ

Sampling distribution of $\bar{X}$
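A quick simulation shows the sampling distribution of the sample mean: across many samples of size n from N(μ, σ²), the means center on μ with standard deviation close to σ/√n. The population values μ = 100, σ = 20 and the sample size n = 25 are hypothetical; with them, σ/√n = 4.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` independent samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mu, sigma, n = 100.0, 20.0, 25   # hypothetical population and sample size
means = simulate_sample_means(mu, sigma, n, reps=5000)

# Theory: X-bar ~ N(mu, sigma^2 / n), so its standard deviation is sigma / sqrt(n) = 4.
print(statistics.fmean(means), statistics.stdev(means))
```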

Estimating $\sigma^2$ in the MBA salary example
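The usual estimator of σ² is the sample variance s², which divides the sum of squared deviations from the sample mean by n − 1 rather than n. The salaries below are hypothetical stand-ins, not the data from the MBA salary example; the check against `statistics.variance` works because that function also uses the n − 1 denominator.

```python
import statistics

# Hypothetical starting salaries (in $000s), for illustration only.
salaries = [82, 95, 88, 101, 76, 90, 93, 85]

n = len(salaries)
xbar = sum(salaries) / n
# Divide by n - 1: one degree of freedom is used up by estimating the mean with xbar.
s2 = sum((x - xbar) ** 2 for x in salaries) / (n - 1)

print(xbar, s2, statistics.variance(salaries))
```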

Estimating the regression line $\alpha + \beta\,\text{Adv}$

Finding the line that is as close as possible to the data points (least squares: minimize the sum of squared vertical distances from the points to the line)
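For a single explanatory variable, the least-squares line has a closed form: b = Sxy/Sxx and a = ȳ − b·x̄, where Sxy = Σ(x − x̄)(y − ȳ) and Sxx = Σ(x − x̄)². The sketch below applies it to hypothetical advertising and sales figures; the names `adv` and `sales` are illustrative only.

```python
def least_squares(x, y):
    """Return (a, b) minimizing sum((y_i - a - b*x_i)^2) for the line a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = ybar - b * xbar    # intercept: the fitted line passes through (xbar, ybar)
    return a, b

# Hypothetical advertising and sales figures.
adv = [1, 2, 3, 4, 5]
sales = [3, 6, 7, 8, 11]
a, b = least_squares(adv, sales)
print(a, b)
```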

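Statistical notation for the regression model: $\text{Sales}_t = \alpha + \beta\,\text{Adv}_t + \varepsilon_t$, $\varepsilon_t \sim \text{iid } N(0, \sigma_\varepsilon^2)$
Intuitive interpretation of $\text{Sales}_t = \alpha + \beta\,\text{Adv}_t + \varepsilon_t$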

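Interpreting and estimating $\sigma_\varepsilon^2$ in the model $\text{IBM}_t = \alpha + \beta\,\text{NYSE}_t + \varepsilon_t$, $\varepsilon_t \sim \text{iid } N(0, \sigma_\varepsilon^2)$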
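A standard estimate of σε² is s² = SSE/(n − 2), where SSE is the sum of squared residuals and n − 2 reflects the two estimated parameters α and β. In the IBM/NYSE model, s measures the typical size of IBM's deviation from the line, i.e., the part of IBM's return not explained by the market. The return series below are hypothetical.

```python
import math

def fit_and_residual_sd(x, y):
    """Fit y = a + b*x by least squares; return (a, b, s) with s^2 = SSE / (n - 2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    # Two parameters (alpha and beta) were estimated, so n - 2 degrees of freedom remain.
    s = math.sqrt(sse / (n - 2))
    return a, b, s

# Hypothetical monthly returns in percent, for illustration only.
nyse = [-2, -1, 0, 1, 2]
ibm = [-3, -1, 0, 2, 2]
a, b, s = fit_and_residual_sd(nyse, ibm)
print(a, b, s ** 2)
```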

Multiple regression

Checking the assumptions of the regression model

Diagnostic test for the linearity assumption

Diagnostic test for the constant variance assumption

Diagnostic test for the independence assumption

Diagnostic test for the normality assumption

Measure of the explanatory power of the regression
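Assuming the measure meant here is the usual R², it is the fraction of the variation in y explained by the regression: R² = 1 − SSE/SST, with SST = Σ(y − ȳ)². The sketch uses hypothetical advertising and sales figures.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the least-squares line y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))   # unexplained variation
    sst = sum((yi - ybar) ** 2 for yi in y)                      # total variation
    return 1 - sse / sst

# Hypothetical data: SSE = 1.6 and SST = 34 here.
adv = [1, 2, 3, 4, 5]
sales = [3, 6, 7, 8, 11]
r2 = r_squared(adv, sales)
print(r2)
```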

Specification bias is an important problem to be aware of

Regression model modification if the linearity assumption is violated

Sex discrimination lawsuit example to illustrate dummy variables

Outliers

Bond example: Is the rating of a bond (A, AA, or AAA) related to its yield?

Point prediction for $\text{Sales} = \alpha + \beta\,\text{Adv}_p + \varepsilon$ for a specified value of $\text{Adv}_p$

Confidence intervals

Multiple dummy variables – Alternative approach to understanding multiple dummy variables
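For a categorical variable such as bond rating (A, AA, or AAA), one common approach is to pick a baseline category and create a 0/1 dummy for each of the others. With A as the baseline, an illustrative model would be Yield = α + β₁·AA + β₂·AAA + ε, where β₁ and β₂ are the yield differences of AA and AAA bonds relative to A bonds. The baseline choice, the model form, and the bonds below are assumptions for illustration.

```python
def rating_dummies(ratings, baseline="A"):
    """Encode ratings as 0/1 dummy columns, omitting the baseline category.

    Omitting one category avoids perfect collinearity with the intercept:
    a baseline (A-rated) bond gets 0 in every dummy column.
    """
    categories = sorted(set(ratings) - {baseline})
    rows = [[1 if r == c else 0 for c in categories] for r in ratings]
    return categories, rows

ratings = ["A", "AAA", "AA", "A", "AA"]   # hypothetical bonds
cols, rows = rating_dummies(ratings)
print(cols)   # ['AA', 'AAA']
print(rows)   # [[0, 0], [0, 1], [1, 0], [0, 0], [1, 0]]
```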

Time series models with trend, seasonal and cyclical components

Use TIME and multiple dummy variables to model trend and seasonality in sales
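The regressors for a trend-plus-seasonality model can be built mechanically: TIME is the period number, and three quarterly dummies capture the seasonal pattern relative to a baseline quarter. This sketch assumes Quarter 1 is the baseline and that period 1 is a first quarter, which is consistent with period 29 being Quarter 1, 1983; an illustrative model is SALES_t = α + β·TIME_t + γ₂·Q2_t + γ₃·Q3_t + γ₄·Q4_t + ε_t.

```python
def trend_season_row(t):
    """Regressors for period t: [TIME, Q2, Q3, Q4], with Quarter 1 as the baseline.

    Assumes period 1 is a first quarter, so periods 1, 5, 9, ..., 29 are all first quarters.
    """
    quarter = (t - 1) % 4 + 1
    return [t] + [1 if quarter == q else 0 for q in (2, 3, 4)]

design = [trend_season_row(t) for t in range(1, 29)]   # periods 1..28 used for fitting
print(design[:5])
print(trend_season_row(29))   # row for the Quarter 1, 1983 prediction
```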

Modeling the short term correlation left after accounting for trend and seasonal components
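One way to detect short-term correlation left after removing trend and seasonal components is the lag-1 autocorrelation of the residuals: a clearly positive value means a residual tends to be followed by one of the same sign. The function below is a sketch of that diagnostic only, with hypothetical residual series; it does not attempt to reproduce the model the class notes fit afterwards.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation: sum of products of consecutive deviations over their sum of squares."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

# Hypothetical residuals: one drifts slowly, the other flips sign every period.
smooth = [1.0, 0.8, 0.6, 0.3, -0.1, -0.4, -0.7, -0.5, -0.2, 0.2]
choppy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r_smooth = lag1_autocorrelation(smooth)
r_choppy = lag1_autocorrelation(choppy)
print(r_smooth, r_choppy)
```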

Prediction of SALES in Quarter 1, 1983 (period 29)

Consider the concept of point prediction, and the resulting confidence interval, in a simpler regression model $\text{Sales} = \alpha + \beta\,\text{Adv}_p + \varepsilon$ for a specified value of $\text{Adv}_p$
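In simple regression, the point prediction at Adv_p is a + b·Adv_p. A standard result gives the standard error for predicting one new Sales value as s·√(1 + 1/n + (Adv_p − x̄)²/Sxx), with s² = SSE/(n − 2); dropping the leading 1 gives the interval for the mean of Sales at Adv_p instead. The sketch takes the t critical value as an input from a t table (3.182 for a 95% interval with 3 degrees of freedom); the data are hypothetical.

```python
import math

def predict_with_interval(x, y, x_p, t_crit):
    """Point prediction at x_p and a prediction interval for a single new y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = a + b * x_p
    # The interval widens as x_p moves away from xbar.
    se_pred = s * math.sqrt(1 + 1 / n + (x_p - xbar) ** 2 / sxx)
    return y_hat, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

adv = [1, 2, 3, 4, 5]          # hypothetical advertising
sales = [3, 6, 7, 8, 11]       # hypothetical sales
y_hat, (lower, upper) = predict_with_interval(adv, sales, x_p=6, t_crit=3.182)
print(y_hat, lower, upper)
```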

Measuring the quality of the estimate of β

Measuring the quality of the estimate of β – Use an analogy with measuring the quality of $\bar{X}$ as an estimator of μ

IBM/NYSE example – See Section 4 of the Class Notes

Sex discrimination example – See Section 7 of the Class Notes

The estimated regression line, and therefore its slope b, depends on the sample of points, so b is a random variable

 
Hypothesis testing in a regression context

The intuitive decision rule is: Say β = 1 if b (the estimate of β) is much greater than 0

Two types of errors

Computing critical values, i.e., defining “large”

In practice, we collect a sample, compute b, and then compare this value to the critical value. If it is greater than the critical value, we reject $H_0$.
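Because b is a random variable, "large" is measured in units of its estimated standard deviation s_b = s/√Sxx, the regression analogue of σ/√n for the sample mean. The sketch below tests H₀: β = 0 against H_A: β > 0 by comparing b with the cutoff t_crit·s_b, where t_crit = 2.353 is the upper 5% point of the t distribution with n − 2 = 3 degrees of freedom. The return series are hypothetical, and this one-sided setup is an illustrative choice rather than the exact example in the class notes.

```python
import math

def one_sided_slope_test(x, y, t_crit):
    """Test H0: beta = 0 against HA: beta > 0; reject when b exceeds t_crit * s_b."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    s_b = s / math.sqrt(sxx)     # estimated standard deviation of b
    cutoff = t_crit * s_b        # critical value on the scale of b
    return b, cutoff, b > cutoff

# Hypothetical monthly returns in percent; n = 5 gives 3 degrees of freedom.
nyse = [-2, -1, 0, 1, 2]
ibm = [-3, -1, 0, 2, 2]
b, cutoff, reject_h0 = one_sided_slope_test(nyse, ibm, t_crit=2.353)
print(b, cutoff, reject_h0)
```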

Hypothesis testing in a regression context

Simple example of hypothesis testing in a regression context

The intuitive decision rule is: Say β = 1 if b (the estimate of β) is large

Computing critical values, i.e., defining “large”

Five steps of hypothesis testing

  1. State and interpret hypotheses
  2. Give the intuitive decision rule (IDR)
  3. Obtain the distribution of the test statistic and then compute the cutoff value
  4. State the decision rule
  5. Collect the data and make a decision

Test $H_0: \beta_1 = 0$ against $H_A: \beta_1 < 0$ in the sex discrimination regression:
           $\text{Salary}_i = \alpha + \beta_1\,\text{SEX}_i + \beta_2\,\text{Seniority}_i + \varepsilon_i$

Two-sided hypothesis tests in the IBM regression: $\text{IBM}_t = \alpha + \beta\,\text{NYSE}_t + \varepsilon_t$
Test $H_0: \beta = 0$ against $H_A: \beta \neq 0$
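For a two-sided test, H₀ is rejected when b is far from the hypothesized value in either direction, i.e., when |t| = |b − β₀|/s_b exceeds the two-sided critical value (3.182 at the 5% level with 3 degrees of freedom). The sketch adds a hypothesized value β₀ as a parameter so the same function can also check, for example, β₀ = 1 (a stock that moves one-for-one with the market); that second check and the data are illustrative assumptions.

```python
import math

def two_sided_slope_test(x, y, beta0, t_crit):
    """Test H0: beta = beta0 against HA: beta != beta0 using t = (b - beta0) / s_b."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    s_b = s / math.sqrt(sxx)
    t_stat = (b - beta0) / s_b
    return t_stat, abs(t_stat) > t_crit

# Hypothetical monthly returns in percent (n = 5, so 3 degrees of freedom).
nyse = [-2, -1, 0, 1, 2]
ibm = [-3, -1, 0, 2, 2]
t0, reject0 = two_sided_slope_test(nyse, ibm, beta0=0.0, t_crit=3.182)
t1, reject1 = two_sided_slope_test(nyse, ibm, beta0=1.0, t_crit=3.182)
print(t0, reject0)
print(t1, reject1)
```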

Use hypothesis testing to determine which variables belong in the regression:
      $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_{10} x_{10} + \varepsilon$

Multicollinearity in the model: $y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
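With two regressors, a quick multicollinearity diagnostic is the sample correlation r between x₁ and x₂ and the variance inflation factor VIF = 1/(1 − r²), the factor by which the correlation inflates the variance of each slope estimate compared with uncorrelated regressors. Large values mean b₁ and b₂ are imprecise, so individual t-tests can look insignificant even when the pair matters jointly. The data below are hypothetical, with x₂ built to be nearly 2·x₁.

```python
import math

def correlation(u, v):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]   # hypothetical, nearly 2 * x1
r = correlation(x1, x2)
vif = 1 / (1 - r ** 2)                  # applies to both slopes when there are two regressors
print(r, vif)
```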