
STA 108
Homework 7
Due: Mar 6th 2020 (by 2:10 pm)
Please clearly print your name, course name, section number, student ID number,
and names of students who you have discussed homework problems with on top of
the first page of your homework as follows.
Name
Course STA108 Section
Student ID#
Study group members
Additional requirements can be found in the course syllabus.
Question 1: Binary covariate and interactions (50 pts)
A scientist wants to study the factors affecting the length of hospital stay (y) and identified two covariates: age of patient (x1) and severity of disease (x2, 1 if it is a life-threatening disease and 0 otherwise). Since there might be an interaction effect (denoted as x3 = x1 × x2), we consider the following regression model

y = β0 + β1 x1 + β2 x2 + β3 x3 + ε.

The scientist wants to decide whether the severity of disease is needed in the model and proposes to test H0: β2 = β3 = 0 vs. Ha: β2 ≠ 0 or β3 ≠ 0. In order to test this hypothesis, the scientist fits two models. The first model considers 100 patients with severe diseases, with the R output as below.
The second model considers 100 patients without severe diseases with the R output as
below.
1. (20 pts) Based on the information provided, can we test the above hypothesis at
the 0.05 significance level? If so, carry out the test. If not, explain.
2. (30 pts) Describe how you will test this hypothesis if you have access to the data.
Question 2: Puzzling p-values (40 pts)
When analyzing a set of data, you run into an R output as follows.
Given your eight weeks of statistical training, you sense something strange in this result…
1. (20 pts) Point out what is strange in this R output, and explain how this could
happen.
2. (20 pts) Will this phenomenon become a problem? Explain your answer.
Question 3: Testing linear combinations of parameters (10 pts)
Suppose that we have fit a linear regression, for i = 1, . . . , n,
yi = β0 + β1 xi,1 + β2 xi,2 + εi,

where ε1, ε2, . . . , εn are i.i.d. errors with E[εi] = 0 and var(εi) = σ². Describe how you will test H0: β1 − 2β2 = 0 vs. Ha: β1 − 2β2 ≠ 0.
STA 108 Applied Statistical Methods:
Regression Analysis
Multiple Linear Regression
Shizhe Chen, PhD
Winter 2020
Important Instructions
• The first four sections (motivation, model, inference, model selection) will be taught as before
• The rest of the slides are provided as reading materials
  • Bring questions to lectures!
  • Learn to use online resources for self-learning
  • ≤ 10 pts in the final exam
Outline
• Motivation
• Multiple Linear Regression Model
• Statistical Inference
• Model selection
• Least Squares Estimation
• Generalized Least Squares
Motivation
New Requests from Clients
So far, we have successfully answered the following questions.
1. Are sales positively associated with TV advertising budgets?
2. What will the amount of sales be if the TV advertising budget
is 20 (in thousands of dollars)?
However, our clients suddenly realize that there are more avenues for advertising than TV alone.
Exploratory Data Analysis
(See Section 3.1 in Code MultipleLinearRegression.html.)
New Questions of Interest
1. Which advertising budgets are associated with sales?
2. What is the mean difference of sales per unit difference in the TV advertising budget when the other budgets are held constant?
3. What is the expected sales given a fixed combination of budgets?
4. How much will sales increase by increasing the TV advertising budget by 20 (thousands of dollars)?
5. What is the maximum expected sales given a total budget of 100 (thousands of dollars)?
New Questions of Interest
1. Which advertising budgets are associated with sales? (Linear regression)
2. What is the mean difference of sales per unit difference in the TV advertising budget when the other budgets are held constant? (Linear regression)
3. What is the expected sales given a fixed combination of budgets? (Linear regression)
4. How much will sales increase by increasing the TV advertising budget by 20 (thousands of dollars)? (Causal inference)
5. What is the maximum expected sales given a total budget of 100 (thousands of dollars)? (Optimization)
Multiple Linear Regression Model
Multiple Linear Regression Model
The multiple linear regression model takes the form

y = β0 + x1 β1 + · · · + xp βp + ε,

where
• y ∈ R is the real-valued response
• xj ∈ R is the jth covariate
• β0 is the intercept term
• βj is the regression slope for the jth covariate
• ε ∈ R is the error term with E[ε] = 0 and var(ε) = σ²
Note: The term “linear” refers to the fact that the mean is a linear function of the unknown parameters β0, . . . , βp.
With n Observations
With n observations of y and x1, . . . , xp, the complete model becomes

y1 = β0 + x11 β1 + x12 β2 + · · · + x1p βp + ε1
y2 = β0 + x21 β1 + x22 β2 + · · · + x2p βp + ε2
...
yn = β0 + xn1 β1 + xn2 β2 + · · · + xnp βp + εn,

where the error terms are assumed to have the following properties
• E[εi] = 0
• var(εi) = σ² (constant for all i)
• cov(εi, εj) = 0 for j ≠ i
Note: Sometimes stronger assumptions are imposed, such as the εi’s being i.i.d. with mean 0 and variance σ².
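To make this concrete, here is a minimal R sketch that simulates data from this model (with made-up coefficients, not the course data) and fits it with lm():

    set.seed(108)
    n <- 100
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    eps <- rnorm(n)                    # i.i.d. errors with mean 0 and variance sigma^2 = 1
    y <- 1 + 2 * x1 - 0.5 * x2 + eps   # true (beta0, beta1, beta2) = (1, 2, -0.5)
    fit <- lm(y ~ x1 + x2)
    summary(fit)                       # estimates should be close to the true values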
Interpretation of Multiple Linear Regression
In a multiple linear regression model y = β0 + x1 β1 + x2 β2 + . . . + xp βp + ε, β1 is the mean difference in y per unit difference in x1 when x2, . . . , xp are held constant (or adjusting/controlling for x2, . . . , xp).
Example: Suppose that y is the systolic blood pressure of newborns, x1 is days of age, and x2 is the weight at birth in ounces. We say that
• β1: We estimate that two groups of newborns with the same weight at birth and who differ by one day of age will have systolic blood pressure that differs on average by 5.89 mm Hg (95% CI: 4.42, 7.36).
• β2: We estimate that two groups of newborns with the same age and who differ by one ounce at birth will have systolic blood pressure that differs on average by 0.13 mm Hg (95% CI: 0.05, 0.20).
Interpretation of Multiple Linear Regression
In a multiple linear regression model y = β0 + x1 β1 + x2 β2 + . . . + xp βp + ε, β1 is the mean difference in y per unit difference in x1 when x2, . . . , xp are held constant (or adjusting/controlling for x2, . . . , xp).
Note: The interpretation of the parameters might not make sense in multiple linear regression!
Interpretation of Multiple Linear Regression: Special Cases
In y = β0 + x1 β1 + x2 β2 + . . . + xp βp + ε, what if
• x3 = x1 × x2?
  The effect of x1 on y differs depending on the value of x2 (a short R sketch follows this list).
  Example: When comparing two groups of newborns that differ by one day of age and with the same birthweight, the difference in systolic blood pressure depends on the babies’ birthweight, with the difference in mean systolic blood pressure decreasing by 0.13 mm Hg for each ounce difference in birthweight.
• x2 = x1³?
  x1 is constant if x2 is held constant.
  Interpret all the terms that depend on x1!
• x1, . . . , xp are dummy variables for a categorical variable z that has p + 1 categories?
  x1 = 1 means x2 = · · · = xp = 0.
  How will you interpret β1?
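A minimal R sketch of fitting the interaction case, assuming simulated data (in R, y ~ x1 * x2 expands to the main effects plus their product):

    set.seed(1)
    n <- 100
    x1 <- rnorm(n)
    x2 <- rbinom(n, 1, 0.5)                  # a binary covariate
    y <- 1 + 2 * x1 + x2 - 0.5 * x1 * x2 + rnorm(n)
    fit_int <- lm(y ~ x1 * x2)               # same as y ~ x1 + x2 + x1:x2
    coef(fit_int)                            # slope of x1 is 2 when x2 = 0, 1.5 when x2 = 1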
Multiple Linear Regression Model: Categorical Covariates
Recall the question in Homework #1, where we coded xi as 0 or 1 to distinguish between ducks and pandas.
• What if we found out that there are actually red pandas and raccoons in the data set?
• xi = 0, 1, 2, 3 for ducks, pandas, red pandas, or raccoons?
• xi1 = 1 for a panda, xi2 = 1 for a red panda, xi3 = 1 for a raccoon, and zero otherwise!
Multiple Linear Regression Model: Categorical Covariates
(cont.)
Create K − 1 dummy variables for a categorical variable with K categories.
This model is also known as the ANalysis Of VAriance (ANOVA).
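In R, storing the categorical variable as a factor makes lm() create the K − 1 dummy variables automatically; a minimal sketch with made-up data:

    set.seed(2)
    species <- factor(sample(c("duck", "panda", "red panda", "raccoon"), 100, replace = TRUE))
    y <- rnorm(100, mean = c(1, 2, 3, 4)[as.integer(species)])
    fit_cat <- lm(y ~ species)   # 3 dummy variables; "duck" is the reference level
    coef(fit_cat)                # intercept = duck mean; slopes = differences from it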
Multiple Linear Regression Model: Polynomial Regression¹
This slide is incomplete. Take notes!
Consider a true (and unknown!) model where y is non-linear in x
y = (x − 3)⁴ + ε ⇐⇒ y = x⁴ − 12x³ + 54x² − 108x + 81 + ε
Suppose that we have n observations of x and y; can we learn the above model using linear regression?
¹ Check out Wolfram Alpha
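Yes: the model is still linear in the unknown coefficients, so polynomial terms can simply be added as covariates. A minimal R sketch with simulated data:

    set.seed(3)
    x <- runif(200, 0, 6)
    y <- (x - 3)^4 + rnorm(200)
    fit_poly <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))   # or poly(x, 4, raw = TRUE)
    round(coef(fit_poly), 1)                           # close to (81, -108, 54, -12, 1)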
Building a Linear Model
Suppose that you are interested in studying the relationship
between y and x1 , and you have the resources to collect data (via
experiments or surveys). How will you build a linear model?
y = β0 + x1 β1 + . . . + xp βp + ε

Consider the following scenarios
• y is the body weight and x1 is the length of sleep per day
• y is the lung function (measured by forced expiratory volume, or FEV) and x1 is a dummy variable for smoking
• y is the occurrence of a heart attack and x1 is a dummy variable for depression
Classification of Variables
This slide is incomplete. Take notes!
• Variable of interest (or exposure, treatment, etc.)
• Response variable (or outcome)
• Confounder
• Effect modifier
• Precision variable
• Instrument
Confounding
• Confounding is an effect of some uncontrolled variable on the response variable that hinders interpretation of the relationship between the response and the predictor variable of interest
• Confounding describes real or imagined effects that distort the relationship one wishes to observe between the predictor variable and the response
• More of a problem for observational studies
  • in contrast to designed experiments
Example: the effect of smoking on lung function (measured by forced expiratory volume, FEV) may be confounded by age
Controlling for Confounding
• Implicitly, with appropriate study designs
• Explicitly, by measuring it and including it in the model
Controlling for Confounding: Design
• Match observations that are similar in terms of the confounding variables (confounders), e.g., compare the FEV between smokers and non-smokers of the same age
  • Relatively easy to implement
  • Infeasible when there are too many confounders
• Conduct a randomized experiment, e.g., randomly assign participants to the smoking group or the non-smoking group
  • Destroys all confounding possibilities (grants causality)
  • Infeasible in many cases
Controlling for Confounding: Model-Based
Using knowledge from this class, we can fit two models

y = β0 + x1 β1 + ε
y = β0′ + x1 β1′ + x2 β2′ + ε

• Compare the fitted values of β1 and β1′ (see the sketch below)
  • Eyeballing
  • Hypothesis testing
You can also use other advanced statistical methods to control for unmeasured confounders, e.g., propensity scores, instrumental variables
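A minimal R sketch of this comparison, with simulated data in which x2 confounds the x1–y relationship (all names and numbers are illustrative):

    set.seed(4)
    n <- 500
    x2 <- rnorm(n)                       # confounder (e.g., age)
    x1 <- 0.8 * x2 + rnorm(n)            # exposure depends on the confounder
    y <- 1 + 0 * x1 + 2 * x2 + rnorm(n)  # x1 has no effect once x2 is accounted for
    coef(lm(y ~ x1))        # unadjusted slope is far from 0
    coef(lm(y ~ x1 + x2))   # adjusted slope is close to the true 0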
Effect Modifier
• Variable that modifies the effect (or association) of the variable of interest on the response
• Modeling approaches
  • Stratified analysis (for categorical variables)
  • Multiple linear regression with an interaction term

y = β0 + x1 β1 + x2 β2 + x1 x2 β3 + ε
Precision Variable
• Variable that only affects the response variable
• Improves the precision of the model fits if included
Statistical Inference
• σij = ρij √(σii σjj), where ρij is the correlation between Xi and Xj
• σij² ≤ σii σjj for all i, j ∈ {1, . . . , p}
The marginals of a multivariate normal are univariate normals:

Xj ∼ N(µj, σjj) for all j ∈ {1, . . . , p}
Affine Transformation of Multivariate Normal
Let x ∼ Np(µ, Σ). Let A = {aij}n×p be a non-random matrix and consider a non-random vector b = (b1, . . . , bn)ᵀ. Define y = Ax + b with A ≠ 0n×p. Then

y ∼ Nn(Aµ + b, AΣAᵀ)

Linear combinations of normal variables are normally distributed!!!
Conditional Distribution of Multivariate Random Variables
If Σ is positive definite and

(x, y)ᵀ ∼ N( (µx, µy)ᵀ, [ Σx Σxy ; Σyx Σy ] ),

then the conditional distribution of x given y is

x | y ∼ N( µx + Σxy Σy⁻¹ (y − µy), Σx − Σxy Σy⁻¹ Σyx )

Important Property: x and y are independent if and only if Σxy = 0.
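A quick simulation check of the conditional-mean formula in the bivariate case (a sketch using MASS::mvrnorm; the MASS package ships with R):

    library(MASS)
    Sig <- matrix(c(2, 1, 1, 1), 2, 2)       # Sigma_x = 2, Sigma_xy = 1, Sigma_y = 1
    xy <- mvrnorm(1e5, mu = c(0, 0), Sigma = Sig)
    keep <- abs(xy[, 2] - 1) < 0.05          # condition on y approximately equal to 1
    mean(xy[keep, 1])                        # approx mu_x + Sigma_xy / Sigma_y * (1 - mu_y) = 1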
Sampling Distribution: Our Model Assumptions
Assume that
1. The errors have mean zero: E[ε] = 0.
2. The errors are uncorrelated with common variance: var(ε) = σ²I.
These imply that
1. E[y] = E[Xβ + ε] = Xβ
2. var(y) = var(Xβ + ε) = var(ε) = σ²I
Mean and Variance of Our Estimates
This slide is incomplete. Take notes!
1. The least squares estimate is unbiased: E[β̂] = β
2. The covariance matrix of the least squares estimate is
var(β̂) = σ²(XᵀX)⁻¹.
Proof:
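A minimal sketch of both claims, using β̂ = (XᵀX)⁻¹Xᵀy and the assumptions E[y] = Xβ and var(y) = σ²I from the previous slide:

E[β̂] = (XᵀX)⁻¹Xᵀ E[y] = (XᵀX)⁻¹XᵀXβ = β,
var(β̂) = (XᵀX)⁻¹Xᵀ var(y) X(XᵀX)⁻¹ = σ²(XᵀX)⁻¹.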
See Wikipedia for the proof of the Gauss–Markov theorem (BLUE).
This slide is incomplete. Take notes!
Recall that e = y − ŷ = (I − P)y:
1. E[e] = 0
2. var(e) = σ²(I − P)
3. E[eᵀe] = (n − p − 1)σ².
4. Implication: An unbiased estimate of σ² is

σ̂² = eᵀe / (n − p − 1)
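A quick numerical check in R, reusing the simulated fit from the earlier sketch (assumes that code was run):

    e <- resid(fit)
    sum(e^2) / df.residual(fit)   # unbiased estimate of sigma^2; df = n - p - 1
    sigma(fit)^2                  # the same quantity, as reported by R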
Sampling Distributions
• Asymptotic distributions
  • Central Limit Theorem
  • Bootstrap
• Exact distributions
  • Normality assumption: multivariate t-distribution
  • Other distribution assumptions…
Construct confidence intervals/regions given the sampling distribution, as in simple linear regression.
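For instance, under the normality assumption, confint() in R returns the exact-distribution intervals for the earlier simulated fit (assuming that code was run):

    confint(fit, level = 0.95)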
Properties of the LSE in Matrix Notation
Projection ✓
Residuals ✓
Sampling Distribution ✓
Underfitting and Overfitting
Multicollinearity
What Happens When you Underfit the Model?
Suppose that the true underlying model is

y = Xβ + Zη + ε,

but we instead fit the model

y = Xβ + ε.

In fact, in real data applications, we often do this because it is impossible for us to know which variables are in the true underlying model. For simplicity, assume that the columns of X and Z are linearly independent.
Bias due to underfitting
Naive Argument: I am only interested in the parameters β, so why bother estimating η?
Claim: if we fit the smaller model, E[β̂] ≠ β. The estimates we get are biased! Even the fitted values are biased!
• E[β̂] = β + (XᵀX)⁻¹XᵀZη
• E[ŷ] = Xβ + X(XᵀX)⁻¹XᵀZη
Example I: Underfitting
Suppose that the true model is
yi = β0 + β1 xi + β2 xi² + εi

but instead we fit the model

yi = β0 + β1 xi + εi

What is the bias of β̂1?
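A short R simulation of this example (the coefficients are made up for illustration):

    set.seed(5)
    n <- 200
    x <- runif(n, 0, 4)
    y <- 1 + 2 * x + 3 * x^2 + rnorm(n)   # true model is quadratic
    coef(lm(y ~ x))                       # underfit: slope is biased away from 2
    coef(lm(y ~ x + I(x^2)))              # correct model: estimates near (1, 2, 3)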
Variance if we Underfit the Model?
Suppose that the true underlying model is

y = Xβ + Zη + ε,

but we instead fit the model

y = Xβ + ε.

Claim: cov(β̂) = σ²(XᵀX)⁻¹. But

E[σ̂²] = E[ eᵀe / (n − p − 1) ] = σ² + ηᵀZᵀ(I − PX)Zη / (n − p − 1) > σ².

Implication: We overestimate the variance!
Example in R on Underfitting
Scenario I: True model y = Xβ + ε with β = (1, 1)ᵀ; fit the correct model y = Xβ + ε.
Scenario II: True model y = Xβ + Zη + ε with β = (1, 1)ᵀ; underfit with y = Xβ + ε.
(See Section 3.5 in Code MultipleLinearRegression.html.)
What Happens When you Overfit the Model?
Suppose that the true underlying model is

y = X1 β1 + ε,

but we instead fit the model

y = X1 β1 + X2 β2 + ε = Xβ + ε,

where X = [X1 X2] and β = (β1ᵀ, β2ᵀ)ᵀ.
What will happen to our estimates β̂1?
Bias due to Overfitting
Claim: if we fit the larger model, E[β̂] = β. The estimates we get are unbiased! Even the fitted values and σ̂² are unbiased!
Proof:
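A one-line sketch, writing the true mean as Xβ with β = (β1ᵀ, 0ᵀ)ᵀ (so X2 enters with coefficient zero):

E[β̂] = (XᵀX)⁻¹Xᵀ E[y] = (XᵀX)⁻¹XᵀX1β1 = (XᵀX)⁻¹XᵀXβ = β.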
Why don’t we keep Overfitting then?
Claim: The variance of β̂1 will be larger!!! The details are too complicated, so we will skip them.
Scenario: True model y = X1 β1 + ε with β1 = (1, 1)ᵀ; overfit with y = X1 β1 + X2 β2 + ε.
(See Section 3.6 in Code MultipleLinearRegression.html.)
Summary of Effects of Underfitting and Overfitting
              β̂         ŷ         σ̂²             cov(β̂)
Underfitting  biased    biased    biased upward  still σ²(XᵀX)⁻¹
Overfitting   unbiased  unbiased  unbiased       increased

Model selection: How many variables are sufficient, and which variables should we include?
• ???
• ???
• ???
Properties of the LSE in Matrix Notation
Projection ✓
Residuals ✓
Sampling Distribution ✓
Underfitting and Overfitting ✓
Multicollinearity
Understanding Multicollinearity using Projection
Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.
1. This means that there are strong linear dependencies among the columns of X.
2. We refer to such an X as an almost singular matrix.
3. Does multicollinearity affect E[β̂]?
4. Does multicollinearity affect var(β̂)?
5. What are the implications of multicollinearity? (A small demonstration follows.)
Solutions
• Remove the redundant variables
• Shrinkage estimators
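A small R demonstration of the variance inflation under nearly collinear predictors (simulated data; all numbers are illustrative):

    set.seed(6)
    n <- 100
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.01)   # x2 is almost a linear copy of x1
    y <- 1 + x1 + x2 + rnorm(n)
    summary(lm(y ~ x1 + x2))         # both slopes have huge standard errors
    summary(lm(y ~ x1))              # dropping x2 gives a stable, precise fit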
Generalized Least Squares
Linear Regression Assumptions
Model:

y = Xβ + ε

1. Constant variance assumption: var(ε) = σ²I.
2. Uncorrelated errors.
What if these assumptions are violated? How does it affect our solution if we fit the ordinary multiple linear regression?
Non-constant Variance and Correlated Error
Suppose that

y = Xβ + ε

1. Non-constant variance: var(ε) = σ²V for some positive definite matrix V.
2. Correlated errors.
How should we estimate β?
Motivating Example: Clustered Data
(See Section 3.8 in Code MultipleLinearRegression.html.)
Model Generation with R
R fit using Ordinary Multiple Linear Regression
Model:

y = Xβ + ε,

where E[ε] = 0 and var[ε] = σ²V with σ² = 1.
(See Section 3.8 in Code MultipleLinearRegression.html.)
Generalized Least Squares
Suppose that we have

y = Xβ + ε,

where E[ε] = 0 and var[ε] = σ²V.
Goal: Transform the above model so that it has uncorrelated data points, and then fit the multiple linear regression.
Generalized Least Squares
Claim: Let V = UDUᵀ (its singular value decomposition) and let K = U D^(−1/2) Uᵀ. Then K is the inverse of the square root of V.
• Create y* = Ky.
• Create X* = KX.
• Create ε* = Kε.
Then,

y* = X*β + ε*,

where E[ε*] = 0 and var[ε*] = σ²I.
Then, we can fit the multiple linear regression we have already learnt using the new data points y* and X*.
(See Section 3.8 in Code MultipleLinearRegression.html.)
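A minimal R sketch of this whitening step under the stated assumptions (V known and positive definite; the names and covariance form are illustrative):

    set.seed(7)
    n <- 50
    X <- cbind(1, rnorm(n))                    # intercept column plus one covariate
    V <- 0.5^abs(outer(1:n, 1:n, "-"))         # an assumed AR(1)-style covariance
    beta <- c(1, 2)
    y <- X %*% beta + t(chol(V)) %*% rnorm(n)  # errors with var = V (sigma^2 = 1)
    ed <- eigen(V)
    K <- ed$vectors %*% diag(1 / sqrt(ed$values)) %*% t(ed$vectors)  # K = V^(-1/2)
    ystar <- as.vector(K %*% y)
    Xstar <- K %*% X
    fit_gls <- lm(ystar ~ Xstar - 1)           # X* already contains the transformed intercept
    coef(fit_gls)                              # close to the true (1, 2)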
Mean and Variance of Generalized Least Squares Estimator
Model:

y* = X*β + ε*,

where E[ε*] = 0 and var[ε*] = σ²I with σ² = 1.
GLS Estimate:

β̂GLS = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y.

Claim:
1. E[β̂GLS] = β.
2. var(β̂GLS) = σ²(XᵀV⁻¹X)⁻¹.
3. Residual sum of squares: (y − Xβ)ᵀV⁻¹(y − Xβ).
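The closed form can also be computed directly; continuing the simulated example above (so X, V, and y are as defined there):

    Vi <- solve(V)
    beta_gls <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)
    beta_gls   # matches the whitened lm() fit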
GLS versus OLS
Assume the following model:

y = Xβ + ε

with var(ε) = σ²V for some positive definite matrix V and E[ε] = 0.
What are the properties of ordinary multiple linear regression under this model?
• E[β̂] = β
• var(β̂) = σ²(XᵀX)⁻¹XᵀVX(XᵀX)⁻¹
Weighted Least Squares for Unequal Variances
Consider the linear regression model

yi = β xi + εi,

where var(ε) = σ² diag(1/w1, . . . , 1/wn).
For unknown w1, . . . , wn, use iteratively reweighted least squares.
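When the weights are known, lm()'s weights argument implements exactly this weighted least squares; a minimal sketch with simulated data (the weight form 1/xi is an assumption for illustration):

    set.seed(8)
    n <- 100
    x <- runif(n, 1, 10)
    y <- 2 * x + rnorm(n, sd = sqrt(x))    # var(eps_i) proportional to x_i
    fit_wls <- lm(y ~ x - 1, weights = 1 / x)
    coef(fit_wls)                          # close to the true slope 2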
