CORRELATION AND LINEAR REGRESSION

Introduction

 

Correlation

·       Correlation measures the strength and direction of a relationship between two variables.

·       It is usually expressed with a number between -1 and 1:

o   +1 means a perfect positive relationship (as one variable increases, so does the other).

o   -1 means a perfect negative relationship (as one variable increases, the other decreases).

o   0 means no linear relationship between the variables.

·       The most common measure is the Pearson correlation coefficient.
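
To make the definition concrete, here is a minimal Python sketch of the Pearson coefficient computed directly from its textbook formula. The function name pearson_r and the small sample lists are illustrative assumptions, not data from this post.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant sequence makes sxx or syy zero, so r is undefined there.
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data, chosen only to show the two extremes.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # y increases with x
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # y decreases as x increases
print(round(rising, 6), round(falling, 6))
```

The first pair should give a value of about +1 and the second about -1, matching the two extreme cases listed above.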

 

Linear Regression

·       Linear regression is a statistical method used to model and analyze the relationship between a dependent variable and one (or more) independent variables.

·       It predicts the value of the dependent variable based on the value(s) of the independent variable(s).

·       In simple linear regression (one independent variable), the relationship is modeled with a straight line:

o   y = a + bx

o   where:

o   y = predicted value

o   a = intercept (value of y when x = 0)

o   b = slope (change in y for a one-unit change in x)

·       Unlike a correlation coefficient, which only summarizes strength and direction, a regression model also provides an equation for prediction.
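
As a rough illustration of how a and b can be obtained, the sketch below fits y = a + bx with the standard least-squares formulas. The helper name fit_line and the four sample points are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in y per one-unit change in x
    a = mean_y - b * mean_x    # intercept: value of y when x = 0
    return a, b

# Made-up points that lie exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
predicted = a + b * 4          # predicted y at x = 4
print(a, b, predicted)
```

Because the sample points sit exactly on a line, the fit should recover an intercept of about 1 and a slope of about 2.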

 

Practical Example:

We have outdoor readings (X) and pressure readings (Y) from the LPU weather data.

X = [36,30,32,36,36,37,37,30]

Y = [979.3,980.3,986.3,974.4,975.8,977.4,977.4,979.0]

 

Here we want to find:

1. The correlation coefficient (r), which measures the strength and direction of the relationship between X and Y.

2. The linear regression line, which predicts Y from X.

 

Calculate the means

X̄ = (36+30+32+36+36+37+37+30) / 8

 = 274/8 = 34.25

Ȳ = (979.3+980.3+986.3+974.4+975.8+977.4+977.4+979.0) / 8

 = 7829.9/8 = 978.7375
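
The two means can be double-checked with a few lines of Python. This is only a verification sketch; the variable names are illustrative.

```python
from statistics import fmean

# Data from the practical example.
X = [36, 30, 32, 36, 36, 37, 37, 30]
Y = [979.3, 980.3, 986.3, 974.4, 975.8, 977.4, 977.4, 979.0]

mean_x = fmean(X)   # arithmetic mean of X
mean_y = fmean(Y)   # arithmetic mean of Y
print(f"mean X = {mean_x:.2f}, mean Y = {mean_y:.4f}")
```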

For each observation we need X-X̄, Y-Ȳ, (X-X̄)(Y-Ȳ), (X-X̄)² and (Y-Ȳ)². Values are rounded to four decimals.

X     Y        X-X̄      Y-Ȳ       (X-X̄)(Y-Ȳ)   (X-X̄)²    (Y-Ȳ)²
36    979.3     1.75     0.5625      0.9844      3.0625     0.3164
30    980.3    -4.25     1.5625     -6.6406     18.0625     2.4414
32    986.3    -2.25     7.5625    -17.0156      5.0625    57.1914
36    974.4     1.75    -4.3375     -7.5906      3.0625    18.8139
36    975.8     1.75    -2.9375     -5.1406      3.0625     8.6289
37    977.4     2.75    -1.3375     -3.6781      7.5625     1.7889
37    977.4     2.75    -1.3375     -3.6781      7.5625     1.7889
30    979.0    -4.25     0.2625     -1.1156     18.0625     0.0689
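
The deviation columns are easy to get wrong by hand, so a short Python sketch that regenerates them is a useful cross-check. The tuple layout and the plain-ASCII column labels (dx for X-X̄, dy for Y-Ȳ) are choices made for this example only.

```python
# Data from the practical example.
X = [36, 30, 32, 36, 36, 37, 37, 30]
Y = [979.3, 980.3, 986.3, 974.4, 975.8, 977.4, 977.4, 979.0]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# One row per observation: x, y, dx, dy, dx*dy, dx^2, dy^2.
rows = []
for x, y in zip(X, Y):
    dx = x - mean_x
    dy = y - mean_y
    rows.append((x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

print(f"{'X':>4} {'Y':>7} {'dx':>7} {'dy':>8} {'dx*dy':>9} {'dx^2':>8} {'dy^2':>8}")
for x, y, dx, dy, prod, dx2, dy2 in rows:
    print(f"{x:>4} {y:>7.1f} {dx:>7.2f} {dy:>8.4f} {prod:>9.4f} {dx2:>8.4f} {dy2:>8.4f}")
```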

 

Now sum each column (using the unrounded values)

∑(X-X̄)(Y-Ȳ) = -43.875

∑(X-X̄)² = 65.5

∑(Y-Ȳ)² = 91.0388

 

Now

Correlation coefficient (r)

                                            r = ∑(X-X̄)(Y-Ȳ) / √[∑(X-X̄)² × ∑(Y-Ȳ)²]

 

 r = -43.875 / √(65.5 × 91.0388)

 = -43.875 / √5963.04

 = -43.875 / 77.2207 = -0.568

Correlation is -0.568, which indicates a moderate negative linear relationship between X and Y.
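
The column sums and the resulting coefficient can be reproduced with the following sketch. It uses only the standard library; the short names sxy, sxx and syy are labels chosen for this example.

```python
import math

# Data from the practical example.
X = [36, 30, 32, 36, 36, 37, 37, 30]
Y = [979.3, 980.3, 986.3, 974.4, 975.8, 977.4, 977.4, 979.0]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # sum of cross-products
sxx = sum((x - mean_x) ** 2 for x in X)                       # sum of squared X deviations
syy = sum((y - mean_y) ** 2 for y in Y)                       # sum of squared Y deviations

# Pearson r: cross-product sum over the square root of the product of the two sums.
r = sxy / math.sqrt(sxx * syy)
print(f"Sxy = {sxy:.4f}, Sxx = {sxx:.4f}, Syy = {syy:.4f}, r = {r:.3f}")
```

Running it should give r of roughly -0.568, in line with the hand calculation.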

 

Now

Regression line equation

                                         Y = a + bX

 

Where

b = ∑(X-X̄)(Y-Ȳ) / ∑(X-X̄)²

a = Ȳ - bX̄

1st find b, using ∑(X-X̄)(Y-Ȳ) = -43.875 and ∑(X-X̄)² = 65.5

                     b = -43.875/65.5 = -0.6698

2nd find a

                     a = 978.7375 - (-0.6698)(34.25)

= 978.7375 + 22.9423 (carrying the unrounded b = -0.669847)

 = 1001.6798

So the regression equation is

                             Y = 1001.6798 - 0.6698X

 

Correlation coefficient: r = -0.568

Regression line: Y = 1001.68 - 0.67X
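
As a final check, the sketch below recomputes the slope and intercept from the raw data and uses the fitted line for one prediction. The predict helper and the query value X = 35 are hypothetical, added only to show how the equation would be used.

```python
# Data from the practical example.
X = [36, 30, 32, 36, 36, 37, 37, 30]
Y = [979.3, 980.3, 986.3, 974.4, 975.8, 977.4, 977.4, 979.0]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sxx = sum((x - mean_x) ** 2 for x in X)

b = sxy / sxx              # slope
a = mean_y - b * mean_x    # intercept

def predict(x):
    """Predicted Y for a given X on the fitted line (hypothetical helper)."""
    return a + b * x

print(f"Y = {a:.4f} + ({b:.4f}) X")
print(f"Predicted Y at X = 35: {predict(35):.4f}")
```

A fitted least-squares line always passes through the point (X̄, Ȳ), so predict(34.25) returning the mean of Y is a quick sanity test of the arithmetic.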

