Covariance and Correlation

In probability and statistics, covariance and correlation are the two standard measures of the relationship between two random variables. This article covers both concepts.

What is covariance

Covariance measures how two variables vary together. It can take any value from -infinity to +infinity.

Positive covariance means that when one variable increases, the other tends to increase as well; negative covariance means that when one increases, the other tends to decrease. Zero covariance means there is no linear relationship between the two variables.

However, covariance does not quantify the strength of the relationship between two variables, because its magnitude depends on the scale of the data.

Mathematical representation of covariance

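The covariance between two variables X and Y is given by

(for a population):

$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

For a sample covariance, the formula is slightly adjusted, dividing by n - 1 instead of n:

$$\mathrm{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$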

Where:

  • Xi — the values of the X-variable
  • Yi — the values of the Y-variable
  • X̄ — the mean (average) of the X-variable
  • Ȳ — the mean (average) of the Y-variable
  • n — the number of the data points
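
As a minimal sketch of the population and sample covariance described above, the following plain-Python function computes both. The function name `covariance`, its `sample` flag, and the data values are assumptions made for illustration; they do not come from the original example.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys, sample=True):
    """Covariance of paired values.

    Divides by n - 1 for a sample and by n for a population.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of the products of the deviations from each mean
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data: y falls as x rises
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]

pop_cov = covariance(x, y, sample=False)
sample_cov = covariance(x, y, sample=True)
print(pop_cov)     # -4.0
print(sample_cov)  # -5.0
```

Both values are negative, which matches the fact that y decreases as x increases in this made-up data set.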

Interpretation of covariance

Let us understand covariance with an example.

Suppose we have a data set with two variables, X and Y, and we want to know the relationship between them. Multiplying each value by 100 gives a second pair of variables, A and B.

Calculating the covariances gives Cov(X, Y) = -6.34 and Cov(A, B) = -6333.34.

From this we can conclude:

  • The covariance is negative, so there is a negative relationship between X and Y: when X increases, Y tends to decrease.
  • A and B are just X and Y multiplied by 100, yet their covariance is -6333.34. The covariance has changed even though the relationship between the two variables is the same, so covariance reliably tells us only the direction of a relationship.
  • In other words, the relationship between X and Y is the same as the one between A and B, but changing the magnitude of the data changes the covariance.
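
The effect of rescaling can be reproduced with a short sketch. The data below is hypothetical (the article's original data set is not available), and `sample_covariance` is an illustrative helper name. In this sketch, multiplying both variables by 100 multiplies the covariance by 100 × 100, while the sign stays the same.

```python
def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, not the article's original data set
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]

# Rescale every value by 100 to get a second pair of variables
a = [100 * v for v in x]
b = [100 * v for v in y]

cov_xy = sample_covariance(x, y)
cov_ab = sample_covariance(a, b)
print(cov_xy)  # -5.0
print(cov_ab)  # -50000.0
```

The direction (negative) is unchanged, but the magnitude is not comparable between the two pairs, which is why covariance alone cannot express strength.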

To overcome this problem (points 2 and 3), correlation comes into the picture.

Correlation

Like covariance, correlation describes the relationship between two variables. Apart from the direction, it also tells us the strength of the relationship, quantifying it on a scale from -1 to +1.

In summary:

  • Correlation gives the direction as well as the strength of the relationship between two variables.
  • Its coefficient varies from -1 to +1.
  • Zero correlation means there is no linear relationship between the two variables.
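
One caution about the zero-correlation case: the coefficient measures linear association, so two variables can be strongly related and still have a correlation of zero. The sketch below uses the usual Pearson formula on hypothetical data where y is exactly x squared; the helper name `pearson_r` is an assumption for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y is completely determined by x, but not linearly
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # [4, 1, 0, 1, 4]

r = pearson_r(x, y)
print(r)  # 0.0
```

Here y depends entirely on x, yet the positive and negative deviations cancel out and the coefficient is zero.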

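The correlation coefficient, which indicates the strength of the relationship between two variables, can be found using the following formula:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$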

Where:

  • rxy — the correlation coefficient of the linear relationship between the variables x and y
  • xi — the values of the x-variable in a sample
  • x̄ — the mean of the values of the x-variable
  • yi — the values of the y-variable in a sample
  • ȳ — the mean of the values of the y-variable

Equivalently,

Correlation(x, y) = Covariance(x, y) / (standard deviation of x × standard deviation of y)
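
The two ways of writing the correlation coefficient can be checked against each other with a small sketch. All names (`pearson_r`, `correlation_from_cov`, `sample_std`) and the data are hypothetical. The sketch also rescales the data by 100 to illustrate that, unlike covariance, the correlation does not change with the magnitude of the values.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return math.sqrt(sample_covariance(xs, xs))


def pearson_r(xs, ys):
    """Correlation from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_from_cov(xs, ys):
    """Correlation as covariance divided by the product of standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a = [100 * v for v in x]
b = [100 * v for v in y]

r_direct = pearson_r(x, y)
r_from_cov = correlation_from_cov(x, y)
r_scaled = correlation_from_cov(a, b)

# All three agree up to floating-point rounding (about 0.8)
print(round(r_direct, 4), round(r_from_cov, 4), round(r_scaled, 4))
```

The n - 1 factors cancel between the covariance and the standard deviations, which is why both forms give the same value.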

Interpretation of correlation

Suppose, for example, that the correlation between C and D is 0.6 and the correlation between F and G is 0.8.

Then:

  • The relationship between F and G is stronger than the one between C and D.
  • A correlation of 0.6 tells us that D tends to increase when C increases, and that this tendency is moderately strong. It is not the probability that D will increase.
  • It does not mean that if we increase C by 10%, D will increase by exactly 6%.

Data Analyst at IBM
