
Descriptive Statistics: DEVIATION (4/5)

 


Deviation shows how far the data lie from the mean, that is, their distance from the center point.

Deviation measures the difference between an observed value in a dataset and some reference value, usually the arithmetic mean (the simple average). The sign of a deviation gives its direction: a positive deviation (+) means the value lies above the arithmetic mean, and a negative deviation (-) means it lies below.

The three most frequently used methods to measure deviation are:

1. VARIANCE

2. STANDARD DEVIATION

3. MEAN DEVIATION



1. VARIANCE

What is variance? Variance measures how far the observations in a dataset are spread out from the arithmetic mean. In other words, it answers the question: how far do the items lie from the center of the dataset?

How to calculate variance? Variance is the sum of the squared differences between each value and the arithmetic mean, divided by the number of values (for a population) or by the number of values minus 1 (for a sample). It can be calculated using the following formulas:

A. For a population variance:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$

Where:

$x_i$ = Each value in the dataset

$\bar{x}$ = Arithmetic mean of all values in the dataset

$n$ = Number of values in the dataset

B. For a sample variance:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Where:

$x_i$ = Each value in the dataset

$\bar{x}$ = Arithmetic mean of all values in the dataset

$n$ = Number of values in the dataset

You can follow this simple process to calculate your variances:

  1. Calculate the arithmetic mean of the dataset.
  2. Find each data point’s difference from the arithmetic mean.
  3. Square each of these values.
  4. Add up all of the squared values to find the sum of squares.
  5. Divide this sum of squares by n (for a population variance) or by n-1 (for a sample variance).
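The five steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation: the function name `variance` and the `sample` flag are assumptions made for this example, and the input list is a made-up dataset.

```python
def variance(values, sample=True):
    """Return the variance of a list of numbers.

    sample=True divides by n - 1 (sample variance);
    sample=False divides by n (population variance).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to calculate a variance")

    mean = sum(values) / n                    # Step 1: arithmetic mean
    deviations = [x - mean for x in values]   # Step 2: difference from the mean
    squared = [d ** 2 for d in deviations]    # Step 3: square each difference
    sum_of_squares = sum(squared)             # Step 4: sum of squares
    divisor = n - 1 if sample else n          # Step 5: n - 1 or n
    return sum_of_squares / divisor


# Hypothetical dataset used only to show the call.
data = [1, 2, 3, 4, 5]
print(variance(data))                # sample variance: 2.5
print(variance(data, sample=False))  # population variance: 2.0
```

The only difference between the two variants is the divisor in the last step, which is why a single flag is enough here.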

Example 1: The following numbers show the distribution of employee appraisal results. The business currently employs 6 full-time production workers:

Employee appraisal results:
46, 69, 32, 60, 52, 41

Step 1: Find the arithmetic mean by adding up all the results and dividing by the number of results:

Arithmetic mean = (46 + 69 + 32 + 60 + 52 + 41) / 6 = 50

Step 2: To find each result’s deviation from the arithmetic mean, subtract the arithmetic mean from each result:

46 – 50 = -4

69 – 50 = 19

32 – 50 = -18

60 – 50 = 10

52 – 50 = 2

41 – 50 = -9

Step 3: To square each deviation from the mean, multiply each deviation from the mean by itself:

$(-4)^2 = (-4) \times (-4) = 16$

$19^2 = 19 \times 19 = 361$

$(-18)^2 = (-18) \times (-18) = 324$

$10^2 = 10 \times 10 = 100$

$2^2 = 2 \times 2 = 4$

$(-9)^2 = (-9) \times (-9) = 81$

Step 4: To find the sum of squares, add up all of the squared deviations:

Sum of squares = 16 + 361 + 324 + 100 + 4 + 81 = 886

Step 5: To find the variance for this dataset, divide the sum of squares by n – 1:

Variance (for the sample) = 886 / (6-1) = 177.2

$\sigma^2 = 177.2$
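As a quick cross-check of the hand calculation, the same figures can be reproduced with Python's standard `statistics` module. This is only a sketch; the variable names are chosen for this example.

```python
import statistics

results = [46, 69, 32, 60, 52, 41]

# statistics.variance divides by n - 1 (sample variance).
sample_var = statistics.variance(results)

# statistics.pvariance divides by n (population variance).
population_var = statistics.pvariance(results)

print(round(sample_var, 1))      # 177.2
print(round(population_var, 1))  # 147.7
```

The sample variance matches the 177.2 obtained in Step 5. Treating the six workers as the whole population instead would give a smaller figure, because the same sum of squares (886) is divided by 6 rather than 5.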

A variance of 177.2 is expressed in squared units, so it is hard to judge on its own; still, it indicates that the observations lie fairly far from the arithmetic mean and from each other. For instance, 69 is rather far away from 32. Management would be well advised to find out the reasons for those deviations in order to keep employee performance under control.

Uses of variance in business management: Variance shows the dispersal of the items in the dataset, giving a rough idea of how spread out the data is. If the variance is small, the observations are concentrated around the arithmetic mean and less spread out. If the variance is large, the observations are more spread out and distant from the arithmetic mean.

Advantages of variance: It is a quick mathematical method to show how individual numbers relate to the center of a dataset. Because the deviations are squared, positive and negative deviations cannot cancel each other out, so the result shows the variability in the data whenever the values are not all identical.

Disadvantages of variance: It treats all deviations from the arithmetic mean the same regardless of their direction. The squaring also gives added weight to extreme observations (those numbers that are far from the mean), which can inflate the result, and it means that variance is expressed in squared units rather than in the units of the data.
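To make the extra weight of extreme observations concrete, the sketch below works out what share of the sum of squares each appraisal result from Example 1 contributes. The variable names are assumptions made for this illustration.

```python
results = [46, 69, 32, 60, 52, 41]
mean = sum(results) / len(results)

# Pair each observation with its squared deviation from the mean.
squared = [(x, (x - mean) ** 2) for x in results]
sum_of_squares = sum(sq for _, sq in squared)

# Share of the sum of squares contributed by each observation, largest first.
shares = sorted(
    ((x, sq / sum_of_squares) for x, sq in squared),
    key=lambda pair: pair[1],
    reverse=True,
)

for x, share in shares:
    print(f"{x}: {share:.1%}")
```

The two results furthest from the mean, 69 and 32, together account for roughly three quarters of the sum of squares (361 + 324 out of 886), even though they are only two of the six observations.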



2. STANDARD DEVIATION

What is standard deviation? Standard deviation measures the typical deviation of observations from the arithmetic mean of the dataset. It is the square root of the average squared difference between the arithmetic mean and the items in the dataset, so it is expressed in the same units as the data.

How to calculate standard deviation? Standard deviation is based on the squared deviations from the mean of the given data. Firstly, find the squared difference between each observation and the arithmetic mean of the dataset. Then, find the average of these squared differences (dividing by n for a population or by n - 1 for a sample), and take the square root.

A. For a population standard deviation:

Standard Deviation: $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

Where:

$x_i$ = Each value in the dataset

$\bar{x}$ = Arithmetic mean of all values in the dataset

$n$ = Number of values in the dataset

B. For a sample standard deviation:

Standard Deviation: $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Where:

$x_i$ = Each value in the dataset

$\bar{x}$ = Arithmetic mean of all values in the dataset

$n$ = Number of values in the dataset

Standard deviation is in fact the square root of the variance:

Standard Deviation: $\sigma = \sqrt{\sigma^2}$

Where:

$\sigma^2$ = Variance of the dataset
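The relationship between the two measures can be sketched in Python as follows. The function name `standard_deviation`, the `sample` flag and the input data are assumptions made for this example; the result is compared against the standard library's `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics


def standard_deviation(values, sample=True):
    """Square root of the variance (n - 1 divisor for a sample, n for a population)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    variance = sum_of_squares / (n - 1 if sample else n)
    return math.sqrt(variance)


# Hypothetical dataset used only to show the call.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(data, sample=False))  # population: 2.0
print(statistics.pstdev(data))                 # standard library, population
print(standard_deviation(data))                # sample
print(statistics.stdev(data))                  # standard library, sample
```

For this dataset the population variance is exactly 4, so the population standard deviation comes out as 2.0, back in the original units of the data.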

Uses of standard deviation in business management: While variance gives researchers a rough idea of how spread out the data is, standard deviation is easier to interpret because it is expressed in the same units as the data. It can be used by business managers to show how market research results vary from what had been expected before the research took place. Standard deviation can also be used to check standards of output during quality control or to judge the quality of raw materials being bought by the purchasing department.

Many datasets in business and economics approximately follow a normal distribution, the so-called Bell Curve. A normal distribution plots all of the values on a symmetrical graph, and the pattern that the data form is a bell-shaped curve. Half of the bell lies to the left of the mean and half to the right. The probability density peaks at the arithmetic mean.

Figure: The Bell Curve (normal distribution and standard deviation).

The larger the standard deviation, the wider and flatter the distribution. The smaller the standard deviation, the narrower and taller the distribution.

When it comes to standard deviation and the area under the normal distribution, about 68% of all observations lie within one standard deviation of the arithmetic mean (±1σ), about 95% lie within two standard deviations (±2σ), and about 99.7% lie within three standard deviations (±3σ).

No matter how much the normal distribution is stretched (for a large standard deviation) or squeezed (for a small standard deviation), the area between -1σ and +1σ is always about 68%, and the same holds for the other bands.
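These areas can be checked numerically with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The helper name `area_within` and the illustrative values `mu=50` and `sigma=13.3` are assumptions made for this sketch.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.1%}")

# Stretching or squeezing the curve leaves the share unchanged
# (mu=50 and sigma=13.3 are illustrative values).
print(f"{area_within(1, mu=50, sigma=13.3):.1%}")
```

The loop prints 68.3%, 95.4% and 99.7%, and the last line prints 68.3% again, which illustrates that the share depends only on the number of standard deviations and not on the mean or the width of the curve.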

Example 2: The six numbers below show the distribution of employee appraisal results, and their variance is 177.2 (see Example 1). The business currently employs 6 full-time production workers:

Employee appraisal results:
46, 69, 32, 60, 52, 41

$\sigma^2 = 177.2$

In order to find the standard deviation (σ), we need to take the square root of that variance:

$\sigma = \sqrt{\sigma^2} = \sqrt{177.2} \approx 13.3$

$\sigma \approx 13.3$

The typical deviation of an observation in the dataset from the arithmetic mean is 13.3. If the appraisal results are assumed to be normally distributed and these six workers are representative of the entire business, the following can be expected:

  1. ±1σ: About 68% of employees in this company will score between 36.7 and 63.3 (within 1 standard deviation, or 13.3, of the arithmetic mean of 50) on the appraisal.
  2. ±2σ: About 95% of employees will score between 23.4 and 76.6 (within 2 standard deviations, or 26.6, of the arithmetic mean of 50) on the appraisal.
  3. ±3σ: About 99.7% of employees will score between 10.1 and 89.9 (within 3 standard deviations, or 39.9, of the arithmetic mean of 50) on the appraisal.
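The three score ranges above can be reproduced with a few lines of Python. This sketch uses the mean of 50 and the variance of 177.2 from the example, rounds the standard deviation to one decimal place as the article does, and relies on the same normality assumption; the variable names are chosen for this illustration.

```python
import math

mean = 50
sd = round(math.sqrt(177.2), 1)  # 13.3, rounded as in the article

# (low, high) score range for 1, 2 and 3 standard deviations around the mean.
intervals = [
    (round(mean - k * sd, 1), round(mean + k * sd, 1))
    for k in (1, 2, 3)
]

for k, (low, high) in zip((1, 2, 3), intervals):
    print(f"within {k} standard deviation(s): {low} to {high}")
```

With only six observations, the ranges should be read as a rough guide rather than a precise forecast of how every employee will score.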

Advantages of standard deviation: It is the most widely used measure of deviation, as it is based on all the items in the dataset and shows how closely observations are clustered around the mean. Standard deviation also helps to effectively compare two datasets which have the same arithmetic mean.

Disadvantages of standard deviation: It involves squaring, so the result is more distorted by extreme observations. When extreme values are present in a dataset, standard deviation can be noticeably larger than mean deviation.



3. MEAN DEVIATION

What is mean deviation? Mean deviation measures the average of the absolute deviations between observations and the arithmetic mean of the given dataset. In other words, it is the average distance of all values from the arithmetic mean, ignoring direction.

How to calculate mean deviation? Mean deviation calculates the absolute deviations from the central point of the data. Firstly, find the absolute difference between each observation and the arithmetic mean of the dataset. Then, find the average of these differences.

A. For a population mean deviation:

Mean Deviation = $\frac{\sum |x_i - \bar{x}|}{n}$

Where:

$x_i$ = Each value in the dataset

$\bar{x}$ = Arithmetic mean of all values in the dataset

$n$ = Number of values in the dataset

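B. For a sample mean deviation (this article divides by n - 1, mirroring the sample variance; many textbooks divide by n in both cases):

Mean Deviation = $\frac{\sum |x_i - \bar{x}|}{n - 1}$

Where:

$x_i$ = Each value in the dataset

$\bar{x}$ = Arithmetic mean of all values in the dataset

$n$ = Number of values in the dataset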

You can follow this simple process to calculate your mean deviations:

  1. Calculate the arithmetic mean of the dataset.
  2. Find each data point’s difference from the arithmetic mean, expressed as an absolute value.
  3. Sum up all of the absolute deviations.
  4. Divide this sum of absolute deviations by n (for a population mean deviation) or by n-1 (for a sample mean deviation).
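The four steps above can be sketched in Python as follows. The function name `mean_deviation`, the `sample` flag and the input data are assumptions made for this example; the n - 1 divisor for samples follows this article's convention.

```python
def mean_deviation(values, sample=True):
    """Average absolute deviation from the arithmetic mean.

    sample=True divides by n - 1 (the convention used in this article);
    sample=False divides by n.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to calculate a mean deviation")

    mean = sum(values) / n                               # Step 1: arithmetic mean
    absolute_deviations = [abs(x - mean) for x in values]  # Step 2: absolute differences
    total = sum(absolute_deviations)                     # Step 3: sum them up
    return total / (n - 1 if sample else n)              # Step 4: divide


# Hypothetical dataset used only to show the call.
data = [1, 2, 3, 4, 5]
print(mean_deviation(data))                # 1.5
print(mean_deviation(data, sample=False))  # 1.2
```

Taking the absolute value in Step 2 is what stops positive and negative deviations from canceling each other out; without it the sum in Step 3 would always be zero.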

Example 3: The following numbers show the distribution of employee appraisal results. The business currently employs 6 full-time production workers:

Employee appraisal results:
46, 69, 32, 60, 52, 41

Step 1: Find the arithmetic mean by adding up all the results and dividing by the number of results:

Arithmetic mean = (46 + 69 + 32 + 60 + 52 + 41) / 6 = 50

Step 2: To find each result’s deviation from the arithmetic mean, subtract the arithmetic mean from each result and express as an absolute value:

$|46 - 50| = |-4| = 4$

$|69 - 50| = |19| = 19$

$|32 - 50| = |-18| = 18$

$|60 - 50| = |10| = 10$

$|52 - 50| = |2| = 2$

$|41 - 50| = |-9| = 9$

Step 3: To find the sum of absolute deviations, add up all of the absolute deviations:

Sum of absolute deviations = 4 + 19 + 18 + 10 + 2 + 9 = 62

Step 4: To find the mean deviation for this dataset, divide the sum of all absolute deviations by n – 1:

Mean deviation (for the sample) = 62 / (6-1) = 12.4

This mean deviation of 12.4 is lower than the standard deviation of 13.3 because mean deviation does not square the deviations, so the result is less affected by the extreme values of 32 and 69.
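The comparison can be reproduced, and pushed a little further, with the sketch below. Both helper functions use the n - 1 divisor as in this article's sample calculations. The second dataset, in which the top score is raised from 69 to 99, is a hypothetical variation added for illustration and does not come from the example.

```python
import math


def mean_deviation(values):
    """Sample mean deviation as used in this article (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / (len(values) - 1)


def standard_deviation(values):
    """Sample standard deviation (divides by n - 1 before taking the root)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))


results = [46, 69, 32, 60, 52, 41]
# Hypothetical variation, not from the article: top score raised from 69 to 99.
with_outlier = [46, 99, 32, 60, 52, 41]

for label, data in (("original", results), ("with outlier", with_outlier)):
    md = mean_deviation(data)
    sd = standard_deviation(data)
    print(f"{label}: mean deviation = {md:.1f}, standard deviation = {sd:.1f}")
```

For the original results this gives 12.4 and 13.3, matching the hand calculations. With the more extreme score, both measures rise, but the standard deviation rises further, so the gap between the two widens.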

Uses of mean deviation in business management: While mean deviation is very similar to standard deviation in giving a concrete distance between observations and the arithmetic mean, mean deviation is more suitable for datasets which include extreme values. Sometimes figures will deviate from the mean a lot, for example because the market research survey was conducted in extremely poor regions, a machine broke down and stopped the production process for a whole day, or a natural disaster caused a random fluctuation.

Advantages of mean deviation: It does not involve squaring; therefore the result is less distorted by extreme observations.

Disadvantages of mean deviation: It ignores plus and minus signs in order to stop positive and negative deviations from canceling out, so it gives no information about the direction of the deviations. Absolute values are also harder to work with in further statistical calculations, which limits how widely mean deviation is used in business decision making.

This article showed in detail how numerical data might be summarized using the statistical techniques for calculating deviation. Variance shows the spread of data around the arithmetic mean in squared units, while standard deviation and mean deviation both express the deviation of observations from the mean in the units of the data. Standard deviation gives the typical difference between an observation and the arithmetic mean, based on squared deviations, whereas mean deviation gives the average of the absolute differences between observations and the arithmetic mean.

You can find out more about statistical analysis of market research results here.