Introduction

This chapter gives you an opportunity to build on what you have learned in previous grades about data handling and probability. The work will be mostly practical. Through problem solving and activities, you will master further methods of collecting, organising, displaying and analysing data. You will also learn how to interpret data and not always accept them at face value, because data are sometimes unscrupulously misused and abused to try to prove or support a viewpoint. Measures of central tendency (mean, median and mode) and dispersion (range, percentiles, quartiles, interquartile range, semi-interquartile range, variance and standard deviation) will be investigated. The activities involving probability will be familiar to most of you; for example, you have played dice games or card games even before you came to school. Your basic understanding of probability and chance will be deepened so that you come to a better understanding of how chance and uncertainty can be measured.

**Standard Deviation** (*σ*)

Standard deviation is the square root of the arithmetic mean of the squared deviations of the terms from their arithmetic mean. It is denoted by *σ*.

The square of standard deviation is called the **variance** and it is denoted by the symbol *σ*^{2}.

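(i) For a simple (discrete) distribution

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

(ii) For a frequency distribution

$$\sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{\sum f}}$$

(iii) For classified data

$$\sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{\sum f}}$$

Here, *n* is the number of terms, *f* is the frequency of each value (or class), and in case (iii) *x* is the class mark of the interval.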
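As a rough illustration of cases (ii) and (iii), the sketch below computes the standard deviation from values and their frequencies; for classified data the values are the class marks. The function name `frequency_std_dev` and the sample classes are hypothetical, made up only for this illustration.

```python
import math

def frequency_std_dev(values, frequencies):
    """Standard deviation for data given as values (or class marks) with frequencies."""
    total = sum(frequencies)  # sum of f, i.e. the number of observations
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total
    return math.sqrt(variance)

# Hypothetical classified data: classes 0-10, 10-20, 20-30 with class marks 5, 15, 25
class_marks = [5, 15, 25]
frequencies = [2, 5, 3]
sigma = frequency_std_dev(class_marks, frequencies)
print(sigma)  # 7.0 (mean 16, variance 49)
```

With every frequency equal to 1, this reduces to the formula for a simple distribution in case (i).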

**Variance and standard deviation**

Sometimes the mean is a more useful measure of central tendency than the median.

The measures of dispersion (spread) around the mean are called the **variance** and the **standard deviation.**

**Standard deviation**

The standard deviation is the square root of (the sum of the squared differences between each score and the mean, divided by the number of scores). The formula for standard deviation is:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

where *x* is each individual value, *x̄* is the mean and *n* is the number of values. The symbol sigma, Σ, means ‘the sum of’.

__This formula will be on the data sheet. Make sure you can use the formula properly__.

Calculating the standard deviation using the formula:

(i) Find the mean of all the numbers in the data set.

(ii) Find each value of *x* − *x̄*. In other words, work out by how much each of these values differs (or deviates) from the mean.

(iii) Square each deviation. Find each value of (*x* − *x̄*)^{2}.

(iv) Add all the answers together. In other words, find Σ(*x* − *x̄*)^{2}.

(v) Divide this sum by the number of values, *n*.

(vi) You have now found Σ(*x* − *x̄*)^{2} ÷ *n*. This value is called the variance.

(vii) Find the **square root of the variance** to find the **standard deviation**.

By working through these steps, you have found the standard deviation using the formula.
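The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `variance_and_std_dev` and the small data set are assumptions made for the example, not part of the original text.

```python
import math

def variance_and_std_dev(data):
    """Follow steps (i) to (vii) and return (variance, standard deviation)."""
    n = len(data)
    mean = sum(data) / n                     # step (i): the mean
    deviations = [x - mean for x in data]    # step (ii): each x minus the mean
    squared = [d ** 2 for d in deviations]   # step (iii): square each deviation
    total = sum(squared)                     # step (iv): add the squares
    variance = total / n                     # steps (v) and (vi): divide by n
    return variance, math.sqrt(variance)     # step (vii): square root

# Hypothetical data set with mean 5
variance, sigma = variance_and_std_dev([2, 4, 4, 4, 5, 5, 7, 9])
print(variance, sigma)  # 4.0 2.0
```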

Example 1: **Finding the variance and standard deviation**

These are the results of a mathematics test for a Grade 11 class of 20 students.

52 44 62 66 60 57 95 78 71 62

100 69 62 72 73 55 32 83 78 80

a) Calculate the mean mark for the class.

b) Complete the table below and use it to calculate the standard deviation of the marks.

c) What percentage of the students scored within one standard deviation of the mean?

Solution:

a) *x̄* = (52 + 44 + 62 + 66 + 60 + 57 + 95 + 78 + 71 + 62 + 100 + 69 + 62 + 72 + 73 + 55 + 32 + 83 + 78 + 80) ÷ 20 = 1351 ÷ 20 = 67.55

b)

__note__: Squaring (*x* − *x̄*) removes the effect of the negative signs. At the end, we take the square root of the whole answer to ‘reverse’ the effect of squaring.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{4962.95}{20}} \approx 15.75$$

(correct to 2 decimal places)

c) One standard deviation from the mean is the interval (*x̄* − *σ*; *x̄* + *σ*) = (67.55 − 15.75; 67.55 + 15.75) = (51.8; 83.3).

16 scores lie in the interval (51.8; 83.3).

16 out of 20 of the marks lie within one standard deviation of the mean: (16/20) × 100 = 80%.

__Answer__: 80% of the students’ marks lie within one standard deviation from the mean.

__note__: We can say this is a representative set of data, because more than 66.6% lie within one standard deviation of the mean.
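As a check on Example 1, the following sketch recomputes the mean, the standard deviation and the percentage of marks within one standard deviation of the mean. The marks are the 20 values listed in the sum in part a); the variable names are illustrative only.

```python
import math

# The 20 marks as they appear in the sum in part a)
marks = [52, 44, 62, 66, 60, 57, 95, 78, 71, 62,
         100, 69, 62, 72, 73, 55, 32, 83, 78, 80]

n = len(marks)
mean = sum(marks) / n
sigma = math.sqrt(sum((x - mean) ** 2 for x in marks) / n)

# Marks strictly inside the interval (mean - sigma; mean + sigma)
lower, upper = mean - sigma, mean + sigma
within = [x for x in marks if lower < x < upper]
percentage = 100 * len(within) / n

print(round(mean, 2), round(sigma, 2), len(within), percentage)
# 67.55 15.75 16 80.0
```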

__note__:

● the **interquartile range** measures a spread around the **median**, so it has to do with the positions of data and not their actual values.

● the **standard deviation** measures a spread around the **mean**, using the actual values of the data and not just their positions.
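One way to see the difference is to change an extreme value and compare both measures. The sketch below uses the Example 1 marks and replaces the highest mark with a made-up value of 1000, a hypothetical change for illustration only. Quartile conventions differ between textbooks and software; this sketch simply uses the default method of Python's `statistics.quantiles`, so the quartile values themselves may not match a hand calculation.

```python
import statistics

marks = [52, 44, 62, 66, 60, 57, 95, 78, 71, 62,
         100, 69, 62, 72, 73, 55, 32, 83, 78, 80]

# Hypothetical altered data: the highest mark is replaced by 1000
altered = sorted(marks)
altered[-1] = 1000

def iqr(data):
    """Interquartile range using the default quartile method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# The interquartile range depends on positions, so it does not change here
print(iqr(marks), iqr(altered))

# The standard deviation uses the actual values, so it changes a great deal
print(statistics.pstdev(marks), statistics.pstdev(altered))
```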

Example 2:

To calculate the standard deviation of 57; 53; 58; 65; 48; 50; 66; 51, you could set it out in the following way:

__Note__: To get the deviations, subtract the mean from each number.

__Note__: The sum of the deviations of scores about their mean is zero. This always happens; that is, Σ(*x* − *x̄*) = 0 for any set of data. Why is this? Find out.

Calculate the variance (add the squared results together and divide this total by the number of items).
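The working for Example 2 can be laid out as a table of values, deviations and squared deviations. The sketch below prints such a table for the eight numbers given in the example; the column layout is an assumption and may differ from the table intended in the original.

```python
import math

data = [57, 53, 58, 65, 48, 50, 66, 51]
n = len(data)
mean = sum(data) / n  # 448 / 8 = 56

# One row per value: x, its deviation from the mean, and the squared deviation
print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
for x in data:
    d = x - mean
    print(f"{x:>4} {d:>9.0f} {d ** 2:>13.0f}")

sum_dev = sum(x - mean for x in data)        # the deviations add up to zero
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of the squared deviations
variance = sum_sq / n
sigma = math.sqrt(variance)
print(sum_dev, sum_sq, variance, round(sigma, 2))  # 0.0 320.0 40.0 6.32
```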

| City | | | | | | |
|---|---|---|---|---|---|---|
| Cape Town | 3.96 | 3.76 | 4.00 | 3.91 | 3.69 | 3.72 |
| Durban | 3.97 | 3.81 | 3.52 | 4.08 | 3.88 | 3.68 |

a) Find the mean price in each city and then state which city has the lower mean.

b) Find the standard deviation of each city’s prices.

c) Which city has the more consistently priced petrol? Give reasons for your answer.

Solution:

a) Cape Town: 3.84. Durban: 3.82. Durban has the lower mean.

b) Standard deviation: Cape Town: *σ* ≈ 0.12. Durban: *σ* ≈ 0.18.

c) The standard deviation of Cape Town’s prices is lower than that of Durban’s. That means that Cape Town has more consistent (less variable) prices than Durban.
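The comparison of the two cities can be checked with a short sketch that applies the standard deviation formula to each row of prices. The helper name `std_dev` is illustrative only.

```python
import math

def std_dev(data):
    """Standard deviation: square root of the mean squared deviation from the mean."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

prices = {
    "Cape Town": [3.96, 3.76, 4.00, 3.91, 3.69, 3.72],
    "Durban": [3.97, 3.81, 3.52, 4.08, 3.88, 3.68],
}

for city, values in prices.items():
    mean = sum(values) / len(values)
    print(city, round(mean, 2), round(std_dev(values), 2))
# Cape Town 3.84 0.12
# Durban 3.82 0.18
```

The smaller standard deviation for Cape Town is what supports the conclusion that its prices are more consistent.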

Ex4. The data below shows the energy levels, in kilocalories per 100 g, of 10 different snack foods.

a) Calculate the mean energy level of these snack foods.

b) Calculate the standard deviation.

c) The energy levels, in kilocalories per 100 g, of 10 different breakfast cereals had a mean of 545.7 kilocalories and a standard deviation of 28 kilocalories. Which of the two types of food shows greater variation in energy levels?

What do you conclude?

Solution:

a) Mean =5500/10=550 kilocalories

b) Compute this by yourself! *σ* ≈ 69 kilocalories.

c) Snack foods show greater variation. The standard deviation for snack foods is 69 kilocalories, whilst the standard deviation for breakfast cereals is 28 kilocalories; that is, the energy levels of breakfast cereals are spread closer to the mean than those of the snack foods.

Ex5. The times for 8 athletes who ran a 100 m sprint on the same track are shown below. All times are in seconds.

a) Calculate the mean time.

b) Calculate the standard deviation for the data.

c) How many of the athletes’ times are more than one standard deviation away from the mean?

Solution:

a) *x̄* = 10.4

b) *σ* = 0.27

c) The mean is 10.4 and the standard deviation is 0.27. Therefore the interval containing all values that are within one standard deviation of the mean is [10.4 − 0.27; 10.4 + 0.27] = [10.13; 10.67]. We are asked how many values are further than one standard deviation from the mean, meaning outside the interval. There are 3 values from the data set outside the interval.

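Ex6. The following data set has a mean of 14.7 and a variance of 10.01.

18; 11; 12; *a*; 16; 11; 19; 14; *b*; 13

Compute the values of *a* and *b*.

Solution:

From the formula of the mean we have

$$14.7 = \frac{114 + a + b}{10} \quad\Rightarrow\quad a + b = 147 - 114 \quad\Rightarrow\quad a = 33 - b$$

From the formula of the variance we have

$$10.01 = \frac{69.12 + (a - 14.7)^2 + (b - 14.7)^2}{10}$$

Substitute *a* = 33 − *b* into this equation to get

$$100.1 = 69.12 + (18.3 - b)^2 + (b - 14.7)^2 = 2b^2 - 66b + 620.1$$

$$0 = b^2 - 33b + 260 = (b - 13)(b - 20)$$

Therefore *b* = 13 or *b* = 20.

Since *a* = 33 − *b*, we have *a* = 20 or *a* = 13. So, the two unknown values in the data set are 13 and 20.

We do not know which of these is *a* and which is *b*, since the mean and variance tell us nothing about the order of the data.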
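The same reasoning can be sketched numerically. The mean gives *a* + *b*, and the identity variance = (mean of the squares) − (mean)^{2} gives *a*^{2} + *b*^{2}, after which *a* and *b* are the roots of a quadratic. The function name `solve_two_unknowns` is hypothetical, and the eight known values used below (18, 11, 12, 16, 11, 19, 14, 13, ten values in total) are an assumption chosen to be consistent with the stated mean, the stated variance and the relation *a* = 33 − *b*.

```python
import math

def solve_two_unknowns(known, n, mean, variance):
    """Return the two unknown values of a data set with the given mean and variance."""
    s = n * mean - sum(known)                                   # a + b
    q = n * (variance + mean ** 2) - sum(x * x for x in known)  # a^2 + b^2
    product = (s * s - q) / 2                                   # ab
    # a and b are the roots of t^2 - s*t + product = 0
    root = math.sqrt(s * s - 4 * product)
    return (s - root) / 2, (s + root) / 2

# Assumed known values (see the note above); n counts all ten values
known = [18, 11, 12, 16, 11, 19, 14, 13]
roots = solve_two_unknowns(known, n=10, mean=14.7, variance=10.01)
print([round(r, 6) for r in roots])  # [13.0, 20.0]
```

As in the solution, the sketch returns the two values without saying which is *a* and which is *b*.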