You will have learnt that there are three types of average: the mean, the median, and the mode. Let’s look at them closely – why we need all three, what you can do with them, and when to use each one.
The mean is calculated by adding all the values and dividing by the number of values. Using sigma notation, where the sigma sign means “the sum of”, we can write: $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of values. The best way to envisage what the mean represents is to imagine a group of people who have been picking apples: put all the apples in a big pile, then redistribute them equally. This equal share is the mean. It need not be a whole number. For example, if a soccer team has scored an average of 2.6 goals per match over 10 games, it would have to score exactly 2.6 goals in every match (obviously an impossibility) to reach the same total of 26 goals.
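The definition translates directly into a few lines of Python. This is a minimal sketch. The function name `mean` and the per-match scores are hypothetical, chosen only so that they total 26 goals over 10 matches as in the soccer example.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical goals scored in each of 10 matches (they total 26).
goals = [3, 2, 4, 1, 3, 2, 3, 2, 4, 2]
print(sum(goals))   # 26
print(mean(goals))  # 2.6
```

Python's standard library also offers `statistics.mean`, which does the same job.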
The fact that the formula for the mean can be rewritten as $\sum x = n\bar{x}$ (the total equals the number of values multiplied by the mean) gives us a useful way of answering questions such as this:
A group of 10 people have a mean height of 161.4cm. An 11th person, with a height of 165cm, joins the group. What is the new mean?
We can only work out means from totals. The total height of the original group must be 161.4 × 10 = 1614cm. With the new person the total becomes 1614 + 165 = 1779cm and, when divided among 11 people, the new mean is 1779 ÷ 11 ≈ 161.7cm.
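The same reasoning (recover the total, add the newcomer, divide by the new count) can be sketched as a small Python function. The name `mean_after_adding` is a hypothetical helper introduced for illustration, not a library function.

```python
def mean_after_adding(old_mean, n, new_value):
    """Mean of n + 1 values, given the mean of the first n values and the new value."""
    total = old_mean * n      # recover the total from the mean (sum x = n * mean)
    total += new_value        # add the newcomer to the total
    return total / (n + 1)    # share the new total among n + 1 people

new_mean = mean_after_adding(161.4, 10, 165)
print(round(new_mean, 1))  # 161.7
```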
One of the problems with using the mean as a representative value for a set of data becomes apparent when the data contains some very large or very small values, because these can skew the mean. For example, if a small company has 8 people on a salary of £20000 and a director on a salary of £200000, then the mean salary is £360000 ÷ 9 = £40000. That is twice what eight of the nine employees actually earn: clearly not a very useful statistic.
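Python's standard `statistics` module makes the effect of the director's salary easy to see. This minimal sketch uses the figures from the example: the mean of these salaries comes out at £40000, while the median (described next) stays at £20000.

```python
from statistics import mean, median

# Salaries from the example: 8 employees on £20000 and one director on £200000.
salaries = [20000] * 8 + [200000]

print(mean(salaries))    # 40000
print(median(salaries))  # 20000
```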
If you put the data set in order, the median is the middle value: in other words, there are as many values below the median as there are above. For example, if two classes take the same test and you want to compare average performance, the median values would be better since an unusually good mark in one class could drag the mean up (or a very poor mark could drag the mean down).
If there are 12 values, which is the middle one? Not the 6th, surprisingly, since there are 5 values below it and 6 above. With 11 values, the 6th is the median (5 below and 5 above). In general, with $n$ values the median is the $\frac{n+1}{2}$th value, so when the number of values is even, the median is the mean of the two middle values (the 6th and 7th for 12 values). With a large data set the distinction matters less: with 200 values, the 100th value is a close enough median (strictly, it is the mean of the 100th and 101st).
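The odd/even rule can be written as a short Python function. This is a sketch for illustration, and the function name `median` here is our own. The standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the ((n + 1) / 2)th value, e.g. the 6th of 11 (index 5).
        return ordered[mid]
    # Even count: average the (n/2)th and (n/2 + 1)th values, e.g. the 6th and 7th of 12.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(range(1, 12)))  # 11 values -> 6
print(median(range(1, 13)))  # 12 values -> 6.5
```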
The mode isn’t really an average since it isn’t representative of the whole data set – it’s just the value with the highest frequency; that is, the most “popular” value. A supermarket might like to know which brand of tea sold the most; a railway station manager might be interested in which hourly period had the most people passing through.
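Finding a mode only needs a frequency count. Below is a minimal sketch using `collections.Counter`, and the tea sales in it are hypothetical. Note that `most_common(1)` returns only one value even when several are tied for the highest frequency. If you need all of them, `statistics.multimode` (Python 3.8+) returns every tied value.

```python
from collections import Counter

# Hypothetical till records: one entry per box of tea sold.
sales = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]

counts = Counter(sales)
brand, frequency = counts.most_common(1)[0]  # the most "popular" value and its count
print(brand, frequency)  # Brand A 3
```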
Data is often put into groups for greater clarity – we can still calculate the mean, median and mode, although the mean and median will be estimated values since we no longer have access to the original data values.
The mean is estimated by multiplying the mid-value of each group by its frequency, adding these products to get an estimated total, then dividing by the number of values (the sum of the frequencies): $\bar{x} \approx \frac{\sum fx}{\sum f}$, where $x$ is the mid-value of each group. For example, the table below shows the weights of apples picked from a tree, grouped in bands of 20g.
Using the mid-values (30g, 50g, and so on), you could enter the data into your calculator to find the mean, or draw up a table like this:
The sum of the last column is 13120 (ie the total weight of the apples is about 13120g), and the sum of the frequencies tells us that there are 200 apples. This gives the estimated mean as 13120 ÷ 200 = 65.6g, and a glance at the original table confirms that this looks about right.
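The mid-value method can be sketched in Python. The apple table itself is not reproduced here, so the grouped data below is hypothetical. It is stored as `(lower, upper, frequency)` tuples, a layout chosen for illustration, and the helper name `estimated_mean` is our own. The estimate is the sum of frequency × mid-value divided by the sum of the frequencies.

```python
# Hypothetical grouped data (NOT the apple table from the text):
# each tuple is (lower bound in g, upper bound in g, frequency).
groups = [(20, 40, 10), (40, 60, 30), (60, 80, 40), (80, 100, 20)]

def estimated_mean(groups):
    """Estimate the mean of grouped data using each group's mid-value."""
    total = sum((lo + hi) / 2 * f for lo, hi, f in groups)  # sum of f * x
    count = sum(f for _, _, f in groups)                     # sum of f
    return total / count

print(estimated_mean(groups))  # 64.0
```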
For the median value, we can see that the 100th apple is somewhere in the 60–80g group, and we could use a method known as interpolation to estimate its weight (incidentally, you cannot use your calculator to find the median of grouped data). A better method is to draw up a cumulative frequency table and then draw a cumulative frequency graph. I think of the cumulative frequency table in terms of “up to” values. For example, the 25 apples in the first group could weigh up to 40g; then there is a total of 83 apples weighing up to 60g.
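Both the cumulative frequency table and the interpolation estimate can be sketched in Python. The grouped data is hypothetical, and the helper names `cumulative_table` and `interpolated_median` are our own. The interpolation assumes the values in the median group are spread evenly across it, which amounts to joining the points of the cumulative frequency graph with straight lines. As in the apple example (the 100th of 200), the median position is taken as half the total frequency.

```python
from itertools import accumulate

# Hypothetical grouped data (NOT the apple table from the text):
# each tuple is (lower bound in g, upper bound in g, frequency).
groups = [(20, 40, 10), (40, 60, 30), (60, 80, 40), (80, 100, 20)]

def cumulative_table(groups):
    """Pair each upper bound with the number of items weighing 'up to' that value."""
    running = accumulate(f for _, _, f in groups)
    return [(hi, cf) for (_, hi, _), cf in zip(groups, running)]

def interpolated_median(groups):
    """Estimate the median by linear interpolation inside the group that contains it."""
    n = sum(f for _, _, f in groups)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    target = n / 2   # median position, e.g. the 100th of 200 items
    below = 0        # cumulative frequency up to the start of the current group
    for lo, hi, f in groups:
        if below + f >= target:
            # Assume the items are spread evenly across the group's width.
            return lo + (target - below) / f * (hi - lo)
        below += f

print(cumulative_table(groups))     # [(40, 10), (60, 40), (80, 80), (100, 100)]
print(interpolated_median(groups))  # 65.0
```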
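Now we plot these values on a cumulative frequency graph, with each cumulative frequency plotted against the upper bound of its group. Since the median is the weight of the 100th apple, we draw a horizontal line from 100 on the cumulative frequency axis across to the curve, then a vertical line down to the x-axis (weight), and read off the value.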
The median weight is about 66g: roughly 100 apples weigh less than this and 100 weigh more. In this case the mean (65.6g) and median (66g) are close because the distribution of values is reasonably symmetrical.
The mode is easy: the group 60–80g has the greatest frequency, so it is known as the modal class.
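Finding the modal class only requires comparing the group frequencies. Here is a minimal sketch with hypothetical grouped data stored as `(lower, upper, frequency)` tuples. If two groups tie, `max` returns the first of them.

```python
# Hypothetical grouped data: (lower bound in g, upper bound in g, frequency).
groups = [(20, 40, 10), (40, 60, 30), (60, 80, 40), (80, 100, 20)]

# The modal class is the group with the greatest frequency.
lo, hi, f = max(groups, key=lambda g: g[2])
print(f"modal class: {lo}-{hi}g (frequency {f})")  # modal class: 60-80g (frequency 40)
```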