Project Title:

Numbers Don’t Lie, or Do They?

Brief Description of Project:
Students almost always jump to conclusions based on limited information. This project invites them to question the figures they see in polls reported in newspapers and on TV. A spreadsheet is used to explore sampling, and it may serve as the basis for a full project.

Grade Level:
8-10

Desired Number of Participants (if collaboration with other classes):

Project Timeline:

Curriculum Subject Area(s):
Math, Science, Biology, Sociology

Objectives of Project:
The student will discover how data may be viewed from different starting points.
The student will investigate how data is sampled.

Materials/Resources: (List both on-line and off-line materials/resources needed)

On-line:

Off-Line:
EXCEL

Procedure (Step-by-step instructions for developing the project):
 

Data Sampling
There are many obstacles to ensuring that a sample is representative of its population. Sometimes data is collected "on the fly" and analysis is only an afterthought. Sometimes bias, either intentional or unrealized, creeps in. Sometimes it is unclear what data should be collected because the reason for collecting it is unclear. Sometimes experimental error is a problem. And not least, the statistical problems of sampling can reduce the quality of the data acquired.

For any population there is a minimum sample size below which the sample says little about the population. For example, take the heights of 1000 people as a population. A sample of a single person is hardly indicative of the parent population. What about a sample of 10? Better, yes, but is it statistically representative? A sample of 50? Better again, but representative? A sample of 500 would probably be OK. So what is the most appropriate sample size, given that in practice we do not have infinite resources for sampling?
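The question can also be explored numerically. The following is a minimal sketch in Python, not part of the spreadsheet. It assumes a made-up population of 1000 heights (the values, the seed, and the helper name spread_of_sample_means are all hypothetical) and shows how much the sample mean wanders for samples of 1, 10, 50 and 500 people.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 1000 made-up heights in centimetres.
heights = [rng.gauss(170, 10) for _ in range(1000)]


def spread_of_sample_means(population, sample_size, trials, rng):
    """Return the standard deviation of the sample mean over repeated samples."""
    means = []
    for _ in range(trials):
        # rng.sample draws without replacement, so no person is picked twice.
        sample = rng.sample(population, sample_size)
        means.append(statistics.mean(sample))
    return statistics.pstdev(means)


spreads = {}
for size in (1, 10, 50, 500):
    spreads[size] = spread_of_sample_means(heights, size, 300, rng)
    print(f"sample size {size:3d}: spread of sample means = {spreads[size]:.2f} cm")
```

The printed spread shrinks as the sample grows, which is one way to put a number on "better, but representative?". It does not by itself say which size is the right one; that still depends on how much error is acceptable and how costly each measurement is.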
This spreadsheet allows us to look at the effects of sampling. It contains a set of 100 data items. These have arbitrarily been titled salaries, although the data could be any meaningful numeric value. From this data the spreadsheet calculates the population mean and standard deviation. These statistics are fixed because the population is finite and fully known.
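For readers who want to check the same statistics outside Excel, here is a small Python sketch. The 100 "salary" values are generated at random as a stand-in, since the worksheet's actual data is not reproduced here; the range 0 to 30000 and the seed are assumptions for illustration only.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical stand-in for the spreadsheet's 100 salary values.
population = [rng.randint(0, 30000) for _ in range(100)]

# The whole population is known, so the population formulas apply
# (divide by N rather than N - 1).
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)

print(f"Population size:               {len(population)}")
print(f"Population mean:               {population_mean:.2f}")
print(f"Population standard deviation: {population_sd:.2f}")
```

Running the block again gives the same two numbers every time, because nothing about the population changes between runs.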
 

By using random numbers, it is possible to draw a sample from this population. Sample statistics (mean and standard deviation) can then be calculated and compared with those of the whole population. Because the population itself is fixed, any differences come entirely from the sampling.
 

The spreadsheet generates a random number between 1 and 100 and uses it to look up the corresponding data entry in the population. It does this 10 times, so that a sample of 10 items is collected.
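The lookup described above can be sketched in Python as follows. This is an illustration of the idea rather than the spreadsheet's actual formulas: the population values, the seed, and the function name draw_sample are hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the illustration is repeatable

# Hypothetical stand-in for the spreadsheet's 100 salary values.
population = [rng.randint(0, 30000) for _ in range(100)]


def draw_sample(population, sample_size, rng):
    """Pick entries by random row number, one independent draw per item."""
    sample = []
    for _ in range(sample_size):
        row = rng.randint(1, len(population))  # row number from 1 to 100 inclusive
        sample.append(population[row - 1])     # convert 1-based row to 0-based index
    return sample


sample = draw_sample(population, 10, rng)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample formula (divides by n - 1)

print("Sample:", sample)
print(f"Sample mean: {sample_mean:.2f}")
print(f"Sample standard deviation: {sample_sd:.2f}")
```

Because each row number is drawn independently, the same row can turn up more than once in a sample. This sketch assumes that is acceptable for the exercise.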
 

It is useful to run this a number of times, record the sample's statistics each time, and see how they vary from those of the original population. Press F9 to recalculate the random numbers and thus generate a new sample.
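Recording the results of repeated recalculations can also be imitated in Python. In this sketch each pass through the loop plays the role of one F9 press; the population values, the seed, and the choice of 20 trials are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the illustration is repeatable

# Hypothetical stand-in for the spreadsheet's 100 salary values.
population = [rng.randint(0, 30000) for _ in range(100)]
population_mean = statistics.mean(population)


def sample_mean(population, sample_size, rng):
    """Mean of one random sample, drawn with independent picks."""
    picks = [rng.choice(population) for _ in range(sample_size)]
    return statistics.mean(picks)


# Each list entry corresponds to one recalculation of the sheet.
recorded_means = [sample_mean(population, 10, rng) for _ in range(20)]

print(f"Population mean: {population_mean:.2f}")
for trial, mean in enumerate(recorded_means, start=1):
    difference = mean - population_mean
    print(f"Trial {trial:2d}: sample mean {mean:9.2f} (difference {difference:+9.2f})")

print(f"Lowest sample mean:  {min(recorded_means):.2f}")
print(f"Highest sample mean: {max(recorded_means):.2f}")
```

Students could keep a similar table by hand, writing down the sample mean after each recalculation and comparing the lowest and highest values they see.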

Extensions to other subject areas:
Does the sampling technique affect the use of natural resources?

Student Evaluation Method:
Students' answers to these questions:

What difference would it make if the original population had a smaller range of data? In the example, the data ranges from around 0 to 30000. What if the range were 10.001 to 40.999?

What effect does sample size have? In the example it is 10. Would the sample statistics be more representative with a sample size of 30, or even 400? What is the optimum sample size?

Project Evaluation