Grade Level:
8-10
Desired Number of Participants (if collaboration with other classes):
Project Timeline:
Curriculum Subject Area(s):
Math, Science, Biology, Sociology
Objectives of Project:
The student will discover how data may be viewed using different starting
points.
Sampling of data will be investigated.
Materials/Resources: (List Both on-line and off-line materials/resources needed)
On-line:
Off-Line:
EXCEL
Procedure (Step-by-step instructions for developing the project):
Data Sampling
· Ensuring that a sample is representative of its population involves
many problems. Sometimes data is just collected "on the fly" and analysis
is a secondary activity. Sometimes bias, either intentional or unrealized,
will creep in. Sometimes it is not clear what data should be collected
because the reason for collecting it is not clear. Sometimes experimental
error can be a problem. And not least, the statistical problems of sampling
can interfere with the quality of the data acquired.
For any population there is a bare minimum sample size that can adequately
say something about the population. For example, take the heights of 1000
people as a population. A sample of a single height is hardly indicative of
the parent population. What about a sample of 10? Better, yes, but will it
be statistically representative? A sample of 50? Better again, but
representative? A sample of 500 would probably be OK. So what is the most
appropriate sample size, given that we never have unlimited resources for
sampling?
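The effect of sample size on the heights example can be tried out with a short Python sketch. This is an illustration only: the heights are randomly generated (an assumed mean of 170 cm and standard deviation of 10 cm), and the function names `make_heights` and `average_error` are hypothetical, not part of the project spreadsheet.

```python
import random
import statistics

def make_heights(n=1000, seed=1):
    """Generate a hypothetical population of n heights in cm (assumed values)."""
    rng = random.Random(seed)
    return [rng.gauss(170, 10) for _ in range(n)]

def average_error(population, size, trials=200, seed=2):
    """Average distance between sample mean and population mean.

    Draws `trials` samples of `size` items (without replacement) and
    averages |sample mean - population mean| over all of them.
    """
    rng = random.Random(seed)
    population_mean = statistics.mean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, size)
        errors.append(abs(statistics.mean(sample) - population_mean))
    return statistics.mean(errors)

if __name__ == "__main__":
    heights = make_heights()
    for size in (1, 10, 50, 500):
        print("sample size", size, "average error (cm)", round(average_error(heights, size), 2))
```

The average error shrinks as the sample grows, but each extra item buys less improvement than the one before, which is why the question of a "most appropriate" size has no single answer.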
This spreadsheet allows us to look at the effects of sampling. It contains
a set of 100 data items. These have arbitrarily been labelled salaries,
although the data could be any meaningful numeric value. From this data the
spreadsheet calculates the population mean and the standard deviation. These
statistics are fixed because the population is finite and does not change.
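For readers without the spreadsheet, the population statistics can be sketched in Python. The source does not give the actual salary data or the spreadsheet formulas, so the 100 values here are randomly generated stand-ins in a range of roughly 0 to 30000, and the sketch assumes the population form of the standard deviation (dividing by N). The names `make_salaries` and `population_stats` are hypothetical.

```python
import random
import statistics

def make_salaries(n=100, seed=42):
    """Hypothetical stand-in for the spreadsheet's 100 salaries."""
    rng = random.Random(seed)
    return [round(rng.uniform(0, 30000), 2) for _ in range(n)]

def population_stats(values):
    """Return (mean, standard deviation), treating values as the whole population."""
    return statistics.mean(values), statistics.pstdev(values)

if __name__ == "__main__":
    salaries = make_salaries()
    mean, sd = population_stats(salaries)
    print(f"population mean = {mean:.2f}, standard deviation = {sd:.2f}")
    # The population does not change, so recomputing gives the same statistics.
    assert population_stats(salaries) == (mean, sd)
```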
By using random numbers, it is possible to draw a sample from this
population. Sample statistics (mean and standard deviation) can be
calculated and compared with those of the main population.
The differences are due entirely to the sampling.
The spreadsheet generates a random number between 1 and 100 and uses it
to look up the corresponding data entry in the population. It does this
10 times, so that a sample of 10 items is collected.
It is useful to run this a number of times, record each sample's statistics, and see how they vary from those of the original population. Press F9 to recalculate the random numbers and thus generate a new sample.
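The random-number lookup and the repeated recalculation can be imitated outside Excel with a minimal Python sketch. Several things here are assumptions rather than details from the spreadsheet: the population is randomly generated, each position is drawn independently (so the same entry can be picked twice), and the sample standard deviation divides by n - 1. The names `make_population`, `draw_sample` and `resample` are hypothetical.

```python
import random
import statistics

POPULATION_SIZE = 100
SAMPLE_SIZE = 10

def make_population(seed=7):
    """Hypothetical stand-in for the 100 salaries in the spreadsheet."""
    rng = random.Random(seed)
    return [round(rng.uniform(0, 30000), 2) for _ in range(POPULATION_SIZE)]

def draw_sample(population, size, rng):
    """Pick `size` entries by random position, like a random-number lookup.

    Positions are drawn independently, so an entry may repeat
    (sampling with replacement, an assumption about the spreadsheet).
    """
    sample = []
    for _ in range(size):
        position = rng.randint(1, len(population))  # 1-based, both ends inclusive
        sample.append(population[position - 1])
    return sample

def resample(population, runs, rng, size=SAMPLE_SIZE):
    """Imitate pressing F9 `runs` times; return (mean, sd) for each new sample."""
    results = []
    for _ in range(runs):
        sample = draw_sample(population, size, rng)
        results.append((statistics.mean(sample), statistics.stdev(sample)))
    return results

if __name__ == "__main__":
    population = make_population()
    rng = random.Random()  # unseeded, so every run gives different samples
    print("population:", round(statistics.mean(population), 2),
          round(statistics.pstdev(population), 2))
    for mean, sd in resample(population, 5, rng):
        print("sample:    ", round(mean, 2), round(sd, 2))
```

Recording the printed sample lines over several runs gives the same kind of table students would build by hand from the spreadsheet.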
Extensions to other subject areas:
Does the sampling technique affect the use of natural resources?
Student Evaluation Method:
Students' answers to these questions:
What difference would it make if the original population had a smaller range of data? In the example, the data ranges from around 0 to 30000. What if the range were 10.001 to 40.999?
What effect does sample size have? In the example this is 10. Would the sample statistics be more representative with 30 or even 400 as the sample size? What is the optimum sample size?
Project Evaluation