A dataset (also spelled ‘data set’) is a collection of raw statistics and information generated by a research study. Datasets produced by government agencies or non-profit organizations can usually be downloaded free of charge. However, datasets developed by for-profit companies may be available for a fee.
Most datasets can be located by identifying the agency or organization that focuses on the research area of interest. For example, if you are interested in public opinion on social issues, Pew Research Center would be a good place to look. For population data, the U.S. government’s Population Estimates Program, accessed through American FactFinder, would be a good source.
An “open data” philosophy, the belief that data should be freely accessible, is becoming more common among governments and business organizations around the world. Open data efforts have been led by both governments and non-governmental organizations such as the Open Knowledge Foundation. Learn more by exploring The Open Data Handbook. Two related trends are also growing: “Big Data”, in which extremely large amounts of data are analyzed for new and interesting perspectives, and data visualization, which is helping to drive the availability and accessibility of datasets and statistics.
While the terms ‘data’ and ‘statistics’ are often used interchangeably, in scholarly research there is an important distinction between them.
Data are individual pieces of factual information recorded and used for the purpose of analysis. They are the raw information from which statistics are created. Statistics are the results of data analysis: the interpretation and presentation of the data. In other words, some computation has taken place that provides some understanding of what the data mean. Statistics are often, though not necessarily, presented in the form of a table, chart, or graph.
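To make the distinction concrete, here is a minimal Python sketch. The list of ages is invented for illustration and does not come from any real study, and the choice of summary measures is only one possibility. The list plays the role of the data, and the values computed from it play the role of the statistics.

```python
from statistics import mean, median

# Hypothetical raw data: ages recorded for individual survey respondents.
# These values are made up for illustration only.
ages = [23, 35, 41, 29, 35, 52, 47]

# Statistics: summaries produced by computing over the raw data.
summary = {
    "count": len(ages),
    "mean": mean(ages),
    "median": median(ages),
    "min": min(ages),
    "max": max(ages),
}

# Present the statistics as a simple two-column listing.
for name, value in summary.items():
    print(f"{name}: {value}")
```

Each entry in the list is a single recorded observation, while each entry in the summary exists only because a computation was carried out on those observations.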
Both statistics and data are frequently used in scholarly research. Statistics are often reported by government agencies - for example, unemployment statistics or educational literacy statistics. Often these types of statistics are referred to as ‘statistical data’.
These datasets are taken from real research projects but have been edited and cleaned for learning purposes. Each dataset is accompanied by a short, clear narrative description of the data and easy-to-follow instructions on how to apply the research method. Together these provide a step-by-step guide to analyzing the data and then allow students to practice the analysis themselves.