Graphical Representation Of Data In Statistics Assignment 1

This guide will help you conduct statistical research for your first CMNS 261 assignment. Review your assignment for directions as to what to do. Use the following sources to find relevant materials for your paper.

If you need additional assistance, Ask a Librarian lists all the different ways that you can consult a research librarian.

Contact info


If you need help, please contact Sylvia Roberts, Liaison Librarian for Communication & Contemporary Arts, at SFU Vancouver 778.782.5043 (main), SFU Burnaby 778.782.3681, or sroberts@sfu.ca, or Ask a Librarian.

Introduction

"There are 3 kinds of lies: Lies, damned lies and statistics." (Attributed to both Benjamin Disraeli and Mark Twain)

Statistics are often used to support arguments in both popular and scholarly publications. Critical assessment of the source of statistics, including the analytical method used, the data gathering techniques, and any qualifying factors, will help you assess the validity of the statistical claims made by the authors. It will also help you to use statistical information effectively in your own research papers.

Statistics are collected by many different agencies and organizations on a variety of topics. After collection, the data is processed and packaged for publication. Part of the processing involves grouping the data, e.g. by age or sex or geography.

Statistics for the non-statistician

Think critically about the statistics that you obtain. For example:

  • Where did the statistic come from?
  • Who conducted the research? And for what purpose?
  • Is the methodology sound?
  • Do the analytical method and measurements used seem appropriate to the question being addressed?
  • How are the variables defined?
  • How current is the data?
  • Is the scope appropriate for your question?
  • Is this the complete data or a selected set?
  • Are the graphical representations of data accurate?

If you want more detail about critically assessing statistics used in publications, try reading some of the following materials:

What's the difference between primary and secondary sources of statistical information?

Generally, a primary source is a document that contains an account of an event or research written by a participant or witness of the event or by the person conducting the research. 

Secondary sources are those publications that describe, summarize, interpret and analyze primary sources. Secondary sources are one step removed from the event being described. 

Using a secondary source, such as a newspaper article, can help you identify the relevant primary sources that were used to inform the article. The secondary source can also provide contextual and background information that helps you understand the information in a primary source. 

Secondary sources can be a handy starting place because the writer has gathered and interpreted the statistics for the reader. Often journalists will include statistics in their stories to support claims or to illustrate the extent of an issue.

NOTE: The meaning of "primary source" differs depending on the academic discipline and can change according to the context.

For example, when researching an event, a historian would consider a newspaper article providing an account of the event a primary source if it was written at the time the event occurred. Historians see writings (books or articles) that analyze the event after the fact as secondary sources. In scientific literature, primary sources are articles reporting on research results, and newspaper articles discussing that research would be secondary sources.

If you were doing research on postsecondary education, newspaper articles citing government decisions about funding postsecondary education would be secondary sources. The primary document would be the report written by the government body that made the decision. However, if you were doing research on media portrayals of postsecondary education, the same newspaper articles could be evidence for that research and would be considered primary sources in that context.

Starting with secondary sources

Check out these news databases to find articles that provide statistics on Canadian topics:

Statistics Canada also publishes secondary sources to provide a statistical overview of Canadian life and announce new statistical releases:

  • The Daily - Statistics Canada's official release bulletin, The Daily issues news releases on current social and economic conditions and announces new products.
  • Canada at a Glance - Presents current statistics on Canadian demography, education, health and ageing, justice, housing, income, labour market, household, economy, travel, finance, agriculture, foreign trade and the environment.
  • Canada Year Book (also in print) - Published annually and gives a statistical overview of Canadian life. You can browse the various chapters or search for the occurrence of specific terms. This source provides discussion, graphs, tables and media files.

RESEARCH STRATEGY: Use newspaper stories and The Daily to identify relevant statistical sources

Journalists get story ideas from The Daily, often through subscribing to their news service. News stories will cite Statistics Canada publications relating to a topic but usually won't include the title or other citation details.

If I wanted to find all news stories that cite Statistics Canada sources, I would use this as a search term, combined with search terms that describe my topic. A sample search might be: 

Using this search in Canadian Newsstream, I found a good set of results, including this story:

The highlighted text shows Statistics Canada as the source of the cited statistics, the topic covered and the date the story was published. These can be used as leads to track down the original news release in The Daily.

Assuming that the journalist would publish a story within a few days of the news release, we would look at issues of The Daily, starting with the date of the news story and going back in time. In this case, a few days prior to the story being published, we find an announcement for a new issue of Health Reports, which seems like a potential source for data on life expectancy.

When we follow the link, we find this is indeed the source of statistics cited in the newspaper article.

Notice that there is a direct link at the bottom of The Daily article to the Health Reports record, where you'll be able to find the full text of the relevant article.

RESEARCH STRATEGY: Use primary data sources wherever possible

When using statistics from news sources, consider going to the primary source of data to verify that the interpretation of the statistical data isn't affected by journalistic bias and that the use of the statistics is valid and consistent with the original source.

Statistics Canada's sources of primary statistical information

Statistics Canada (AKA Stats Can) is the major statistical gathering agency of the federal government. It publishes the "whole range of statistics on the economic and social activities of the Canadian people". There are daily, weekly, quarterly, annual and irregular publications. Stats Can also publishes the Census of Canada at five-year intervals.

The information is available in a variety of formats including print, microform and online. SFU Library is a "depository library" and automatically receives one copy of Stats Can paper format publications. However, Statistics Canada makes their reports and data free via their website. SFU also subscribes to products that enable researchers to use Statistics Canada data.

RESEARCH STRATEGY: Browse Statistics Canada publications by topic

The Statistics Canada website enables researchers to browse by topic or by the name of the Statistics Canada resource type. You can also search the publication records by words that represent your topic, to find a list of matching records.

Browsing by topic lets you drill down to the range of publications available for each topic, so it is an ideal way to generate ideas about what might be a suitable topic choice.

To do this, choose the Browse by Subject link in the navigation bar at the top of the Statistics Canada home page. Based on your broad topic selection, choose a subject from the list. 

Some broad topics may be covered in multiple categories, so be prepared to explore more than one. For example, if I wanted to continue to search for statistics relating to life expectancy of native Canadians, I could choose either Aboriginal Peoples or Health as my starting place.

Wherever I begin my browsing, Statistics Canada provides useful definitions, as well as subtopics and links to specific statistical sources. I will browse through the subtopics to find ways that this subject has been explored.

Within each of the subtopics, Statistics Canada links to key publications and data products, reporting on how many can be found in each category. You may want to browse through these, considering the types of questions each may address.

Notice the link to Definitions, data sources and methods. Statistics Canada provides detailed information about their methodology, including the nature of each survey, the scope of the reports, definitions of terminology, etc. If you don't understand something in a table or a report, this is a good place to find the answer to your questions.

RESEARCH STRATEGY: Search for Statistics Canada publications

Once you have a sense of how you might like to narrow your research question, you may want to search for documents that answer specific aspects of your question, either one at a time or in combination.

For example, my browsing session might point me towards exploring causes of death among Canada's aboriginal population in comparison with the general Canadian population.

Specifically, I might want to look at rates of smoking and lung cancer in each population.

Searching enables me to find documents that include more than one search concept.

NOTE: Statistics Canada uses specific terminology to identify the topics of their reports. These may not be words that you would initially think of searching.

For example, in the search above, the terms "aboriginal peoples", "ethnic origin", "tobacco use", "second-hand smoke" are suggested as additional search terms.

If you find relevant records, check out the terminology used to describe the topic of the report and use these terms when searching, to improve your results.

Your list of results will include all types of Statistics Canada publications. Follow the links to find articles in The Daily, analytical reports, data tables, census information, etc. Read the publication record to find out whether this is a regularly issued publication (so you can check for more recent data) or whether there are other related publications by Statistics Canada.

You may also want to search within individual Statistics Canada products, such as the Census and CANSIM.

CANSIM (also available via the CHASS CANSIM subscription)

CANSIM contains searchable time series data, allowing you to see how statistics have changed over time. You may find references to CANSIM tables in Stats Can publications, in catalogue results or by searching CANSIM on the Stats Can web site.

You can use the same table or series number to access it through the CANSIM II database, to which SFU subscribes. Alternatively, you can browse or search CANSIM II directly.

Census of Canada (also available via the CHASS Canadian Census Analyzer)

The Census of Canada is conducted every five years to provide a socioeconomic picture of the country's population at the national, provincial, municipal and other census geographic levels.

A list of Census program topics shows what type of data is gathered, enabling you to make geographic comparisons across Canada.

Other government sources of statistical information

Other government departments gather statistics related to their jurisdictional concerns. For example, Health Canada and its agencies collect data that relate to the Canadian medical system and consumers' use of it. I found the following two sources by searching the Library Catalogue for "":

  • Health expenditures in Canada by age and sex, 1980-81 to 2000-01: statistical annexe published by Health Policy and Communications Branch, Health Canada
  • Health indicators 2013 / Statistics Canada, Canadian Institute for Health Information.

Use the Canadian Research Index to find government reports on your topic. If you want chiefly statistics, use "statistics" as a keyword in your search. However, many reports will contain statistical information even if the record doesn't explicitly say so.

If you locate a report that interests you, you can search for it in the Library catalogue.

You can also search web sites of specific government departments that have jurisdiction for your topic area. If you've identified the relevant government department by searching secondary sources, you can go directly to their web site. Use the site search function and/or look for a list of publications or reports.

The government of British Columbia also has a statistical agency, BC STATS, which collects and publishes reports of statistics from Statistics Canada, provincial government ministries and agencies and from administrative files. This is a good place to start if you're only interested in statistics about BC. Note that some of the fee-based publications issued by BC STATS are available online for SFU researchers.

Citing it right!

You're asked to use APA style to cite documents that you use in your report. The following guides will help you identify the key information you need to use in your citation:

NOTE: For this assignment, stick to published sources of statistical data that you can access for free through the SFU Library.

Raw data is seldom available without paying a fee. SFU's Research Data Library does provide access to raw data that requires the use of statistical analysis software, like SPSS, to create meaning from the numbers.

NOTE: APA does not cover Statistics Canada publications specifically. However, you can adapt the examples to the APA style of formatting and punctuation.

For example, the Statistics Canada How to Cite Statistics Canada products provides the following example:

17) Citing an article in The Daily / HTML
Statistics Canada. 2004. “Crime statistics.” The Daily. Released July 28, 2004. Statistics Canada Catalogue no. 11-001-XIE.
http://www.statcan.gc.ca/daily-quotidien/040728/tdq040728-eng.htm (accessed August 16, 2005).

To use APA style, you could adapt the order of the citation elements and punctuation, as follows. Please note that the second line of a citation in APA style should be indented.

Statistics Canada. (2004, July 28). Crime statistics. The Daily. (Statistics Canada Catalogue no. 11-001-XIE). Retrieved from http://www.statcan.gc.ca/daily-quotidien/040728/tdq040728-eng.htm

If you have questions about how to do this, Ask a Librarian.

  • Variable and Its Type
  • Graphs for a Categorical Variable
  • Graphs for a Single Quantitative Variable
    • Dot Plot
    • Frequency Histogram and Relative Frequency Histogram
    • Stem-and-Leaf Diagram
    • Time Plot
    • Boxplot or Box-and-Whisker Plot

An Introduction to Lesson 1.3
by Course Author Mosuk Chow (length 6:23)



Distinguishing between categorical (qualitative) variables and quantitative variables is a basic and integral part of applied statistics, because the methods used to analyze the two kinds of data are very different. When coding surveys, you might code male as 1 and female as 2. Beware: gender is still qualitative. The codes 1 and 2 are just two symbols for two classes, and there is no ordering between them. Another example is team assignments. For your team project, I will call the teams Team 1, Team 2, etc. The team a student belongs to is again qualitative. In statistics, as in most languages, we sometimes call the same thing by different names, so qualitative variables are also called nominal or categorical.

How can one graph qualitative variables? Two common choices are the pie chart and the bar chart. Note that even though histograms also have bars sticking up, they describe the frequency of quantitative variables; the term bar chart is reserved for graphs that show the frequency of categorical variables.

 

You will practice drawing graphs for these two different types of variables.  Again, you will be asked in this lesson to work these examples out by hand. After a good understanding of these concepts has been established, the course will review all of these using the Minitab statistical software.

 

Reading Assignment
An Introduction to Statistical Methods and Data Analysis (see course schedule).

Techniques for describing data in ways that capture the essence of the information in the data are called descriptive statistics. Drawing conclusions about a population from data is called inferential statistics.

 

Identifying Categorical and Quantitative Variables

One survey of 500 Penn State University students about their favorite sport to watch shows that 238 said Football, 126 said Basketball, 45 said Hockey, 46 said Others.

Think about the following question before reading the answer.

What is the variable of interest?

 

Variable of interest: PSU students' favorite sport to watch.

It is important that each observation of the variable takes one and only one value. For the above example, the values are:

Football, Basketball, Hockey, Others.

It is important to distinguish between the following two types of variables since the methods to describe them and to do inferences about them are very different.

1. Qualitative (Categorical) : Data that serves the function of a name only. For example, for coding purposes, you may assign Male as 0, Female as 1. The numbers 0 and 1 stand only for the two categories and there is no order between them. Categorical values may be:

  • Binary – where there are two choices, e.g. Male and Female;  
  • Ordinal – where the names imply levels with hierarchy or order of preference, e.g. level of education  
  • Nominal – where no hierarchy is implied, e.g. political party affiliation.

 

Please provide one or more examples for qualitative variable:

Blood type: A, B, AB, O

Favorite sport to watch.

 

2. Quantitative: Data that take on numerical values that have a measure of distance between them. Quantitative values can be discrete, or "counted", as in the number of people in attendance, or continuous, or "measured", as in the weight or height of a person.

Please provide one or more examples for quantitative variable:

Height of a STAT 500 student, weight of a STAT 500 student, number of jeans a STAT 500 student owns.

 

Additional examples of both include:

  • Number of females in this class (Quantitative, Discrete)
  • Nationality (Categorical, nominal)
  • Amount of milk in a 1 gallon container (Quantitative, Continuous)
  • Sex of students (even if coded as M = 0, F = 1) (Categorical, Binary)

 

Graphs for a Categorical Variable

1. Pie Chart: area of the pie represents the percentage of that category.

Example

A hand-drawn pie chart to represent the Penn State University students' favorite sport to watch. (We will use Minitab to draw graphs and charts in Lesson 3.)

Remarks:

a) Pie charts may not be suitable when there are many categories. If there are too many, you can either combine some categories or use a bar chart to represent the data. What is meant by "too many"? There is no clear cutoff; it is a judgment based on appearance.

b) Readers may find the pie chart more useful if the percentages are arranged in a descending or ascending order.
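The slice sizes for the survey example can be worked out with a minimal Python sketch such as the one below. The function name `pie_slices` is hypothetical. Note that the listed counts add up to 455 rather than 500, so the sketch assumes the denominator is the sum of the listed counts; that choice is an assumption, not something the survey description confirms.

```python
# Counts as listed in the survey example (they sum to 455, not 500).
counts = {"Football": 238, "Basketball": 126, "Hockey": 45, "Others": 46}

def pie_slices(counts):
    """Return (category, percent, angle in degrees), largest share first."""
    total = sum(counts.values())  # assumption: use the listed total as denominator
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 100 * n / total, 360 * n / total) for name, n in ordered]

for name, pct, angle in pie_slices(counts):
    print(f"{name:<10} {pct:5.1f}%  {angle:6.1f} degrees")
```

Sorting by descending share follows remark b); the angle column is what you would measure with a protractor when drawing the pie by hand.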

2. Bar Chart: The height of the bar for each category is equal to the frequency (number of observations) in the category. Leave space in between the bars to emphasize that there is no ordering in the classes.

Example

A hand-drawn bar chart to represent the Penn State University students' favorite sport to watch.
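As a rough stand-in for the hand-drawn chart, the following Python sketch prints a text bar chart of the same survey counts. The bars run horizontally, so bar length plays the role of bar height. The function name `text_bar_chart` and the scale of one `#` per 10 observations are illustrative assumptions.

```python
# Counts as listed in the survey example.
counts = {"Football": 238, "Basketball": 126, "Hockey": 45, "Others": 46}

def text_bar_chart(counts, unit=10):
    """One bar per category; each '#' stands for `unit` observations (rounded down)."""
    lines = []
    for name, n in counts.items():
        bar = "#" * (n // unit)
        lines.append(f"{name:<10} | {bar} {n}")
    return lines

for line in text_bar_chart(counts):
    print(line)
```

Each category gets its own separate bar, which reflects the point that the classes have no ordering.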

Graphs for a Single Quantitative Variable

1. Dotplot: Useful to show the relative positions of the data.

Example

Each of the ten children in the second grade was given a reading aptitude test. The scores were as follows:

95, 78, 69, 91, 82, 76, 76, 86, 88, 79

Here is a dot plot for the data.
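A text version of a dot plot for the ten reading scores can be sketched in Python as follows. The function name `dot_plot` and the one-character-per-score layout are illustrative choices, not part of the lesson.

```python
from collections import Counter

scores = [95, 78, 69, 91, 82, 76, 76, 86, 88, 79]

def dot_plot(values):
    """Return the rows of a text dot plot with one column per integer value."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = hi - lo + 1
    rows = []
    # Build from the tallest stack down so repeated values pile up above the axis.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join("." if counts[v] >= level else " "
                            for v in range(lo, hi + 1)))
    rows.append("-" * width)                                    # axis
    rows.append(str(lo).ljust(width - len(str(hi))) + str(hi))  # end labels
    return rows

print("\n".join(dot_plot(scores)))
```

The only score that occurs twice is 76, so it is the only column with two stacked dots.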

2. Frequency Histogram and Relative Frequency Histogram: If there are many data points and we would like to see the distribution of the data, we can represent the data by a frequency histogram or a relative frequency histogram.

Group the data into about 5-20 class intervals and show the frequency or relative frequency of data in each interval.

Example

Jessica weighed herself every Saturday for the past 30 weeks. Her weights were as follows:

135, 137, 136, 137, 138, 139, 140, 139, 137, 140,
142, 146, 148, 145, 139, 140, 142, 143, 144, 143,
141, 139, 137, 138, 139, 136, 133, 134, 132, 132

For histograms, we usually want to have from 5 to 20 intervals. Since the data range is from 132 to 148, it is convenient to use a class width of 2, since that will give us 9 intervals:

131.5 - 133.5
133.5 - 135.5
135.5 - 137.5
137.5 - 139.5
139.5 - 141.5
141.5 - 143.5
143.5 - 145.5
145.5 - 147.5
147.5 - 149.5

We choose end points ending in .5 to avoid confusion about whether an end point belongs to the interval on its left or the interval on its right. An alternative is to specify an end point convention. For example, Minitab includes the left end point and excludes the right end point. With the intervals defined, one can construct the frequency table and then draw the frequency histogram, or compute the relative frequencies and draw the relative frequency histogram. The following histogram is produced by Minitab when we specify midpoints that match the intervals chosen above.
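The frequency table behind such a histogram can be built with a short Python sketch like the one below. It is only an illustration of the binning step, not Minitab's procedure; the function name `frequency_table` and its parameters `start`, `width` and `n_classes` are hypothetical.

```python
weights = [135, 137, 136, 137, 138, 139, 140, 139, 137, 140,
           142, 146, 148, 145, 139, 140, 142, 143, 144, 143,
           141, 139, 137, 138, 139, 136, 133, 134, 132, 132]

def frequency_table(values, start=131.5, width=2, n_classes=9):
    """Count the observations in each class interval [left, right)."""
    freqs = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class that contains v
        freqs[k] += 1
    return [(start + k * width, start + (k + 1) * width, f)
            for k, f in enumerate(freqs)]

for left, right, f in frequency_table(weights):
    print(f"{left:5.1f} - {right:5.1f} | {'*' * f} ({f})")
```

Because every end point ends in .5 and every weight is a whole number, no observation can fall exactly on a boundary, so the end point convention never comes into play here.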

If we do not specify the midpoint for definition of intervals, Minitab will default to choose another set of class intervals resulting in the following histogram. According to the include left and exclude right end point convention, the observation 133 is included in the class 133-135.

Note that different choices of class intervals will result in different histograms. A relative frequency histogram is constructed in much the same way as a frequency histogram, except that the vertical axis represents the relative frequency instead of the frequency. For visually comparing the distributions of two data sets, a relative frequency histogram is better than a frequency histogram, since every relative frequency histogram uses the same vertical scale, from 0 to 1.
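Converting frequencies to relative frequencies is a single division by the number of observations. The Python sketch below applies it to Jessica's weights using the nine intervals of width 2 starting at 131.5; the helper names `class_frequencies` and `relative_frequencies` are hypothetical.

```python
weights = [135, 137, 136, 137, 138, 139, 140, 139, 137, 140,
           142, 146, 148, 145, 139, 140, 142, 143, 144, 143,
           141, 139, 137, 138, 139, 136, 133, 134, 132, 132]

def class_frequencies(values, start=131.5, width=2, n_classes=9):
    """Frequency of each class interval [start + k*width, start + (k+1)*width)."""
    freqs = [0] * n_classes
    for v in values:
        freqs[int((v - start) // width)] += 1
    return freqs

def relative_frequencies(freqs):
    """Divide each frequency by the total so the results sum to 1."""
    total = sum(freqs)
    return [f / total for f in freqs]

rel = relative_frequencies(class_frequencies(weights))
for k, r in enumerate(rel):
    left = 131.5 + 2 * k
    print(f"{left:5.1f} - {left + 2:5.1f} | {r:.3f}")
```

Since the relative frequencies always lie between 0 and 1 and sum to 1, two data sets of different sizes can be plotted against the same vertical axis.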

3. Stem-and-Leaf Diagram: Groups the data while still keeping the individual values. One can recover the original data (except for the order in which the data were taken) from the diagram.

The stem represents the major groupings of the data. The leaves represent the last digit. For example, the first value (also smallest value) is 132, with 13 as the stem and 2 as the leaf.

Stem-and-Leaf of weight of Jessica

N = 30
Leaf Unit = 1.0

Depth  Stem  Leaves
   3    13   223
   5    13   45
  11    13   667777
  (7)   13   8899999
  12    14   0001
   8    14   2233
   4    14   45
   2    14   6
   1    14   8

The above stem-and-leaf diagram can also be drawn by Minitab. The first column, called depths, displays cumulative frequencies. Starting from the top, each depth gives the number of observations that lie in that row or before it. For example, the 11 in the third row indicates that there are 11 observations in the first three rows. The row that contains the middle observation is marked by putting the number of observations in that row in parentheses: (7) in our example. We thus know that the middle value lies in the fourth row. The depths after that row give the number of observations that lie in that row or after it. For example, the 4 in the seventh row indicates that there are four observations in the last three rows.
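The depth rule described above can be sketched in Python as follows. This is an illustration of the rule as stated in the text, not Minitab's actual implementation. The function name `stem_and_leaf` and the parameter `leaves_per_line` (two leaf digits per row, matching the diagram) are assumptions, and the sketch skips rows that contain no observations.

```python
from collections import defaultdict

weights = [135, 137, 136, 137, 138, 139, 140, 139, 137, 140,
           142, 146, 148, 145, 139, 140, 142, 143, 144, 143,
           141, 139, 137, 138, 139, 136, 133, 134, 132, 132]

def stem_and_leaf(values, leaves_per_line=2):
    """Return (depth, stem, leaves) rows; the middle row's depth is in parentheses."""
    data = sorted(values)
    n = len(data)
    groups = defaultdict(list)
    for v in data:
        stem, leaf = divmod(v, 10)  # e.g. 132 -> stem 13, leaf 2
        groups[(stem, leaf // leaves_per_line)].append(str(leaf))
    rows, above = [], 0             # `above` = observations in earlier rows
    for key in sorted(groups):
        count = len(groups[key])
        if above + count <= n / 2:      # row ends at or before the middle
            depth = str(above + count)  # cumulative count from the top
        elif above >= n / 2:            # row starts after the middle
            depth = str(n - above)      # cumulative count from the bottom
        else:                           # row contains the middle observation
            depth = f"({count})"
        rows.append((depth, key[0], "".join(groups[key])))
        above += count
    return rows

for depth, stem, leaves in stem_and_leaf(weights):
    print(f"{depth:>4}  {stem:>3}  {leaves}")
```

Reading the leaves back against their stems recovers all 30 weights, though not the order in which they were recorded.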

4. Boxplot: The boxplot will be discussed in greater detail when we discuss "Summarizing Data" because the design of the boxplot is dependent upon various summary measures we will learn in that lesson.

5. Time Plot: Note that for the weight of Jessica, one important aspect of the data is lost if one just shows the distribution. Jessica may be really interested in how her weight changes over time. For that purpose, a plot of weight versus the order it is taken (time) is warranted.
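A rough text time plot of the weights can be sketched in Python as shown below. It assumes that the weights are listed in the order they were recorded, reading the data row by row; the lesson does not state this explicitly. The function name `time_plot` is hypothetical.

```python
# Assumption: the list is in chronological order (week 1 first).
weights = [135, 137, 136, 137, 138, 139, 140, 139, 137, 140,
           142, 146, 148, 145, 139, 140, 142, 143, 144, 143,
           141, 139, 137, 138, 139, 136, 133, 134, 132, 132]

def time_plot(values):
    """One line per observation, in order; the marker's column encodes the value."""
    lo = min(values)
    return [f"week {week:>2} | {' ' * (w - lo)}* {w}"
            for week, w in enumerate(values, start=1)]

print("\n".join(time_plot(weights)))
```

Under that ordering assumption, the plot shows the rise toward 148 around week 13 and the later decline to 132, a pattern that the histogram and stem-and-leaf diagram cannot show.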
