Introduction to Statistics Final Milestone

Discipline: Statistics

Type of Paper: Question-Answer

Academic Level: Undergrad. (yrs 3-4)

Paper Format: APA

Pages: 1 Words: 1000

Question



James participated in an archery competition. He was allowed four attempts and was supposed to hit the bullseye in the center of the board.


If the figure shows the positions of James' arrows, which of the following would best classify the arrangement of arrows?

·        

High accuracy and high precision

·        

Low accuracy and low precision

·        

High accuracy and low precision

·        

Low accuracy and high precision

RATIONALE

The arrows are close to the center, so they are accurate, and they are close to one another, so they are precise as well.

CONCEPT

 Accuracy and Precision in Measurements

2

The formula for the standard deviation of a sample is:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Select the true statement for the following data set that has a mean of 8:

4, 6, 6, 6, 9, 9, 12, 12

Answer choices are rounded to the hundredths place.

·        

The variance is 2.98 and the standard deviation is 8.86.

·        

The variance is 8.86 and the standard deviation is 2.98.

·        

The variance is 7.50 and the standard deviation is 2.98.

·        

The variance is 8.86 and the standard deviation is 7.50.

RATIONALE

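We can first calculate the variance of the data, $s^2$, by using the part of the formula under the square root:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(4-8)^2 + 3(6-8)^2 + 2(9-8)^2 + 2(12-8)^2}{8 - 1} = \frac{62}{7} \approx 8.86$$

Next, we can find the standard deviation, $s$, by simply taking the square root of the variance:

$$s = \sqrt{8.86} \approx 2.98$$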

We can also use the statistical functions in Excel to quickly find the variance and standard deviation.
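
As an alternative to Excel, the same check can be sketched with Python's standard `statistics` module. The variable names below are illustrative and not part of the original question.

```python
import statistics

# Data set from the question (its mean is 8)
data = [4, 6, 6, 6, 9, 9, 12, 12]

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = statistics.variance(data)

# Sample standard deviation: square root of the sample variance
sample_std_dev = statistics.stdev(data)

print(round(sample_variance, 2), round(sample_std_dev, 2))
```

Both functions use the n - 1 divisor, which matches the sample formula used in the rationale.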

CONCEPT

 Standard Deviation

3


What is the probability of drawing a red card or a queen from a standard deck of 52 cards?

·        

·        

·        

·        

RATIONALE

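Since it is possible for a card to be both red and a queen, these two events are overlapping. We can use the following formula:

$$P(\text{red or queen}) = P(\text{red}) + P(\text{queen}) - P(\text{red and queen})$$

In a standard deck of cards, half of the 52 cards are red, so $P(\text{red}) = \frac{26}{52}$. There is a total of 4 queens, so $P(\text{queen}) = \frac{4}{52}$. Of the 4 queens, 2 are red and 2 are black, so $P(\text{red and queen}) = \frac{2}{52}$. Therefore:

$$P(\text{red or queen}) = \frac{26}{52} + \frac{4}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}$$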
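
The "either/or" rule for overlapping events can be checked with exact fractions. This is a minimal sketch, and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

deck_size = 52

p_red = Fraction(26, deck_size)           # half of the deck is red
p_queen = Fraction(4, deck_size)          # four queens in the deck
p_red_and_queen = Fraction(2, deck_size)  # two of the queens are red

# Subtract the overlap so the red queens are not counted twice
p_red_or_queen = p_red + p_queen - p_red_and_queen

print(p_red_or_queen)
```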

CONCEPT

 "Either/Or" Probability for Overlapping Events

4

In which of these cases should the median be used?

·        

When data has no outliers

·        

When the data has nominal values

·        

When the data has small variance

·        

When the data has extreme values

RATIONALE

Since the mean uses the actual values in the data, it is the measure most affected by outliers and skewness. So we want to use the mean as a measure of center only when the data is symmetric. When the data is skewed or has extreme values, the median is a better measure since it is not as sensitive to these values.
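
A small illustration of this sensitivity, using hypothetical numbers that are not part of the original question:

```python
import statistics

# Hypothetical values chosen only to illustrate the effect of one extreme value
typical = [40, 42, 45, 47, 50]
with_outlier = typical + [500]

mean_before = statistics.mean(typical)
median_before = statistics.median(typical)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

In this sketch the single extreme value pulls the mean far upward, while the median moves only slightly.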

CONCEPT

 Measures of Center

5


Eric is randomly drawing cards from a deck of 52. He first draws a red card, places it back in the deck, shuffles the deck, and then draws another card.

What is the probability of drawing a red card, placing it back in the deck, and drawing another red card?  Answer choices are in the form of a percentage, rounded to the nearest whole number.

·        

25%

·        

13%

·        

4%

·        

22%

RATIONALE

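Since Eric puts the card back and reshuffles, the two events (first draw and second draw) are independent of each other. To find the probability of red on the first draw and red on the second draw, we can use the following formula:

$$P(\text{red and red}) = P(\text{red}) \cdot P(\text{red}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} = 25\%$$

Note that the probability of drawing a red card is $\frac{26}{52}$ or $\frac{1}{2}$ for each event.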
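
The multiplication rule for independent events can be sketched as follows. The names are illustrative assumptions.

```python
from fractions import Fraction

# Probability of red on any single draw; replacement keeps it the same each time
p_red = Fraction(26, 52)

# Independent events: multiply the individual probabilities
p_red_then_red = p_red * p_red

print(p_red_then_red, f"{float(p_red_then_red):.0%}")
```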

CONCEPT

 "And" Probability for Independent Events

6

A coin is tossed 50 times, and the number of times heads comes up is counted.

Which of the following statements about the distributions of counts and proportions is FALSE?

·        

The count of getting heads from a sample proportion of size 20 can be approximated with a normal distribution.

·        

The distribution of the count of getting heads can be approximated with a normal distribution.

·        

The distribution of the count of getting tails can be approximated with a normal distribution.

·        

The count of getting heads is a binomial distribution.

RATIONALE

If we look at the counts of successes and failures (2 outcomes) from a large population, this is called a binomial distribution, not a normal distribution.

CONCEPT

 Distribution of Sample Proportions

7

Select the correct statement regarding experiments. 

·        

A researcher can carefully control the explanatory variables but not observe human responses.

·        

A researcher can carefully control the explanatory variables and observe human responses.

·        

A researcher can observe the explanatory variables but not control human responses.

·        

A researcher can ignore explanatory variables and observe human responses.

RATIONALE

The defining feature of an experimental setting is that the researcher can control the setting and apply some treatment to observe how it affects an outcome of interest. The responses by the participants are not controlled by the researcher.

CONCEPT

 Observational Studies and Experiments

8

The blood bank at a hospital has 1,200 units of blood, of which 37% are of blood group B+. A clinical researcher randomly selects 300 units of blood and finds that 33% of those are of blood group B+. To test his result, he randomly selects 200 units of blood and finds that 40% of those are of blood group B+.

Which of the following is the reason there is a difference between the two percentages found by the researcher?

·        

Both samples suffered from non-response bias.

·        

The samples were not random samples.

·        

The sample sizes were both too small.

·        

Random error; the numbers were different due to variability inherent in sampling.

RATIONALE

When sampling, there is always some variability that occurs.  So, although the sample values are different, since they were randomly chosen, the differences are simply due to the variability that comes from sampling and not due to some systematic bias.
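
Sampling variability of this kind can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is modeled as 444 B+ units (37% of 1,200) and 756 others, and the seed is arbitrary, chosen only so the run is repeatable.

```python
import random

random.seed(0)  # arbitrary seed, assumed only for repeatability

# 1,200 units of blood, 37% of which (444 units) are B+
population = ["B+"] * 444 + ["other"] * 756

def sample_proportion(units, sample_size):
    """Proportion of B+ units in a simple random sample drawn without replacement."""
    sample = random.sample(units, sample_size)
    return sample.count("B+") / sample_size

p_300 = sample_proportion(population, 300)
p_200 = sample_proportion(population, 200)

print(f"n=300: {p_300:.1%}, n=200: {p_200:.1%}")
```

Rerunning with different seeds should give proportions that scatter around 37% without matching it exactly, which is the random error the rationale describes.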

CONCEPT

 Random and Systematic Errors

9

A survey result shows that cell phone usage among teenagers rose from 63% in 2006 to 71% in 2008.

Of the following choices, which statement about cell phone use among teenagers is true?

·        

Cell phone usage rose by 11.2 percentage points.

·        

Cell phone usage rose by 8%.

·        

Cell phone usage rose by 12.7%.

·        

Cell phone usage rose by 12.7 percentage points.

RATIONALE

We can note that the absolute difference between 2006 and 2008 is from 63% to 71%, or 8 percentage points.

To get the percent difference, we take the absolute difference and divide by the initial value:

$$\frac{71\% - 63\%}{63\%} = \frac{8}{63} \approx 0.127 = 12.7\%$$

So we can say cell phone usage rose by 12.7%.
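
The distinction between a change in percentage points and a percent change can be sketched as follows. The variable names are illustrative.

```python
usage_2006 = 63.0  # percent of teenagers using cell phones in 2006
usage_2008 = 71.0  # percent of teenagers using cell phones in 2008

# Absolute change, measured in percentage points
point_change = usage_2008 - usage_2006

# Relative change, measured as a percent of the starting value
percent_change = point_change / usage_2006 * 100

print(point_change, round(percent_change, 1))
```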

CONCEPT

 Using Percentages in Statistics

10

Which of the following data types will be continuous? 

·        

The number of children younger than ten that visited a planetarium last week

·        

The total weight of apples harvested in the farm in a season

·        

The number of cars in 100 households

·        

The letter grades students received on a class quiz

RATIONALE

The total weight of apples can take on any value and is therefore continuous.  The other measures can only take on a limited number of values and are discrete.

CONCEPT

 Discrete vs. Continuous Data

11

Which of the following is NOT a guideline for establishing causality?

·        

Look for cases where correlation exists between the variables of a scatterplot.

·        

Keep all variables the same to get duplicate results.

·        

Take into consideration all the other possible causes.

·        

Perform a randomized, controlled experiment. 

RATIONALE

For causality, the association should be something we observe in slightly varied conditions.  So if all variables and conditions are the same, this is not a way to support causality.

CONCEPT

 Establishing Causality

12

Select the statement that correctly describes a normal distribution.

·        

It is a negatively skewed distribution, as the extreme values are less than the median.

·        

It is a uniform distribution, as all of the values have equal frequency. 

·        

It is a symmetric distribution, as the mean and the median are the same.

·        

It is a positively skewed distribution, as the extreme values are greater than the median.

RATIONALE

A normal distribution is a bell-shaped and symmetric distribution. Because it is symmetric about a single smooth peak, the mean and the median are the same.

CONCEPT

 Normal Distribution

13

A shoe retailer decides to record the styles and sizes of shoes that his customers choose. He records this data for an entire year by keeping track of his customers' purchases.

Which statement accurately describes the type of data the shoe retailer is collecting?

·        

The shoe retailer is receiving raw data on shoe sizes and styles from nearby shoe companies.

·        

The shoe retailer is gathering available data because customers tell him which shoe sizes and styles they prefer.

·        

The shoe retailer is receiving available data on shoe sizes and styles from nearby shoe companies.

·        

The shoe retailer is gathering raw data because he is recording shoe sizes and styles by himself.

RATIONALE

Since the retailer is gathering the data himself, this would be an example of raw data. 

CONCEPT

 Data

14

Rhonda is wondering if there is an association between the number of hours she studies per week and the number of semester credits she is enrolled in. The information is shown in the table below.


If Rhonda is taking four credits for the fall semester, how many hours per week will she study?

·        

2

·        

4

·        

8

·        

5

RATIONALE

If we use the scatterplot and note the value above 4 credit hours on the horizontal axis, we find this value is also 4 on the vertical axis.  She should expect to study 4 hours.

CONCEPT

 Scatterplot

15

What value of z* should be used to construct an 88% confidence interval of a population mean? Answer choices are rounded to the thousandths place.

·        

1.555

·        

1.645

·        

1.175

·        

1.220

RATIONALE

To construct an 88% CI using the z-table, note that the remaining 12% is split evenly, leaving 6% in each tail. The lower tail would be at 0.06 and the upper tail would be at (1 - 0.06) or 0.94. The value 0.94 falls between the z-table entries 0.9394 and 0.9406.

0.9394 corresponds with a z-score of 1.55.
0.9406 corresponds with a z-score of 1.56.

Taking the average of these two scores, we get a z-score of 1.555.
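
Instead of interpolating from a z-table, the critical value can be computed directly. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are assumptions.

```python
from statistics import NormalDist

confidence_level = 0.88

# Area left in each tail of the standard normal distribution
tail_area = (1 - confidence_level) / 2

# z* is the point with cumulative area 1 - tail_area (here 0.94) to its left
z_star = NormalDist().inv_cdf(1 - tail_area)

print(round(z_star, 3))
```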

CONCEPT

 Confidence Intervals

16

A credit card company surveys 125 of its customers to ask about satisfaction with customer service. The results of the survey, divided by gender, are shown below.

                          Males    Females
Extremely Satisfied          25          7
Satisfied                    21         13
Neutral                      13         16
Dissatisfied                  9         14
Extremely Dissatisfied        2          5


If you were to choose a female from the group, what is the probability that she is satisfied with the company's customer service? Answer choices are rounded to the hundredths place. 

·        

0.38

·        

0.62

·        

0.24

·        

0.13

RATIONALE

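The probability of a person being "satisfied" given she is a female is a conditional probability. We can use the following formula:

$$P(\text{satisfied} \mid \text{female}) = \frac{P(\text{satisfied and female})}{P(\text{female})} = \frac{13/125}{55/125} = \frac{13}{55} \approx 0.24$$

Remember, to find the total number of females, we need to add all values in this column: 7 + 13 + 16 + 14 + 5 = 55.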
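
The conditional probability can be sketched from the Females column of the table. The dictionary layout is an assumption made for illustration.

```python
from fractions import Fraction

# Females column of the contingency table
females = {
    "Extremely Satisfied": 7,
    "Satisfied": 13,
    "Neutral": 16,
    "Dissatisfied": 14,
    "Extremely Dissatisfied": 5,
}

# Conditioning on "female" restricts the sample space to this column
total_females = sum(females.values())
p_satisfied_given_female = Fraction(females["Satisfied"], total_females)

print(total_females, round(float(p_satisfied_given_female), 2))
```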

CONCEPT

 Conditional Probability and Contingency Tables

17

Joe is measuring the widths of doors he bought to install in an apartment complex. He measured 72 doors and found a mean width of 36.1 inches with a standard deviation of 0.3 inches. To test if the doors differ significantly from the standard industry width of 36 inches, he computes a z-statistic. 

What is the value of Joe's z-test statistic?

·        

2.83

·        

-1.81

·        

1.81

·        

-2.83

RATIONALE

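We first note that the denominator of the z-statistic, the standard error, is:

$$\frac{s}{\sqrt{n}} = \frac{0.3}{\sqrt{72}} \approx 0.0354$$

Then the z-score is:

$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{36.1 - 36}{0.3 / \sqrt{72}} \approx 2.83$$

This tells us that 36.1 is 2.83 standard errors above the value of 36.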

Note that when you round some values you may get slightly different results, but the results should be relatively close to this final calculated value.
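
The z-statistic can be computed without intermediate rounding, which avoids the small discrepancies mentioned above. This is a minimal sketch with illustrative variable names.

```python
import math

sample_mean = 36.1        # mean width of the measured doors, in inches
hypothesized_mean = 36.0  # standard industry width, in inches
sample_std_dev = 0.3      # standard deviation of the measured widths, in inches
n = 72                    # number of doors measured

# Standard error of the mean: the denominator of the z-statistic
standard_error = sample_std_dev / math.sqrt(n)

z = (sample_mean - hypothesized_mean) / standard_error

print(round(standard_error, 4), round(z, 2))
```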

CONCEPT

 Z-Test for Population Means

18

The scatterplot below charts the performance of an electric motor.


Which answer choice correctly indicates the explanatory variable and the response variable of the scatterplot?

·        

Explanatory variable: Rotation

Response variable: Voltage

·        

Explanatory variable: Rotation

Response variable: Electric motor

·        

Explanatory variable: Voltage

Response variable: Electric motor

·        

Explanatory variable: Voltage

Response variable: Rotation

RATIONALE

The explanatory variable is what is along the horizontal axis, which is voltage.  The response variable is along the vertical axis, which is speed of rotation.

CONCEPT

 Explanatory and Response Variables

19

Which of the following situations describes a continuous distribution? 

·        

A probability distribution of the average time it takes employees to drive to work.

·        

A probability distribution showing the number of pages employees read during the workday.

·        

A probability distribution showing the number of minutes employees spend at lunch. 

·        

A probability distribution of the workers who arrive late to work each day.

RATIONALE

For a distribution to be continuous, there must be an infinite number of possibilities.  Since we are measuring the time to drive to work, there are an infinite number of values we might observe, for example: 2 hours, 30 minutes, 40 seconds, etc.

CONCEPT

 Probability Distribution

20

A market research company conducted a survey of two groups of students from different schools. They found that students from school A spent an average of 90 minutes studying daily, while the students from school B spent an average of 75 minutes daily.

They want to find out if the difference in the mean times spent studying by the students of the two schools is statistically significant.

Which of the following sets shows the correct null hypothesis and alternative hypothesis?

·        

Null Hypothesis:  There is no difference in the mean times spent by the schools' students.
Alternative Hypothesis: There is at least some difference in the mean times spent by the schools' students.

·        

Null Hypothesis:  School B students spend more time studying than School A. 
Alternative Hypothesis: The difference in the mean times spent by the schools' students is 15 minutes.

·        

Null Hypothesis: The difference in the mean times spent by the schools' students is 15 minutes.
Alternative Hypothesis: There is no difference in the mean times spent by the schools' students.

·        

Null Hypothesis: There is at least some difference in the mean times spent by the schools' students.
Alternative Hypothesis: The students from school B spend more time studying than the students from school A.

RATIONALE

Recall that the null hypothesis is always of no difference.

So the null hypothesis (H₀) is that the mean time studying for group A = mean for group B. This would indicate no difference between the two groups.

The alternative hypothesis (Hₐ) is that there is a difference in the mean study time between the two groups.

CONCEPT

 Hypothesis Testing

21

Jerry, Stein, Johnson, and Mary had a competition to see who could profit the most off of their odd jobs during the summer. They discussed their earnings on the first day of school. Afterward, each of them decided to make bar graphs to plot the different amounts they earned.


Who made the above graph, and why?

·        

Mary, because she wanted to make the amount made by each person appear reasonably close.

·        

Jerry, because he wanted to accurately show the amount made by each person.

·        

Johnson, because he wanted to make the amount made by each person appear very different.

·        

Stein, because he wanted to make it look like he earned significantly more than the others.

RATIONALE

Since there was a competition, the person who most likely made this graph would want to represent themselves favorably. Since Stein has the highest earnings, it would probably be Stein.

CONCEPT

 Misleading Graphical Displays

22

Ralph records the time it takes for each of his classmates to run around the track one time. As he analyzes the data on the graph, he notices very little variation between his classmates’ times.

Which component of data analysis is Ralph observing?

·        

The overall spread of the data

·        

The center of the data set

·        

An outlier in the data set

·        

The overall shape of the data

RATIONALE

Since Ralph is looking at the variation of data, this is examining the spread of the data.

CONCEPT

 Data Analysis

23

Kyle was trying to decide which type of soda to restock based on popularity: regular cola or diet cola. After studying the data, he noticed that he sold less diet cola than regular cola on both weekdays and weekends. However, after combing through his entire sales records, he found that he had actually sold more diet cola than regular cola.

Which paradox had Kyle encountered?

·        

False Negative

·        

Simpson's Paradox

·        

Benford's Law

·        

False Positive

RATIONALE

This is an example of Simpson's paradox, which occurs when the overall trend is not the same as the trend seen in smaller groups. Since sales of diet cola are larger overall but this trend reverses when looking at weekdays and weekends separately, this is a reversal of the trend.

CONCEPT

 Paradoxes

24

Select the statement that correctly describes a Type II error. 

·        

A Type II error occurs when the null hypothesis is rejected when it is actually false.

·        

A Type II error occurs when the null hypothesis is accepted when it is actually true.

·        

A Type II error occurs when the null hypothesis is rejected when it is actually true.

·        

A Type II error occurs when the null hypothesis is accepted when it is actually false.

RATIONALE

Recall that a Type II error is when we incorrectly accept a false null hypothesis. In this case, we should have rejected H₀ and concluded there is evidence that Hₐ is correct.

CONCEPT

 Type I/II Errors

25

Jesse takes two data points from the weight and feed cost data set to calculate a slope, or average rate of change. A rat weighs 3.5 pounds and costs $4.50 per week to feed, while a Beagle weighs 30 pounds and costs $9.20 per week to feed.

Using weight as the explanatory variable, what is the slope of the line between these two points? Answer choices are rounded to the nearest hundredth.

·        

$0.18 / lb.

·        

$0.31 / lb.

·        

$5.64 / lb.

·        

$1.60 / lb.

RATIONALE

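In order to get the slope, we can use the formula: $\text{slope} = \frac{y_2 - y_1}{x_2 - x_1}$.

Using the information provided, the two points are: (3.5 lb., $4.50) and (30 lb., $9.20). We can note that:

$$\text{slope} = \frac{9.20 - 4.50}{30 - 3.5} = \frac{4.70}{26.5} \approx 0.18 \text{ dollars per lb.}$$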
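
The slope between the two points can also be sketched in Python, with weight as the explanatory (x) variable and weekly feed cost as the response (y) variable. The tuple layout and names are assumptions made for illustration.

```python
# Each point is (weight in pounds, weekly feed cost in dollars)
rat = (3.5, 4.50)
beagle = (30.0, 9.20)

# Slope = change in cost divided by change in weight
slope = (beagle[1] - rat[1]) / (beagle[0] - rat[0])

print(round(slope, 2))  # dollars per pound
```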