The following are full solutions to chapter 2 in “Statistics for Business and Economics” By Newbold, Carlson & Thorne. I do not guarantee the correctness of any of the answers presented here. If you found a mistake, have a comment, or would like to ask me anything, I’m available by mail: me (at) shayacrich (dot) com.

The algorithm for calculating percentiles and quartiles presented in this chapter does not produce the same results as Google Spreadsheet's built-in functions, and the same issue appears to arise with MS Excel. I wrote the following Google Spreadsheet formulas for calculating quartiles by the chapter's method (assuming the dataset is sorted in ascending order in column A):

Q1: =if(MOD(count(A:A) + 1, 4)=0, indirect("A"&(count(A:A) + 1)/4), indirect("A"&rounddown((count(A:A) + 1)/4)) + (MOD(count(A:A) + 1, 4)/4)*(indirect("A"&roundup((count(A:A) + 1)/4)) - indirect("A"&rounddown((count(A:A) + 1)/4))))

Q3: =if(MOD(count(A:A) + 1, 4)=0, indirect("A"&(count(A:A) + 1)*0.75), indirect("A"&rounddown((count(A:A) + 1)*0.75)) + (MOD(3*(count(A:A) + 1), 4)/4)*(indirect("A"&roundup((count(A:A) + 1)*0.75)) - indirect("A"&rounddown((count(A:A) + 1)*0.75))))

I also found an answer to a similar question that suggests different (though similar) formulas: https://superuser.com/a/343368
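The same positioning rule can be sketched in Python. This is a minimal illustration of the chapter's (n + 1)-position method, not code from the book; `textbook_quartile` is a name I chose and the dataset is hypothetical:

```python
def textbook_quartile(data, p):
    """Quartile by the chapter's method: the value at ordered
    position p * (n + 1), with linear interpolation between
    neighbouring observations when the position is fractional."""
    xs = sorted(data)
    pos = p * (len(xs) + 1)
    lo = int(pos)          # integer part of the position
    frac = pos - lo        # fractional part drives the interpolation
    if lo < 1:
        return xs[0]       # position falls below the minimum
    if lo >= len(xs):
        return xs[-1]      # position falls above the maximum
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

sample = [6, 8, 10, 11, 11, 12, 13, 14, 14, 17]  # hypothetical dataset
q1 = textbook_quartile(sample, 0.25)   # position 0.25 * 11 = 2.75 -> 9.5
q3 = textbook_quartile(sample, 0.75)   # position 8.25 -> 14.0
```

Note that the interpolation weight is the fractional part of the position itself, which is what the spreadsheet formulas above compute via MOD.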

2.1.a. The mean is 66, the median 75, and there’s no mode.

2.1.b. Because of the low outlier in the dataset, the median is the better choice for predicting future weekly specials. However, for examining past performance, such as gross revenue as a function of weekly specials, the mean is still the better choice.

2.2.a. 12

2.2.b. 13

2.2.c. 8

2.3.a. 3.5

2.3.b. 3.55

2.3.c. 3.7

2.4.a. 5.94

2.4.b. 6.35

2.5.a. 17.75

2.5.b. 20.74

2.6.a. Mean is 10.1, the median is 10.5, and the mode is 11

2.6.b. 6 < 7.25 < 10.5 < 12.75 < 14

2.7. The mode and median are both zero, and the mean is 0.44.

2.8.a. 25.58

2.8.b. 22.5

2.8.c. 22

2.9.a. Q1 = 2.9825, Q3 = 3.3675

2.9.b. 3.1

2.9.c. 3.39

2.10.a. 8.54

2.10.b. 9

2.10.c. Comparing the mean and the median suggests a left-skewed distribution, but this is misleading because the dataset isn't continuous and unimodal. A visual examination of a histogram of the dataset indicates that it is right-skewed, and the positive skewness statistic confirms this.
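The skewness statistic referred to here can be sketched as follows; the dataset is hypothetical (not the book's), chosen to show how one high outlier produces positive skewness:

```python
from statistics import mean, pstdev

data = [2, 3, 3, 4, 4, 5, 21]   # hypothetical, with one high outlier
m, s = mean(data), pstdev(data)
# Average cubed z-score: positive => right skew, negative => left skew.
skew = sum(((x - m) / s) ** 3 for x in data) / len(data)
```

Here the single large value dominates the cubed deviations, so `skew` comes out positive even though most observations sit below the mean.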

2.10.d. 2 < 6 < 9 < 10.75 < 21

2.11.a. The mean volume is 236.99 mL, so the 100 bottles in the sample hold 23,699 mL in total, a fraction less than the 23,700 mL they would hold if each contained the advertised 237 mL.

2.11.b. The median volume is 237.

2.11.c. It’s difficult to tell the skewness from the shape of the histogram, and different bin widths might visually suggest different results. The calculated skewness is 0.13, which confirms that the distribution is almost symmetric, although, being positive, it is slightly skewed to the right.

2.11.d. 224 < 233.25 < 237 < 241 < 249

2.12.

s^2 = 5.14

s = 2.27

2.13.

s^2 = 20.3

s = 4.50
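The sample variance and standard deviation computed throughout these exercises (divisor n - 1) can be sketched in Python; the dataset below is hypothetical:

```python
from statistics import variance, stdev

data = [6, 1, 4, 2, 3, 5]   # hypothetical sample
s2 = variance(data)         # sum of squared deviations / (n - 1)
s = stdev(data)             # square root of the sample variance
```

For this sample the mean is 3.5, the squared deviations sum to 17.5, and dividing by n - 1 = 5 gives a variance of 3.5.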

2.14. 17.57%

2.15.a. 28.77

2.15.b. 12.70

2.15.c. 44.15%

2.16.

| Stem | Leaf |
| --- | --- |
| 1 | 2, 3, 4, 5, 7, 8, 9 |
| 2 | 0, 1, 2, 3, 7, 9 |
| 3 | 1, 3, 5, 8 |
| 4 | 0, 2, 5, 9 |
| 5 | 3 |
| 6 | 5 |

IQR = 38 - 18 = 20

2.17.a. Trick question: 25 is the variance, not the standard deviation, so the standard deviation is 5, and we need k = 2 for the interval [75 - 2*5, 75 + 2*5]. For k = 2, Chebyshev's theorem guarantees at least [1 - 1/2^2] * 100% = 75% of the observations.
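Chebyshev's bound can be checked empirically on any dataset. The sketch below uses a hypothetical sample (not from the book) and the population standard deviation, as in the exercise:

```python
from statistics import mean, pstdev

data = [60, 62, 70, 71, 74, 75, 75, 76, 80, 82, 88, 95]  # hypothetical
m, s = mean(data), pstdev(data)   # population standard deviation
k = 2
within = sum(1 for x in data if abs(x - m) <= k * s)
share = within / len(data)
# Chebyshev guarantees share >= 1 - 1/k**2, i.e. at least 75% for k = 2.
```

The guarantee holds for any distribution, which is what makes it useful when the standard deviation is the only shape information available.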

2.17.b. Approximately 95% of observations are between 65 and 85.

2.18. The question describes a bell-shaped population, which permits the use of the empirical rule.

- Almost all observations, [230 – 3 * 20, 230 + 3 * 20]
- Approximately 95%, [230 – 2 * 20, 230 + 2 * 20]

2.19.a. The 68% within [425, 475], plus half of the 27% that lies between one and two standard deviations from the mean (the 13.5% in (475, 500]), plus half of the 5% that lies beyond two standard deviations (2.5%), so in total: (68 + 13.5 + 2.5)% = 84%

2.19.b. The 95% within [400, 500], plus half of the 5% that lies beyond two standard deviations (the 2.5% below 400), so (95 + 2.5)% = 97.5%

2.19.c. Almost none.
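The intervals above imply a mean of 450 and a standard deviation of 25; the empirical-rule answers can be compared against an exact normal distribution with those parameters:

```python
from statistics import NormalDist

d = NormalDist(mu=450, sigma=25)
above_425 = 1 - d.cdf(425)   # exact ~0.8413; the empirical rule gives 84%
below_500 = d.cdf(500)       # exact ~0.9772; the empirical rule gives 97.5%
```

The empirical rule's 68/95/99.7 figures are rounded normal probabilities, so the exact values land within a fraction of a percentage point of the answers above.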

2.20.a. Common stocks have a mean annual percentage return of 8.16%. It should be noted that the real annual growth should be calculated with a geometric mean, but this is irrelevant to this exercise. U.S. Treasury Bills have a mean annual percentage return of 5.78%. From the perspective of annual returns alone (disregarding the risk of fluctuations), common stocks are a better investment, according to past performance.

2.20.b. The standard deviation of the annual percentage return on common stocks is approximately 22.30%, while that on U.S. Treasury Bills is approximately 1.47%. The standard deviation on stocks is much higher, implying a greater risk of fluctuations, which could partly explain the higher returns. We need the coefficients of variation to compare the two on a common scale: it is 273.41% for stocks versus 25.43% for Treasury Bills, which reinforces the conclusion that stocks are riskier.
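The coefficient of variation is just the standard deviation expressed as a percentage of the mean. Computed from the rounded figures quoted above (so the stock value comes out slightly below the exact 273.41%):

```python
# Coefficient of variation: standard deviation / mean, as a percentage.
stocks_cv = 22.30 / 8.16 * 100   # ~273.3%
tbills_cv = 1.47 / 5.78 * 100    # ~25.4%
```

Because it is scale-free, the coefficient of variation lets us compare the riskiness of investments whose mean returns differ.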

2.21.a. 26.8

2.21.b. 8.48266

2.21.c. 8.48266

2.21.d. 8.48266

2.21.e. 31.65%

2.22.a. The range is 0.54, the variance is 0.010, and the standard deviation is 0.10. Higher accuracy is hard to reach with Google Spreadsheet because of floating-point rounding errors: my manually calculated standard deviation was 0.1017 and my variance was 0.01034, while STDEV returned 0.1024 and VAR returned 0.01048.

2.22.b. The IQR is 0.13. Given that it is significantly lower than the range, we conclude that the dataset contains outliers at one or both extremes.

2.22.c. The coefficient of variation is 2.67%.

2.23.a. The mean is 261.0545

2.23.b. The variance is 306.4373 and the standard deviation is 17.5053

2.23.c. The coefficient of variation is 6.7%.

2.24.a. The standard deviation is 1.0048

2.24.b. According to Chebyshev's theorem, at least 75% of observations lie within 2 standard deviations of the mean. However, if the distribution is roughly bell-shaped, the share is closer to the empirical rule's 95%.

2.25. The mean is 52.64 and the standard deviation is 12.7147

2.26.a. 4.2

2.26.b. 4.5833

2.27.a. 101

2.27.b. The sample variance is 4195 and the sample standard deviation is 64.76.

2.28.

| # of Hours | fi | mi | fi*mi | mi - mean | (mi - mean)^2 | fi*(mi - mean)^2 |
| --- | --- | --- | --- | --- | --- | --- |
| 4 < 10 | 8 | 7 | 56 | -8.4 | 70.56 | 564.48 |
| 10 < 16 | 15 | 13 | 195 | -2.4 | 5.76 | 86.4 |
| 16 < 22 | 10 | 19 | 190 | 3.6 | 12.96 | 129.6 |
| 22 < 28 | 7 | 25 | 175 | 9.6 | 92.16 | 645.12 |

- Approximate mean = (56 + 195 + 190 + 175) / (8 + 15 + 10 + 7) = 15.4
- Approximate variance = (564.48 + 86.4 + 129.6 + 645.12) / (8 + 15 + 10 + 7 - 1) = 36.55, and the approximate standard deviation is 6.05.
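The grouped-data approximation above can be sketched directly from the frequencies and class midpoints:

```python
freqs = [8, 15, 10, 7]   # class frequencies f_i
mids = [7, 13, 19, 25]   # class midpoints m_i

n = sum(freqs)
# Each observation in a class is approximated by that class's midpoint.
approx_mean = sum(f * m for f, m in zip(freqs, mids)) / n
approx_var = sum(f * (m - approx_mean) ** 2
                 for f, m in zip(freqs, mids)) / (n - 1)
approx_sd = approx_var ** 0.5
```

Since every observation is replaced by its class midpoint, the result is only an approximation of the statistics of the raw data.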

2.29. 3.2251

2.30.a. Mean = 1.4

2.30.b. Sample variance = 23.8710 and standard deviation = 4.8857.

2.31.a. 9.36

2.31.b. 8.9063

2.32.a. 11.025

2.32.b. 0.9195

2.33. Mean = 1.654, and standard deviation = 10.6850.

2.34.a. 261.5454

2.34.b. 2735.3564

2.34.c. The exact mean was 261.0545; the approximate mean is very close but not identical. The exact variance was 306.4373, while the approximation gives 2735.3564, showing that the grouped approximation of the variance is far less precise.

2.35.a. 2.33

2.35.b. 0.9058

2.36.a. 1392.5

2.36.b. 0.9930

2.37.a. -45

2.37.b. -0.9

2.38.a. Cov = 4.2679

2.38.b. r = 0.1283

2.38.c. |r| < 2/sqrt(n), so there is no significant evidence of a linear association between the number of drug units and recovery time.
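The sample covariance, correlation, and the 2/sqrt(n) rule of thumb used in these answers can be sketched as follows; the paired dataset is hypothetical, not the book's:

```python
from math import sqrt
from statistics import mean, stdev

x = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical, e.g. drug units
y = [5, 4, 6, 5, 7, 6, 8, 7]   # hypothetical, e.g. recovery times

n = len(x)
mx, my = mean(x), mean(y)
# Sample covariance: cross-deviations divided by n - 1.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
# Sample correlation: covariance scaled by both standard deviations.
r = cov / (stdev(x) * stdev(y))
meaningful = abs(r) > 2 / sqrt(n)   # the chapter's rule of thumb
```

For this sample r is about 0.80 and 2/sqrt(8) is about 0.71, so the rule of thumb would flag a linear association.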

2.39.a. Cov = -5.5, r = -0.7760

2.39.b. |r| > 2/sqrt(n), so there is a negative linear association, suggesting that higher levels of service are associated with lower delivery times.

2.40.a. Cov = -20.75

2.40.b. r = -0.9366

2.41.a. 2.9072

2.41.b. 0.9617

2.42. r = 0.9300

2.43. Cov = 9.9642, r = 0.9852

2.44.a. mean = 18.1325

2.44.b. s^2 = 204.7017, s = 14.3074

2.45.a. 43.1

2.45.b. s = 10.1644

2.45.c. 20 < 35 < 45 < 50 < 60

2.46.

Location 2 variance: 52.6222

Location 2 standard deviation: 7.2541

Location 3 variance: 75.8222

Location 3 standard deviation: 8.7076

Location 4 variance: 22.2778

Location 4 standard deviation: 4.7199

2.47. Cov = 5.1895, r = 0.2450; there is no significant evidence of a linear association.

2.48.a.

2.48.b. r = 0.5602

2.49. I mistakenly assumed that population #3's variance would be smaller than population #1's. The actual variances are:

Population #1: 6

Population #2: 14

Population #3: 7.14

Population #4: 54

2.50.a. [295 – 1.59*63, 295 + 1.59*63]

2.50.b. [295 – 2.5*63, 295 + 2.5*63]

2.51.a. [9.2 – 2.5*3.5, 9.2 + 2.5*3.5]

2.51.b. [9.2 – 3.5, 9.2 + 3.5]

2.52.a. [29,000 – 2*3000, 29,000 + 2*3000]

2.52.b. [29,000 – 2*3000, 29,000 + 2*3000]

2.53.a. IQR = 21.5, so the middle 50% of employees completed the task within a span of roughly 20 seconds of one another.

2.53.b. 222 < 249.5 < 263 < 271 < 299

2.54.a. The mean is 41.6826.

2.54.b. s^2 = 284.3546, s = 16.8628.

2.54.c. 95th percentile = 70

2.54.d. Five number summary: 18 < 25.75 < 39 < 54.25 < 73

2.54.e. CV = 40.45%

2.54.f. 100*[1 - (1/k^2)]% = 90% => k = sqrt(10) ≈ 3.16; we round up to k = 3.17 so that the bound is at least 90%, and the result is:

[41.68 – 3.17*16.86, 41.68 + 3.17*16.86]

2.55.a. Cov = 16.55

2.55.b. r = 0.8653

2.56.a. Cov = 106.9333

2.56.b. r = 0.9887