Solutions for Chapter 3 of Statistics for Business & Economics

The following are full solutions to Chapter 3 of “Statistics for Business and Economics” by Newbold, Carlson & Thorne. I do not guarantee the correctness of any of the answers presented here. If you find a mistake, have a comment, or would like to ask me anything, I’m available by mail: me (at) shayacrich (dot) com.

3.1. $$\overline{A}=[E_2, E_4, E_5, E_7, E_8, E_{10}]$$
3.2.a. $$A \cap B = [E_3, E_9]$$
3.2.b. $$A \cup B = [E_1, E_2, E_3, E_7, E_8, E_9]$$
3.2.c. It’s not, because \(E_4\) to \(E_6\), as well as \(E_{10}\) are not covered.

3.3.a. $$A \cap B = [E_4, E_5, E_6, E_{10}]$$
3.3.b. $$A \cup B = [E_1, E_2, E_4, E_5, E_6, E_7, E_8, E_{10}]$$
3.3.c. It’s not, because \(E_3\) and \(E_9\) are missing.

3.4.a. $$A \cap B = [E_3, E_6]$$
3.4.b. $$A \cup B = [E_3, E_4, E_5, E_6, E_9, E_{10}]$$
3.4.c. It’s not.

3.5.a. \(\overline{A}\) is the event “it will be 4 days or fewer before the machinery becomes available”.
3.5.b. \(A \cap B\) is the event “it will be exactly 5 days before the machinery becomes available”.
3.5.c. \(A \cup B\) is the entire sample space (any number of days before the machinery becomes available).
3.5.d. \(A \cap B\) is not the empty set, but rather contains the outcome of 5 days, so A and B are not mutually exclusive.
3.5.e. A and B are collectively exhaustive because any outcome is either below 6 or above 4.
3.5.f. According to Table 3.2, \(B \cap \overline{A} = B - (A \cap B)\), so \((A \cap B) \cup (\overline{A} \cap B) = (A \cap B) \cup (B - (A \cap B)) = B\).
An alternative demonstration is thus: \((A \cap B) \cup (\overline{A} \cap B) = (A \cup \overline{A}) \cap B = S \cap B = B\).
As for our specific A and B, \(A \cap B\) is only day 5, and \(\overline{A} \cap B\) is 4 days or less, so the union of the two is anything that’s less than 6, or B.
3.5.g. By the distributive law, \(A \cup (\overline{A} \cap B) = (A \cup \overline{A}) \cap (A \cup B) = S \cap (A \cup B) = A \cup B\). It’s also clear intuitively: \(\overline{A} \cap B\) is all of B except for the intersection with A, and the union of that with A is all of B and all of A, so \(A \cup B\).
As for our specific A and B, \(\overline{A} \cap B\) is 4 days or less, and A is more than 4 days, so the union is the entire sample space, and \(A \cup B\) is collectively exhaustive so it too equals the entire sample space.
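The identities in 3.5.f and 3.5.g can also be checked mechanically with Python sets. This is only a sketch: the finite sample space of 1 to 10 days is an assumption made for illustration, since the exercise does not bound the number of days.

```python
# Hypothetical finite sample space: the machinery arrives after 1 to 10 days.
# The upper bound of 10 is an assumption made only for this illustration.
S = set(range(1, 11))
A = {d for d in S if d > 4}   # more than 4 days
B = {d for d in S if d < 6}   # fewer than 6 days
A_bar = S - A                 # complement of A within S

union_f = (A & B) | (A_bar & B)   # left-hand side of 3.5.f
union_g = A | (A_bar & B)         # left-hand side of 3.5.g

print("A and B share:", A & B)
print("3.5.f holds:", union_f == B)
print("3.5.g holds:", union_g == (A | B))
print("A or B is exhaustive:", (A | B) == S)
```

Changing the bound on `S` does not affect the identities, only the size of the sets involved.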

3.6.a. \(A \cap B\) matches \(O_1\) and \(\overline{A} \cap B\) matches \(O_3\), so their union is exactly \(B = [O_1, O_3]\).
3.6.b. \(\overline{A} \cap B = [O_3]\) and \(A = [O_1, O_2]\), so the union is \(A \cup B = [O_1, O_2, O_3]\).

3.7.a. $$[(M_1, M_2), (M_1, M_3), (M_1, T_1), (M_1, T_2), (M_2, M_3), (M_2, T_1), (M_2, T_2), (M_3, T_1), (M_3, T_2), (T_1, T_2)]$$
3.7.b. $$A = [(M_1, T_1), (M_1, T_2), (M_2, T_1), (M_2, T_2), (M_3, T_1), (M_3, T_2), (T_1, T_2)]$$
3.7.c. $$B=[(M_1, M_2), (M_1, M_3), (M_2, M_3), (T_1, T_2)]$$
3.7.d. $$\overline{A}=[(M_1, M_2), (M_1, M_3), (M_2, M_3)]$$
3.7.e. \(A \cap B = [(T_1, T_2)]\), whereas \(\overline{A} \cap B = [(M_1, M_2), (M_1, M_3), (M_2, M_3)]\). Therefore, the union is all four outcomes that make up B.
3.7.f. \(\overline{A} \cap B = [(M_1, M_2), (M_1, M_3), (M_2, M_3)]\), and \(A = [(M_1, T_1), (M_1, T_2), (M_2, T_1), (M_2, T_2), (M_3, T_1), (M_3, T_2), (T_1, T_2)]\), so the union is \(S = [(M_1, M_2), (M_1, M_3), (M_2, M_3), (M_1, T_1), (M_1, T_2), (M_2, T_1), (M_2, T_2), (M_3, T_1), (M_3, T_2), (T_1, T_2)]\). Since \(A \cup B = S\) as well, the two subsets match.
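The sample space in 3.7 can be generated rather than written out by hand. The sketch below uses `itertools.combinations`; the two predicates defining A and B are assumptions inferred from the outcomes listed in 3.7.b and 3.7.c (A: the pair contains at least one T, B: both members are of the same type), not taken from the exercise text.

```python
from itertools import combinations

people = ["M1", "M2", "M3", "T1", "T2"]
S = set(combinations(people, 2))   # all unordered pairs

# Assumed definitions, inferred from the listed outcomes:
A = {pair for pair in S if any(name.startswith("T") for name in pair)}
B = {pair for pair in S if pair[0][0] == pair[1][0]}
A_bar = S - A

print(len(S), len(A), len(B))
print("3.7.e:", (A & B) | (A_bar & B) == B)
print("3.7.f:", A | (A_bar & B) == A | B)
```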

3.8. 35/66 = 0.53030303

3.9. 36/120 = 0.3

3.10. 675/1820 = 0.370879121

3.11. 20000/120000 = 0.166666667

3.12. 1/9 * 1/9 = 1/81

3.13.a. 0.68
3.13.b. 0.73
3.13.c. 0.32
3.13.d. 0.41
3.13.e. 1

3.14.a. 0.54
3.14.b. 0.18
3.14.c. Rate of return will be less than 10%.
3.14.d. 0.46
3.14.e. The empty set.
3.14.f. 0
3.14.g. Rate of return will be at least 10% or negative.
3.14.h. 0.72
3.14.i. Yes, because \(A \cap B = \emptyset\)
3.14.j. No, \(A \cup B\) does not cover 0% – 10%.

3.15.a. 0.5
3.15.b. 0.25
3.15.c. 0.25

3.16. \(A \cup B = [E_1, E_2, E_3, E_7, E_8, E_9]\), so \(P(A \cup B) = 6/10 = 0.6\), whereas \(P(A) = P(B) = 4/10 = 0.4\), so \(P(A) + P(B) = 0.8\).

3.17.a. 0.86
3.17.b. 0.91
3.17.c. 0.14
3.17.d. 1
3.17.e. 0.77
3.17.f. No, any number of complaints between 1 and 9 falls under both.
3.17.g. Yes, \(P(A \cup B) = 1\)

3.18.a. 0.87
3.18.b. 0.35
3.18.c. The five classes cover all possible outcomes, and P(S) = 1.

3.19. $$P(A \cap B) = 0.25$$

3.20. $$P(A \cap B) = 0$$

3.21. $$P(A \cap B) = 0.24$$

3.22. $$P(A \cup B) = 0.75$$

3.23. \(P(A|B) = 0.67\). This does not equal P(A), so the events are not statistically independent.

3.24. \(P(A|B) = 0.8 = P(A)\), therefore, the events are statistically independent.

3.25. \(P(A|B) = 0.75\). The events are not statistically independent.

3.26. \(P(A|B) = 0.625\). The events are not statistically independent.

3.27. 1/9

3.28.a. 7! = 5040
3.28.b. 1/5040

3.29. 49*50 = 2450

3.30. 1/120

3.31. 0.2

3.32. N = 60, P = 1/60

3.33. 28

3.34.a. 42
3.34.b. 6
3.34.c. 6
3.34.d. 6/42, if considering only the lead role, the chance is 1/7, which is the same probability.
3.34.e. (6 + 6)/42 = 2/7. \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\), and \(P(A \cap B) = 0\), so we can add up 1/7 + 1/7 to reach the same result.

3.35.a. 150
3.35.b. 40/150 = 0.27
3.35.c. If A is the event that the craftsman brother gets selected, and B is the event that the labourer brother gets selected, then \(P(A) = 4/10\), \(P(B) = 10/15\) and \(P(A \cap B) = 40/150\), so \(P(A \cup B) = 60/150 + 100/150 - 40/150 = 120/150\), and its complement is the answer: \(P(\overline{A \cup B}) = 30/150\).

3.36.a. 90
3.36.b. We look at the complement. There are 10 combinations of U.S. funds that won’t under-perform, and 3 combinations of such international funds. So the probability of the complement is 30/90, and the probability of the original event in question is 60/90, or 0.67.

3.37. $$P(A \cup B) = 0.3 + 0.25 - 0.2 = 0.35$$

3.38. $$P(A \cup B) = 0.3 + 0.2 - 0.15 = 0.35$$

3.39.a. We take ‘unsuccessful’ to mean ‘no immediate action’, and if we call that A, then \(P(A) = 0.95\), and the probability of 4 consecutive outcomes that belong to A is \(0.95^4 \approx 0.81\).
3.39.b. We’d like the probability of “at least 4 unsuccessful calls”, where “unsuccessful” here means anything not leading to a donation at all. Anything after those 4 calls works, because the question states that we’re looking for the first successful call after “at least 4 unsuccessful” ones. So the answer is the same whether there were 4 unsuccessful calls and then a successful one, or 10 unsuccessful calls and only then a successful one. The probability of any type of donation happening is 0.1, so the answer is \(0.9^4 \approx 0.66\).
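Both parts of 3.39 multiply the same per-call probability four times, which assumes the calls are independent. A minimal sketch of that arithmetic, where the helper name `prob_all` is made up for this illustration:

```python
def prob_all(p_single, n):
    """Probability that n independent trials all have an outcome of probability p_single."""
    return p_single ** n

p_no_immediate_action = 0.95   # 3.39.a: a call leads to no immediate action
p_no_donation = 0.90           # 3.39.b: a call leads to no donation at all

part_a = prob_all(p_no_immediate_action, 4)
part_b = prob_all(p_no_donation, 4)
print(round(part_a, 4), round(part_b, 4))
```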

3.40. Because B and C are mutually exclusive, \(P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C)\), and because A is independent of both, \(P(A \cap B) = P(A)P(B)\), and \(P(A \cap C) = P(A)P(C)\), so \(P(A \cup B \cup C) = 0.069\)

3.41. For independent events, \(P(A \cap B) = P(A)P(B)\), so 0.833.

3.42.a. $$P(B|A) = \frac{P(A \cap B)}{P(A)}=\frac{0.1}{0.18}=0.556$$
3.42.b. $$P(A|B) = \frac{P(A \cap B)}{P(B)}=\frac{0.1}{0.12}=0.833$$

3.43. If A is “item is defective”, and B is “inspector accepted item”, then \(P(B|A) = 0.08, P(A \cap B) = 0.01\), so \(P(A) = \frac{P(A \cap B)}{P(B|A)} = \frac{0.01}{0.08} = 0.125\)

3.44. Let A be “analyst is successful at stocks” and let B be “analyst is successful at bonds”. \(P(A) = 1/12, P(B) = 1/20\). Because they’re independent, \(P(B \cap A) = P(A)P(B)=1/240\). So \(P(A \cup B) = 1/12 + 1/20 - 1/240 = 31/240\)
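The fractions in 3.44 can be kept exact with Python's `fractions.Fraction`, which avoids rounding along the way. This is a sketch of the same addition rule under the independence assumption stated above.

```python
from fractions import Fraction

p_stocks = Fraction(1, 12)   # P(A): success at stocks
p_bonds = Fraction(1, 20)    # P(B): success at bonds

# Independence: P(A and B) = P(A) * P(B)
p_both = p_stocks * p_bonds

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_stocks + p_bonds - p_both

print(p_both, p_either, float(p_either))
```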

3.45. Let A be the event “loan is for a high risk client”, and B will be “loan in default”. So \(P(A) = 0.15, P(B) = 0.05, P(A \cap B) = 0.02\), and the answer is \(P(B|A) = 0.02/0.15 = 0.133\)

3.46.a. $$P(A \cup B) = 0.4 + 0.5 - 0 = 0.9$$
3.46.b. $$P(A \cup C) = 0.4 + 0.8 - 0.4*0.8 = 0.88$$
3.46.c. \(P(C|B) = \frac{P(B \cap C)}{P(B)} = 0.75\), so \(P(B \cap C) = 0.375\) and \(P(B \cup C) = 0.5 + 0.8 - 0.375 = 0.925\)

3.47. The number of combinations for events A and B is 10 each, so each probability is 0.1. The number of combinations for getting both A and B is 210, so the overall probability is: \(P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.1 + 0.1 - \frac{1}{210} \approx 0.195\), and the analyst’s claim is a bit presumptuous.

3.48.a. Let A be the event “occurred on Monday” and B be the event “occurred on the last hour of the shift”, so \(P(A) = 0.3, P(B) = 0.2, P(A \cap B) = 0.04\). Then \(P(A) = P(A \cap B) + P(A \cap \overline{B}) \Rightarrow 0.3 = 0.04 + P(A \cap \overline{B}) \Rightarrow P(A \cap \overline{B}) = 0.26\), and \(P(\overline{B}|A) = \frac{P(A \cap \overline{B})}{P(A)} = \frac{0.26}{0.3} = 0.8667\)
3.48.b. \(P(A)P(B) = (0.3)(0.2) = 0.06 \neq 0.04 = P(A \cap B)\), so no.

3.49.a. Let A be the event “signed up for reading class” and B be the event “signed up for math class”. \(P(A) = 0.4, P(B) = 0.5, P(B|A) = 0.3, P(A \cap B) = P(B|A)P(A) = (0.3)(0.4) = 0.12\)
3.49.b. $$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.12}{0.5} = 0.24$$
3.49.c. $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.4 + 0.5 - 0.12 = 0.78$$
3.49.d. \(P(A)P(B) = (0.4)(0.5) = 0.2 \neq 0.12 = P(A \cap B)\), so no.
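The four parts of 3.49 follow one pattern: get the joint probability from a conditional, then reuse it. The sketch below wraps that pattern in a small helper. The function name `conditional` is hypothetical, and the independence check uses a floating-point tolerance rather than exact equality.

```python
from math import isclose

def conditional(p_joint, p_condition):
    """Return P(X | Y) = P(X and Y) / P(Y)."""
    if p_condition <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_condition

p_a, p_b, p_b_given_a = 0.4, 0.5, 0.3    # values used in 3.49

p_ab = p_b_given_a * p_a                 # 3.49.a: multiplication rule
p_a_given_b = conditional(p_ab, p_b)     # 3.49.b
p_union = p_a + p_b - p_ab               # 3.49.c: addition rule
independent = isclose(p_ab, p_a * p_b)   # 3.49.d

print(round(p_ab, 4), round(p_a_given_b, 4), round(p_union, 4), independent)
```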

3.50. Let A be “new customer”, and B be “used rival”, so \(P(A) = 0.15, P(B|A) = 0.8, P(B) = 0.6, P(A \cap B) = P(B|A)P(A) = (0.8)(0.15) = 0.12, P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.12}{0.6} = 0.2\)

3.51. $$P(B) = 0.2, P(A|B) = 0.8, P(A \cap B) = P(D) = 0.16, P(D \cap C) = P(A \cap B \cap C) = 0.02, P(C|D) = \frac{P(C \cap D)}{P(D)} = \frac{0.02}{0.16} = 0.125$$

3.52. 0.05

3.53. 0.05

3.54. 0.05

3.55. 0.20

3.56. $$\frac{0.05}{0.3} = 0.1667$$

3.57. $$\frac{0.1}{0.4} = 0.25$$

3.58. $$\frac{0.1}{0.25} = 0.4$$

3.59. 4

3.60. 1

3.61. $$\frac{P(A|B_1)}{P(A|B_2)} = \frac{0.8}{0.4} = 2$$

3.62. $$\frac{P(A|B_1)}{P(A|B_2)} = \frac{0.4}{0.2} = 2$$

3.63. $$\frac{P(A|B_1)}{P(A|B_2)} = \frac{0.2}{0.4} = 0.5$$

3.64.a. 0.12
3.64.b. $$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.19}{0.27} = 0.704$$
3.64.c. No, \(P(A)P(B) = (0.27)(0.79) = 0.2133 \neq 0.19 = P(A \cap B)\)
3.64.d. \(P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.07}{0.21} = 0.33\)
3.64.e. No, \(P(A)P(B) = (0.19)(0.21) = 0.0399 \neq 0.07 = P(A \cap B)\)
3.64.f. 0.79
3.64.g. 0.27
3.64.h. $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.79 + 0.27 - 0.19 = 0.87$$

3.65.a. 0.3
3.65.b. 0.38
3.65.c. Let A be a high prediction and B be a high outcome, $$ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.23}{0.38} = 0.605 $$
3.65.d. $$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.23}{0.3} = 0.767$$
3.65.e. Let B now be a low outcome, \(P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.01}{0.3} = 0.033\)

3.66.a. 0.25
3.66.b. 0.32
3.66.c. Let A be “traded” and B will be “never reads”, so $$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.04}{0.25} = 0.16$$
3.66.d. $$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.04}{0.32} = 0.125$$
3.66.e. Let B be “regularly reads the paper”, so \(P(B) = 0.34, P(\overline{B}) = 0.66, P(A \cap \overline{B}) = P(A) - P(A \cap B) = 0.32 - 0.18 = 0.14, P(A|\overline{B}) = \frac{P(A \cap \overline{B})}{P(\overline{B})} = \frac{0.14}{0.66} = 0.212\)

3.67.a. Let D be “defective”. Because A, B & C are mutually exclusive and collectively exhaustive, the events \(D \cap A\), \(D \cap B\) and \(D \cap C\) are mutually exclusive too, so all the intersection terms vanish: \(P(D) = P((D \cap A) \cup (D \cap B) \cup (D \cap C)) = P(D \cap A) + P(D \cap B) + P(D \cap C) = 0.02 + 0.05 + 0.03 = 0.1\)
3.67.b. Let G be “good”. Because G and D are mutually exclusive and collectively exhaustive, \(P(B) = P((G \cap B) \cup (D \cap B)) = P(G \cap B) + P(D \cap B) = 0.3 + 0.05 = 0.35\)
3.67.c. $$P(D|B) = \frac{P(D \cap B)}{P(B)} = \frac{0.05}{0.35} = 0.143$$
3.67.d. $$P(B|D) = \frac{P(D \cap B)}{P(D)} = \frac{0.05}{0.1} = 0.5$$
3.67.e. No, for example: $$P(A|D) = \frac{P(A \cap D)}{P(D)} = \frac{0.02}{0.1} = 0.2 \neq P(A) = 0.29$$ And the same goes for other intersections.
3.67.f. \(P(G|A) = 0.931, P(G|B) = 0.857, P(G|C) = 0.916\), so A.

3.68.a. 0.32
3.68.b. 0.25
3.68.c. Let W be “worked on additional problems”, so \(P(A|W) = \frac{P(A \cap W)}{P(W)} = \frac{0.12}{0.32} = 0.375\)
3.68.d. $$P(W|A) = \frac{P(A \cap W)}{P(A)} = \frac{0.12}{0.25} = 0.48$$
3.68.e. Let D be “expects a grade below C”. Because C and D are mutually exclusive, \(P(C \cup D) = P(C) + P(D) - P(C \cap D) = 0.38 + 0.1 - 0 = 0.48\), and likewise \(P((C \cup D) \cap W) = P((C \cap W) \cup (D \cap W)) = P(C \cap W) + P(D \cap W) = 0.12 + 0.02 = 0.14\), so \(P((C \cup D)|W) = \frac{P((C \cup D) \cap W)}{P(W)} = \frac{0.14}{0.32} = 0.4375\)
3.68.f. No, for example: \(P(A|W) = \frac{P(A \cap W)}{P(W)} = \frac{0.12}{0.32} = 0.375 \neq P(A) = 0.25\), and the same goes for other intersections.

3.69.a. 0.77
3.69.b. 0.19
3.69.c. Let S be “single”, and L be “left the job within the year”, so \(P(L|S) = \frac{P(L \cap S)}{P(S)} = \frac{0.06}{0.23} = 0.26\)
3.69.d. \(P(\overline{S}|\overline{L}) = \frac{P(\overline{S} \cap \overline{L})}{P(\overline{L})} = \frac{0.64}{0.81} = 0.79\)

3.70.a. 0.76
3.70.b. 0.77
3.70.c. 0.1

3.71.

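$$\begin{array}{l|cc|c}
 & \text{Men} & \text{Women} & \text{Total} \\ \hline
\text{Joined} & 0.028 & 0.054 & 0.082 \\
\text{Not joined} & 0.372 & 0.546 & 0.918 \\ \hline
\text{Total} & 0.4 & 0.6 & 1
\end{array}$$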

a. 0.082
b. Let W be “women” and J be “joined the club”, so \(P(W|J) = \frac{P(W \cap J)}{P(J)} = \frac{0.054}{0.082} = 0.659\)

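3.72. Let G be “significant growth”, and let \(I_1\), \(I_2\) & \(I_3\) be the events of higher, similar and lower interest rates.

$$\begin{array}{l|ccc|c}
 & I_1 & I_2 & I_3 & \text{Total} \\ \hline
G & 0.025 & 0.3 & 0.12 & 0.445 \\
\overline{G} & 0.225 & 0.3 & 0.03 & 0.555 \\ \hline
\text{Total} & 0.25 & 0.6 & 0.15 & 1
\end{array}$$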

a. 0.025
b. 0.445
c. \(P(I_3|G) = \frac{P(I_3 \cap G)}{P(G)} = \frac{0.12}{0.445} = 0.270\)
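Joint probability tables like the one in 3.72 lend themselves to a small dictionary-based sketch: marginals are sums over a row or column, and a conditional probability is a cell divided by a marginal. The helper names `marginal` and `conditional_on_row` are made up for this illustration; the cell values are the ones from the table.

```python
# Joint probabilities from the 3.72 table, keyed by (growth, interest).
joint = {
    ("G", "I1"): 0.025, ("G", "I2"): 0.30, ("G", "I3"): 0.12,
    ("notG", "I1"): 0.225, ("notG", "I2"): 0.30, ("notG", "I3"): 0.03,
}

def marginal(table, axis, value):
    """Sum every cell whose key matches `value` at position `axis`."""
    return sum(p for key, p in table.items() if key[axis] == value)

def conditional_on_row(table, row, col):
    """P(col | row) = P(row and col) / P(row)."""
    return table[(row, col)] / marginal(table, 0, row)

p_g = marginal(joint, 0, "G")                           # 3.72.b
p_i3_given_g = conditional_on_row(joint, "G", "I3")     # 3.72.c
print(round(p_g, 3), round(p_i3_given_g, 3))
```

The same two helpers apply unchanged to the other two-way tables in this chapter once their cells are entered.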

3.73. \(P(H) = 0.42, P(S) = 0.22, P(S|H) = 0.34\)
a. $$P(H \cap S) = P(S|H)P(H) = (0.34)(0.42) = 0.1428$$
b. $$P(H \cup S) = P(S) + P(H) - P(S \cap H) = 0.22 + 0.42 - 0.1428 = 0.4972$$
c. $$P(H|S) = \frac{P(H \cap S)}{P(S)} = \frac{0.1428}{0.22} = 0.649$$

3.74. Let U be the event of graduating at the top 10% of the class.

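$$\begin{array}{l|ccc|c}
 & Q_1 & Q_2 \cup Q_3 & Q_4 & \text{Total} \\ \hline
U & 0.175 & 0.25 & 0.05 & 0.475 \\
\overline{U} & 0.075 & 0.25 & 0.2 & 0.525 \\ \hline
\text{Total} & 0.25 & 0.5 & 0.25 & 1
\end{array}$$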

a. 0.475
b. $$P(Q_1|U) = \frac{P(Q_1 \cap U)}{P(U)} = \frac{0.175}{0.475} = 0.368$$
c. We can reach this result in one of two ways:
c.1. $$P(\overline{Q_1}|\overline{U}) = 1 - P(Q_1|\overline{U}) = 1 - \frac{P(Q_1 \cap \overline{U})}{P(\overline{U})} = 1 - \frac{0.075}{0.525} = 0.857$$
c.2. Because \(Q_4\) and \((Q_2 \cup Q_3)\) are mutually exclusive, \(P(((Q_2 \cup Q_3) \cup Q_4) \cap \overline{U}) = P((Q_2 \cup Q_3) \cap \overline{U}) + P(Q_4 \cap \overline{U})\), so we can do: \(P(\overline{Q_1}|\overline{U}) = P(((Q_2 \cup Q_3) \cup Q_4)|\overline{U}) = \frac{P((Q_2 \cup Q_3) \cap \overline{U}) + P(Q_4 \cap \overline{U})}{P(\overline{U})} = \frac{0.25 + 0.2}{0.525} = 0.857\)

3.75.a. Let H be “high sales” and F will be “favorable reaction”. \(P(H|F) = \frac{P(H \cap F)}{P(F)} = \frac{0.173}{0.303} = 0.571\)
3.75.b. Let L be “low sales” and U will be “unfavorable reaction”. \(P(L|U) = \frac{P(L \cap U)}{P(U)} = \frac{0.141}{0.272} = 0.518\)
3.75.c. Let N be “neutral”. Because N and F are mutually exclusive, \(P(L \cap \overline{U}) = P(L \cap (N \cup F)) = P(L \cap N) + P(L \cap F)\), which also equals \(P(L) - P(L \cap U)\). With \(P(\overline{U}) = 1 - 0.272 = 0.728\): $$ P(L|\overline{U}) = \frac{P(L \cap \overline{U})}{P(\overline{U})} = \frac{P(L) - P(L \cap U)}{P(\overline{U})} = \frac{0.296 - 0.141}{0.728} = 0.213 $$
3.75.d. $$P(\overline{U}|L) = \frac{P(L \cap \overline{U})}{P(L)} = \frac{0.155}{0.296} = 0.523648649$$

3.76. Let M1 and M2 be the machines, M1 being the faulty one, and let F be the event of a faulty piece.

$$\begin{array}{l|cc|c}
 & M_1 & M_2 & \text{Total} \\ \hline
F & 0.04 & 0 & 0.04 \\
\overline{F} & 0.36 & 0.6 & 0.96 \\ \hline
\text{Total} & 0.4 & 0.6 & 1
\end{array}$$

$$P(M_1|\overline{F}) = \frac{P(M_1 \cap \overline{F})}{P(\overline{F})} = \frac{0.36}{0.96} = 0.375$$

3.77.a. Let E be the event of finding a course enjoyable, and let V be the event of receiving strong positive evaluations.
$$P(E|V)^3 = (\frac{P(E \cap V)}{P(V)})^3 = (\frac{P(E \cap V)}{P(V \cap E) + P(V \cap \overline{E})})^3 = (\frac{0.42}{0.42 + 0.075})^3 = (0.848)^3 = 0.6108$$
3.77.b. $$1 - P(\overline{E}|V)^3 = 1 - (\frac{P(\overline{E} \cap V)}{P(V)})^3 = 1 - (\frac{P(\overline{E} \cap V)}{P(V \cap E) + P(V \cap \overline{E})})^3 = 1 - (\frac{0.075}{0.42 + 0.075})^3 = 1 - (0.1515)^3 = 0.9965$$
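Both parts of 3.77 start from the same conditional probability and treat the three courses as independent. A short sketch of that calculation, using the joint probabilities that appear in the formulas above:

```python
p_e_and_v = 0.42       # P(E and V)
p_not_e_and_v = 0.075  # P(not E and V)

# Total probability of V, then condition on it.
p_v = p_e_and_v + p_not_e_and_v
p_e_given_v = p_e_and_v / p_v

n_courses = 3
all_enjoyable = p_e_given_v ** n_courses                 # 3.77.a
at_least_one = 1 - (1 - p_e_given_v) ** n_courses        # 3.77.b, via the complement

print(round(all_enjoyable, 4), round(at_least_one, 4))
```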

For all the following exercises, it should be assumed that A1 and A2 are complements, and that B1 and B2 are complements.

3.78. $$P(A_2) = 0.6, P(A_1|B_1) = \frac{P(B_1|A_1)P(A_1)}{P(B_1|A_1)P(A_1) + P(B_1|A_2)P(A_2)} = \frac{0.24}{0.24 + 0.42} = 0.36$$

3.79. $$P(A_2) = 0.2, P(A_1|B_1) = \frac{P(B_1|A_1)P(A_1)}{P(B_1|A_1)P(A_1) + P(B_1|A_2)P(A_2)} = \frac{0.48}{0.48 + 0.04} = 0.923$$

3.80. $$P(A_2) = 0.5 \\ P(B_1) = P(B_1|A_1)P(A_1) + P(B_1|A_2)P(A_2) = (0.4)(0.5) + (0.7)(0.5) = 0.55 \\ P(B_2) = 1 - P(B_1) = 0.45\\ P(A_1 \cap B_1) = P(B_1|A_1)P(A_1) = (0.4)(0.5) = 0.2 \\ P(A_1) = P(A_1 \cap B_1) + P(A_1 \cap B_2) \Rightarrow P(A_1 \cap B_2) = P(A_1) - P(A_1 \cap B_1) = 0.5 - 0.2 = 0.3 \\ P(B_2|A_1) = \frac{P(A_1 \cap B_2)}{P(A_1)} = \frac{0.3}{0.5} = 0.6 \\ P(A_1|B_2) = \frac{P(B_2|A_1)P(A_1)}{P(B_2)} = \frac{(0.6)(0.5)}{0.45} = 0.67$$

3.81. $$P(A_2) = 0.6 \\ P(B_1) = P(B_1|A_1)P(A_1) + P(B_1|A_2)P(A_2) = (0.6)(0.4) + (0.7)(0.6) = 0.66 \\ P(B_2) = 1 - P(B_1) = 0.34 \\ P(B_1 \cap A_2) = P(B_1|A_2)P(A_2) = (0.7)(0.6) = 0.42 \\ P(A_2) = P(A_2 \cap B_1) + P(A_2 \cap B_2) \Rightarrow P(A_2 \cap B_2) = P(A_2) - P(A_2 \cap B_1) = 0.6 - 0.42 = 0.18 \\ P(A_2|B_2) = \frac{P(A_2 \cap B_2)}{P(B_2)} = \frac{0.18}{0.34} = 0.529$$

3.82. $$P(A_2) = 0.4 \\ P(A_1|B_1) = \frac{P(B_1|A_1)P(A_1)}{P(B_1|A_1)P(A_1) + P(B_1|A_2)P(A_2)} = \frac{(0.6)(0.6)}{(0.6)(0.6) + (0.4)(0.4)} = \frac{0.36}{0.52} = 0.692$$

3.83. Let A be “received the material” and let B be “adopted the book”. \(P(\overline{A}) = 0.2 \\ P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})} = \frac{(0.3)(0.8)}{(0.3)(0.8) + (0.1)(0.2)} = 0.923\)

3.84. Let P1, P2 and P3 be stocks that performed better, same or worse than last year, and let R be stocks that were rated as good by the analyst. $$P(P_1|R) = \frac{P(R|P_1)P(P_1)}{P(R|P_1)P(P_1) + P(R|P_2)P(P_2) + P(R|P_3)P(P_3)} = \frac{(0.4)(0.25)}{(0.4)(0.25) + (0.2)(0.5) + (0.1)(0.25)} = 0.44$$
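Exercises 3.78 to 3.84 all apply Bayes' theorem over a set of mutually exclusive, collectively exhaustive hypotheses. The following is a minimal sketch of that computation; the function name `posterior` is hypothetical, and the inputs shown are the priors and likelihoods used in 3.84.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, collectively exhaustive hypotheses.

    priors[i]      = P(H_i)
    likelihoods[i] = P(evidence | H_i)
    Returns the list of P(H_i | evidence).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)   # law of total probability
    return [j / evidence for j in joint]

# 3.84: better / same / worse performance, and P(rated good | each)
post = posterior([0.25, 0.5, 0.25], [0.4, 0.2, 0.1])
print([round(p, 3) for p in post])
```

With two hypotheses the same function reproduces the two-term formula used in 3.78 and 3.79.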

3.85. $$P(B) = P(A \cap B) + P(\overline{A} \cap B) \\ \Rightarrow P(\overline{A} \cap B) = P(B) - P(A \cap B) \\ \Rightarrow P(\overline{A}|B) = \frac{P(\overline{A} \cap B)}{P(B)} = \frac{P(B)}{P(B)} - \frac{P(A \cap B)}{P(B)} = 1 - P(A|B)$$
Let F be the event of the process functioning correctly, and let D be the event of a defective bulb. $$P(D) = P(D|F)P(F) + P(D|\overline{F})P(\overline{F}) = (0.1)(0.9) + (0.5)(0.1) = 0.14 \\ P(F|D) = \frac{P(D|F)P(F)}{P(D)} = \frac{(0.1)(0.9)}{0.14} = 0.643$$ $$P(\overline{D}) = 1 - P(D) = 0.86 \\ P(\overline{D}|F) = 1 - P(D|F) = 0.9 \text{ (proof at beginning of question)} \\ P(F|\overline{D}) = \frac{P(\overline{D}|F)P(F)}{P(\overline{D})} = \frac{(0.9)(0.9)}{0.86} = 0.94$$

3.86. Don’t purchase corpses of dead animals. It’s not rational behaviour for a self-interested economic agent.
As for the question, let A1 be the event of a chicken coming from Free Range Farms, and let A2 be the event of a chicken coming from Big Foods. Let B be the event of a chicken weighing less than 3 pounds. $$ P(\overline{B}|A_1) = 1 - P(B|A_1) = 0.9 \text{ (according to proof at 3.85)} \\ P(A_2) = 1 - P(A_1) = 0.6 \\ P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) = (0.1)(0.4) + (0.2)(0.6) = 0.16 \\ P(\overline{B}) = 1 - P(B) = 0.84 \\ P(A_1|\overline{B}) = \frac{P(\overline{B}|A_1)P(A_1)}{P(\overline{B})} = \frac{(0.9)(0.4)}{0.84} = 0.429$$
Calculating the chances of 3 chickens out of 5 matching the previous event requires the use of the binomial distribution formula, which as far as I’ve seen, has not yet been introduced in this book. $$P = \binom{5}{3}0.429^3(1 - 0.429)^2 = 0.257$$
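The binomial step can be checked with `math.comb`. The sketch below keeps the posterior unrounded (0.36/0.84) instead of the three-decimal value, so its result may differ from the hand calculation in the last digit; the helper name `binomial_pmf` is made up for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Posterior from the derivation above: P(A1 | not B) = (0.9)(0.4) / 0.84
p_free_range_given_heavy = (0.9 * 0.4) / 0.84

p_three_of_five = binomial_pmf(3, 5, p_free_range_given_heavy)
print(round(p_three_of_five, 3))
```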

3.87. This question is so poorly phrased that I can’t even approach it.

3.88. Mutually exclusive events are events that can’t both happen at the same time. Independence is when the occurrence of one event has no effect on the probability of another.

3.89.a. True, \(P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - [P(A) + P(B) - P(A \cap B)] = 1 - [P(B) + P(A \cap \overline{B})] = 1 - [P(B) + P(\overline{B}) - P(\overline{A} \cap \overline{B})] = 1 - [1 - P(\overline{A} \cap \overline{B})] = P(\overline{A} \cap \overline{B})\)
3.89.b. False, because collectively exhaustive events might not be mutually exclusive, in which case their sum is greater than their union. The union of collectively exhaustive events is equal to 1, and if their sum is greater than that, it cannot equal 1.
3.89.c. True, \(\frac{n!}{x!(n-x)!} = \frac{n!}{(n-x)!(n-(n-x))!}\)
3.89.d. True, according to Bayes’ theorem, and given that \(P(A) = P(B)\), \(P(A|B) = \frac{P(B|A)P(A)}{P(B)} = P(B|A)\frac{P(A)}{P(B)} = P(B|A)\)
3.89.e. True, \(P(A) = P(\overline{A}) = 1-P(A) \Rightarrow 2P(A) = 1 \Rightarrow P(A) = 0.5 \text{ and } P(\overline{A}) = 1-P(A) = 0.5\)
3.89.f. True, we’ll prove for A, \(P(A|B) = P(A)=1-P(\overline{A}) \Rightarrow P(\overline{A}) = 1-P(A|B)=P(\overline{A}|B)\)
3.89.g. False. Suppose A and B are mutually exclusive but not collectively exhaustive. Then \(P(A \cup B) = P(A) + P(B) < 1\), so \(P(\overline{A}) + P(\overline{B}) = 1-P(A) + 1-P(B) = 2-(P(A)+P(B)) > 1\), which is greater than \(P(S) = 1\).

3.90. Conditional probability is the ratio of occurrences of some event A, out of all the times that some other event B has occurred. Sometimes in real world data all we have is conditional probabilities and we’d like to calculate the individual probabilities, and sometimes it’s the other way around.

3.91. Given a subjective idea of the chances of some event A happening, if we know that some other event B has happened, we’d like to update our prediction as to the likelihood of event A. Bayes’ theorem provides us with a tool in these scenarios.

3.92.a. True, \(P(A \cup B) \geq P(A)=P(A \cap B) + P(A \cap \overline{B}) \geq P(A \cap B)\)
3.92.b. True, \(P(A \cup B) = P(A) + P(B) - P(A \cap B) \leq P(A) + P(B)\)
3.92.c. True, \(P(A) = P(A \cap B) + P(A \cap \overline{B}) \geq P(A \cap B)\)
3.92.d. True, if we can say that the intersection of an event with itself equals itself, then: \(P(A) = P(A \cap A) + P(A \cap \overline{A}) = P(A \cap A) = P(A) \Rightarrow P(A \cap \overline{A})= 0\)
3.92.e. False. For any event A, \(0 \leq P(A) \leq 1\), so two events A and B can both have a probability of 1, and their sum will be greater than 1, particularly, 2.
3.92.f. False. \(P(A \cap B) = 0 \Rightarrow P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B)\), but \(P(A) + P(B) = 1\) only if A and B are complements, not in every case where they are mutually exclusive.
3.92.g. False. \(P(A \cup B) = P(A)+P(B)-P(A \cap B) = 1 \Rightarrow P(A \cap B) = P(A)+P(B)-1\). Therefore the intersection equals zero only if \(P(A) + P(B) = 1\), which is only the case if A and B are complements.

3.93. A joint probability is the probability that the outcome will be in the intersection between two events. Marginal probabilities are the probabilities of the distinct events. Conditional probabilities are the probabilities that some event A will happen out of all occurrences where event B does. So for example, given the following events: “participant is a vegetarian” and “participant is a vegan”, the probability that someone is a vegetarian is a marginal probability. The probability that someone who is a vegetarian is a vegan is a conditional probability and the probability that someone is both a vegetarian and a vegan is a joint probability.

3.94.a. False. See Example 3.23 in the book, where \(P(T_1|D_2)=0.1 \lt 0.18 = P(T_1)\)
3.94.b. False. Suppose event A with probability of 0.5. \(P(A\cap\overline{A})=0 \Rightarrow P(A|\overline{A})=\frac{P(A\cap\overline{A})}{P(\overline{A})}=0 \neq P(A)\)
3.94.c. True. \(0 \leq P(B) \leq 1\), therefore for any \(P(B) \gt 0, P(A|B) = \frac{P(A \cap B)}{P(B)} \geq P(A \cap B)\)
3.94.d. False. The claim \( P(A \cap B) = P(A|B)P(B) \leq P(A)P(B) \) would imply \( P(A|B) \leq P(A) \), and in example 3.23 we see a counter-example where \(P(T_1|D_1) = 0.9 \gt 0.18 = P(T_1)\)
3.94.e. In subjective probability, if an event B is assumed to have some probability P(B), we say this is a prior probability, and given knowledge that event A has happened, we update that probability to P(B|A). The assumption that the posterior probability must be at least as large as the prior probability means \(P(B|A) \geq P(B)\), which we’ve already disproved in section a of this question.

3.95. $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - P(A|B)P(B) = P(A) + P(B)[1-P(A|B)]$$

3.96.a. $$P(A \cap B) = P(A|B)P(B) = 0.08$$
3.96.b. The events are not independent because \(P(A|B) = 0.4 \neq 0.3 = P(A)\)
3.96.c. $$P(B|A) = \frac{P(A|B)P(B)}{P(A)} = \frac{(0.4)(0.2)}{0.3} = 0.26667$$
3.96.d. $$P(\overline{A}) = P(\overline{A} \cap B) + P(\overline{A} \cap \overline{B}) \Rightarrow P(\overline{A} \cap \overline{B}) = P(\overline{A}) - P(\overline{A} \cap B)$$ $$ P(B) = P(A \cap B) + P(\overline{A} \cap B) \Rightarrow P(\overline{A} \cap B) = P(B) - P(A \cap B)$$ $$ P(\overline{A} \cap \overline{B}) = P(\overline{A}) - P(\overline{A} \cap B) = P(\overline{A}) - P(B) + P(A \cap B) = 0.7 - 0.2 + 0.08 = 0.58$$
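The bookkeeping in 3.96 amounts to filling in a 2×2 joint table from \(P(A)\), \(P(B)\) and \(P(A|B)\). A minimal sketch, where the function name `two_by_two` and the string labels are made up for this illustration:

```python
def two_by_two(p_a, p_b, p_a_given_b):
    """Build the four joint probabilities of events A and B."""
    p_ab = p_a_given_b * p_b                      # multiplication rule
    return {
        ("A", "B"): p_ab,
        ("A", "not B"): p_a - p_ab,               # P(A) = P(A,B) + P(A,not B)
        ("not A", "B"): p_b - p_ab,               # P(B) = P(A,B) + P(not A,B)
        ("not A", "not B"): 1 - p_a - p_b + p_ab, # complement of the union
    }

# Values used in 3.96: P(A) = 0.3, P(B) = 0.2, P(A|B) = 0.4
cells = two_by_two(0.3, 0.2, 0.4)
for key, p in cells.items():
    print(key, round(p, 4))
```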

3.97.a. Given event A that the thinner wire will arrive within a week, and event B that the thicker wire will arrive within a week, we have: $$P(A \cap B) = (0.6)P(B)$$ $$(0.6)P(B)=(0.4)P(A) \Rightarrow P(A) = 1.5P(B)$$ $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 1.5P(B) + P(B) - 0.6P(B) = 1.9P(B) \Rightarrow P(B) = 0.42$$
3.97.b. $$P(A) = \frac{P(A|B)P(B)}{P(B|A)} = \frac{(0.6)(0.42)}{0.4} = 0.63$$
3.97.c. $$P(A \cap B) = P(A|B)P(B) = (0.6)(0.42) = 0.252$$

3.98.a. Given event A “employee has an MBA” and event B “employee is over 35”: $$P(A \cap B) = P(B|A)P(A) = 0.105$$
3.98.b. $$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = 0.2625$$
3.98.c. $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.35 + 0.4 - 0.105 = 0.645$$
3.98.d. $$P(\overline{A} \cap B) = P(B) - P(A \cap B) = 0.4 - 0.105 = 0.295$$ $$P(\overline{A}|B) = \frac{P(\overline{A} \cap B)}{P(B)} = \frac{0.295}{0.4} = 0.7375$$
3.98.e. No, \(P(B|A) = 0.3 \neq 0.4 = P(B)\)
3.98.f. No, \(P(A \cap B) = 0.105 \neq 0\)
3.98.g. No, \(P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.645 \neq 1\)

3.99.a. Let A be the event of someone ordering a vegetarian meal, and B will be the event of a customer being a student. $$P(A \cap B) = P(A|B)P(B) = (0.25)(0.5)=0.125$$
3.99.b. $$P(B|A) = \frac{P(A|B)P(B)}{P(A)} = \frac{(0.25)(0.5)}{0.35} = 0.357$$
3.99.c. $$P(\overline{A} \cap \overline{B}) = 1-P(A \cup B) = 1-(P(A) + P(B) - P(A \cap B)) = 0.275$$
3.99.d. No, \(P(A|B) = 0.25 \neq 0.35 = P(A)\)
3.99.e. No, \(P(A \cap B) = 0.125 \neq 0\)
3.99.f. No, \(P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.725 \neq 1\)

3.100.a. Let A be ‘exceeds 160 acres’ and let B be ‘owned by persons over 50 years old’. \(P(A \cap B) = P(B|A)P(A) = (0.55)(0.2) = 0.11\)
3.100.b. $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.2 + 0.6 - 0.11 = 0.69$$
3.100.c. $$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{(0.55)(0.2)}{0.6} = 0.18333$$
3.100.d. No, \(P(B|A) = 0.55 \neq 0.6 = P(B)\)

3.101.a. Let H be the event of an employee only having highschool training, and let M be the event of an employee being male. \(P(H \cap M) = P(H|M)P(M) = 0.48\)
3.101.b. Let G be the event of an employee having graduate training and let W be the event of an employee being a woman. W and M are mutually exclusive and collectively exhaustive, so \(P(G) = P(G \cap M) + P(G \cap W) = P(G|M)P(M) + P(G|W)P(W) = (0.1)(0.8) + (0.15)(0.2) = 0.11\)
3.101.c. $$P(M|G) = \frac{P(G|M)P(M)}{P(G)} = \frac{(0.1)(0.8)}{0.11} = 0.727$$
3.101.d. No, for example, \(P(G|M) = 0.1 \neq 0.11 = P(G)\)
3.101.e. $$P(W|\overline{G})=\frac{P(W \cap \overline{G})}{P(\overline{G})} = \frac{P(W) - P(W \cap G)}{P(\overline{G})} = \frac{P(W) - P(G|W)P(W)}{P(\overline{G})} = \frac{0.2 - (0.15)(0.2)}{0.89} = 0.191$$

3.102.a. Let F be the event of an employee favoring the plan, W be the event of an employee being a woman, and N be the event of an employee being a night-shift worker. \(P(W \cap F) = P(F|W)P(W) = (0.4)(0.3) = 0.12\)
3.102.b. Here \(P(N \cap W) = P(W|N)P(N) = (0.2)(0.5) = 0.1\), so $$P(N \cup W) = P(N) + P(W) - P(N \cap W) = 0.5 + 0.3 - 0.1 = 0.7$$
3.102.c. No, \(P(W|N) = 0.2 \neq 0.3 = P(W)\)
3.102.d. $$P(N|W) = \frac{P(W|N)P(N)}{P(W)} = \frac{(0.2)(0.5)}{0.3} = 0.333$$
3.102.e. $$P(\overline{N} \cap \overline{F}) = 1 - P(N \cup F) = 1 - (P(N) + P(F) - P(N \cap F)) = 1 - P(N) - (P(F|W)P(W) + P(F|\overline{W})P(\overline{W})) + P(F|N)P(N) = 1 - 0.5 - (0.4)(0.3) - (0.5)(0.7) + (0.65)(0.5) = 0.355$$

3.103.a. \[\binom{16}{12} = \frac{16!}{12!(16-12)!} = 1820\]
3.103.b. The number of possibilities of 8 men and 4 women is \(\binom{8}{8}\binom{8}{4} = \frac{8!}{4!(8-4)!} = 70\), and the number of possibilities of 7 men and 5 women is \(\binom{8}{7}\binom{8}{5} = \frac{8!}{7!(8-7)!} \cdot \frac{8!}{5!(8-5)!} = 448\), so in total there are 518 possibilities and the probability is \(\frac{518}{1820} = 0.28\)
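The counts in 3.103 can be reproduced with `math.comb`. This sketch follows the solution above in assuming a pool of 8 men and 8 women from which 12 people are chosen.

```python
from math import comb

n_men, n_women, n_chosen = 8, 8, 12   # pool sizes as used in the solution above

total = comb(n_men + n_women, n_chosen)             # 3.103.a

# 3.103.b: 8 men with 4 women, or 7 men with 5 women
eight_men = comb(n_men, 8) * comb(n_women, 4)
seven_men = comb(n_men, 7) * comb(n_women, 5)
favourable = eight_men + seven_men

print(total, eight_men, seven_men, round(favourable / total, 4))
```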

3.104.a. \[\binom{12}{2} = \frac{12!}{2!(12-2)!} = 66\]
3.104.b. \[\frac{1}{12} + \frac{1}{12} = \frac{1}{6}\]

3.105. Let A be the event of a stock being up two years after it has been purchased, and let B be the event of Mr. Roberts receiving the first year bonus for a stock. Then we have the following: $$P(A) = 0.4 \Rightarrow P(\overline{A}) = 0.6$$ $$P(B|A) = 0.6$$ $$P(B|\overline{A}) = 0.4$$ $$P(B) = P(A \cap B) + P(\overline{A} \cap B) = P(B|A)P(A) + P(B|\overline{A})P(\overline{A}) = 0.48$$

3.106.a. Let C be the event of a patient being cured, and let T be the event of a patient receiving the treatment. $$P(C \cap T) = P(C|T)P(T) = (0.75)(0.1) = 0.075$$
3.106.b. $$P(T|C) = \frac{P(C|T)P(T)}{P(C)} = \frac{P(C|T)P(T)}{P(C|T)P(T) + P(C|\overline{T})P(\overline{T})} = \frac{(0.75)(0.1)}{(0.75)(0.1) + (0.5)(0.9)} = 0.1429$$
3.106.c. $$\frac{1}{\frac{100!}{10!(100-10)!}} = \frac{10!(100-10)!}{100!}$$

3.107.a. The probability of renewals (R) in January (J), is \[P(R|J) = (0.08)(0.81) + (0.41)(0.79)+ (0.06)(0.6) + (0.45)(0.21) = 0.5192\]
3.107.b. Repeating the calculation with the later figures: $$P(R) = (0.1)(0.8) + (0.57)(0.76)+ (0.24)(0.51) + (0.09)(0.14) = 0.6482$$
3.107.c. The probability of renewal has risen, but in most categories the percentage of renewals dropped. The drop is offset by the decrease in the share of subscriptions under subscription service, which have the lowest renewal rate. So in the long term this is not necessarily good news.

3.108. Let A be the event of a passenger carrying illegal amounts of liquor, and let B be the event of a passenger identified by the system. $$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})} = \frac{(0.8)(0.2)}{(0.8)(0.2) + (0.2)(0.8)} = 0.5$$ A passenger flagged by the system is therefore as likely to be innocent as guilty, although 0.5 is still above the base rate of \(P(A) = 0.2\) for a passenger picked at random.

3.109. Let A be the event of a person having contracted the disease, and let B be the event of a positive test result. $$P(B|\overline{A}) = 1 - P(\overline{B}|\overline{A}) = 0.2$$ $$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})} = \frac{(0.8)(0.08)}{(0.8)(0.08) + (0.2)(0.92)} = \frac{0.064}{0.248} = 0.258$$

3.110. Let A be the event of a sale, and let B be the event of an existing customer. $$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})} = \frac{(0.7)(0.4)}{(0.7)(0.4) + (0.5)(0.6)} = 0.483$$

3.111. Let A be the event of a final A grade, and let B be the event of an A on the midterm. $$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})} = \frac{(0.7)(0.2)}{(0.7)(0.2) + (0.1)(0.8)} = 0.63636$$

3.112.a. $$P(O_w|F_w) = \frac{P(O_w \cap F_w)}{P(F_w)} = \frac{0.149}{0.29} = 0.513$$ 3.112.b. $$1 - P(O_i|F_i) = 1 - \frac{P(O_i \cap F_i)}{P(F_i)} = 1 - \frac{0.21}{0.391} = 0.462$$

3.113.a. Let G be the event of a student who will graduate, and let A be the event of an entering freshman. $$P(A \cap G) = P(G|A)P(A) = (0.62)(0.73) = 0.4526$$ 3.113.b. $$P(G) = P(G|A)P(A) + P(G|\overline{A})P(\overline{A}) = (0.62)(0.73) + (0.78)(0.27) = 0.6632$$ 3.113.c. $$P(A \cup G) = P(A) + P(G) - P(A \cap G) = 0.73 + 0.6632 - 0.4526 = 0.9406$$ 3.113.d. No, \[P(G|\overline{A}) = 0.78 \neq 0.6632 = P(G)\]

3.114.a. Let S be the event of a store being successful, and let G be the event of a good assessment. $$P(G) = P(G|S)P(S) + P(G|\overline{S})P(\overline{S}) = (0.7)(0.6) + (0.2)(0.4) = 0.5$$ 3.114.b. $$P(S|G) = \frac{P(G|S)P(S)}{P(G)} = \frac{(0.7)(0.6)}{0.5} = 0.84$$ 3.114.c. No, \[P(G|S) = 0.7 \neq 0.5 = P(G)\]
3.114.d. We’ll calculate the probability of no store being successful as \[0.4^5 = 0.01024\] Therefore, the probability of at least one being successful is \[1 - 0.01024 = 0.98976\]
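
The complement trick used in 3.114.d can be sketched in a few lines of Python. This assumes, as the solution implicitly does, that the five stores succeed or fail independently of each other. The function name `at_least_one` is a hypothetical label.

```python
def at_least_one(p_success, n):
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p_success) ** n


# Exercise 3.114.d: five stores, each successful with probability 0.6
print(round(at_least_one(0.6, 5), 5))  # 0.98976
```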

3.115.a. Let A be the event of a customer ordering wine, and let \[C_r, C_o, C_n\] be the events that each type of customer visits. $$P(A) = P(A|C_r)P(C_r) + P(A|C_o)P(C_o) + P(A|C_n)P(C_n) = (0.7)(0.5) + (0.5)(0.4) + (0.3)(0.1) = 0.58$$ 3.115.b. $$P(C_r|A) = \frac{P(A|C_r)P(C_r)}{P(A)} = \frac{(0.7)(0.5)}{0.58} = 0.603$$ 3.115.c. $$P(C_o|A) = \frac{P(A|C_o)P(C_o)}{P(A)} = \frac{(0.5)(0.4)}{0.58} = 0.344$$
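
When the sample space is split into more than two categories, as in 3.115 and 3.116, the law of total probability and Bayes' theorem generalize directly. The following is a minimal sketch using the numbers from 3.115. The function names and the list-based representation are assumptions made for illustration.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(A|C_i) * P(C_i), for a partition C_1..C_k."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def posteriors(priors, likelihoods):
    """Return [P(C_i|A)] for every category in the partition."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Exercise 3.115: P(C_r), P(C_o), P(C_n) and P(A|C_r), P(A|C_o), P(A|C_n)
priors = [0.5, 0.4, 0.1]
likelihoods = [0.7, 0.5, 0.3]

print(round(total_probability(priors, likelihoods), 2))  # 0.58
print([round(x, 4) for x in posteriors(priors, likelihoods)])
```

The posteriors always sum to 1, which is a quick sanity check on the inputs.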

3.116.a. Let A be the event of a purchase and let the events of customers falling into the categories be \[C_h, C_c, C_o\]. $$P(A) = P(A|C_h)P(C_h) + P(A|C_c)P(C_c) + P(A|C_o)P(C_o) = (0.2)(0.3) + (0.6)(0.5) + (0.8)(0.2) = 0.52$$ 3.116.b. $$P(C_h|A) = \frac{P(A|C_h)P(C_h)}{P(A)} = \frac{(0.2)(0.3)}{0.52} = 0.115$$

3.117. $$\frac{\binom{8}{5}}{\binom{16}{5}} = 0.012$$
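
The ratio of binomial coefficients in 3.117 can be checked with Python's standard library. This is only a sketch of the arithmetic; the variable names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

# Exercise 3.117: ratio of two binomial coefficients
favorable = comb(8, 5)     # 56 ways to choose 5 out of 8
possible = comb(16, 5)     # 4368 ways to choose 5 out of 16
probability = favorable / possible

print(round(probability, 4))  # 0.0128
```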

3.118.a. Let C be the event of someone being guilty of a crime, and let G be the event of someone wearing gloves. $$P(C|G) = \frac{P(G|C)P(C)}{P(G)} = \frac{P(G|C)P(C)}{P(G|C)P(C) + P(G|\overline{C})P(\overline{C})} = \frac{(0.6)(0.5)}{(0.6)(0.5) + (0.8)(0.5)} = \frac{3}{7}$$ 3.118.b. A jury should not convict based on evidence that has such low accuracy.

3.119. Let the types of error events be \[E_d, E_m, E_o\] and let F be the event of a failure. $$P(E_d|F) = \frac{P(F|E_d)P(E_d)}{P(F)} = \frac{P(F|E_d)P(E_d)}{P(F|E_d)P(E_d) + P(F|E_m)P(E_m) + P(F|E_o)P(E_o)} = \frac{(0.6)(0.5)}{(0.6)(0.5) + (0.7)(0.3) + (0.3)(0.2)} = 0.526$$

3.120. Let N be the event of a new operating system introduced, and let G be the event of growth. $$P(G|N) = \frac{P(N|G)P(G)}{P(N)} = \frac{P(N|G)P(G)}{P(N|G)P(G) + P(N|\overline{G})P(\overline{G})} = \frac{(0.3)(0.7)}{(0.3)(0.7) + (0.1)(0.3)} = 0.875$$

3.121. Let D be the event of lumber having defects and let the events of lumber coming from each supplier be \[S_n, S_m, S_s\]. Then: $$P(S_n) = P(S_n|D)P(D) + P(S_n|\overline{D})P(\overline{D}) = (0.3)(0.2) + (0.4)(0.8) = 0.38$$ $$P(S_m) = P(S_m|D)P(D) + P(S_m|\overline{D})P(\overline{D}) = (0.5)(0.2) + (0.2)(0.8) = 0.26$$ $$P(S_s) = P(S_s|D)P(D) + P(S_s|\overline{D})P(\overline{D}) = (0.2)(0.2) + (0.4)(0.8) = 0.36$$ $$P(\overline{D}|S_n) = 1 - P(D|S_n) = 1 - \frac{P(S_n|D)P(D)}{P(S_n)} = 1 - \frac{(0.3)(0.2)}{0.38} = 1 - 0.1579 = 0.8421$$ $$P(\overline{D}|S_m) = 1 - P(D|S_m) = 1 - \frac{P(S_m|D)P(D)}{P(S_m)} = 1 - \frac{(0.5)(0.2)}{0.26} = 1 - 0.3846 = 0.6154$$ $$P(\overline{D}|S_s) = 1 - P(D|S_s) = 1 - \frac{P(S_s|D)P(D)}{P(S_s)} = 1 - \frac{(0.2)(0.2)}{0.36} = 1 - 0.1111 = 0.8889$$

3.122. Let R be the event of an acre with regular plowing and let H be the event of high yields. $$P(H|R) = 1 - P(\overline{H}|R) = 0.6$$ $$P(R|H) = \frac{P(H|R)P(R)}{P(H)} = \frac{P(H|R)P(R)}{P(H|R)P(R) + P(H|\overline{R})P(\overline{R})} = \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.5)(0.6)} = 0.4444$$

Solutions for Chapter 2 of Statistics for Business & Economics

The following are full solutions to chapter 2 in “Statistics for Business and Economics” By Newbold, Carlson & Thorne. I do not guarantee the correctness of any of the answers presented here. If you found a mistake, have a comment, or would like to ask me anything, I’m available by mail: me (at) shayacrich (dot) com.

The algorithm for calculating percentiles and quartiles presented in this chapter does not produce the same results as Google Spreadsheet does, and it appears that the same issue arises with MS Excel. I wrote the following Google Spreadsheet formulas for correctly calculating quartiles (assuming the dataset is sorted in ascending order in column A, starting at row 1):

Q1: =if(MOD(count(A:A) + 1, 4)=0, indirect("A"&(count(A:A) + 1)/4), indirect("A"&rounddown((count(A:A) + 1)/4)) + 0.25*(indirect("A"&roundup((count(A:A) + 1)/4)) - indirect("A"&rounddown((count(A:A) + 1)/4))))

Q3: =if(MOD(count(A:A) + 1, 4)=0, indirect("A"&(count(A:A) + 1)*0.75), indirect("A"&rounddown((count(A:A) + 1)*0.75)) + 0.75*(indirect("A"&roundup((count(A:A) + 1)*0.75)) - indirect("A"&rounddown((count(A:A) + 1)*0.75))))

I also found an answer to someone who raised a similar issue, which suggests different (though similar) formulas: https://superuser.com/a/343368
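
For readers who prefer code to spreadsheet formulas, here is a minimal Python sketch of one common textbook rule: take the value at the p(n + 1)-th ordered position and interpolate linearly between neighbors when that position is not a whole number. This is an assumption about the rule, not a transcription of the formulas above. In particular, it interpolates with the fractional part of the position rather than with a fixed weight of 0.25 or 0.75, so the two can give different results. The function name `percentile` and the sample data are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Value at the p*(n + 1)-th ordered position (1-based), linearly interpolated."""
    n = len(sorted_values)
    position = p * (n + 1)
    # Clamp so that very small samples never index outside the data
    position = min(max(position, 1), n)
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_values[lower - 1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


# Hypothetical sample, not taken from the book
data = sorted([12, 15, 17, 20, 22, 25, 28, 31, 35, 40])
q1 = percentile(data, 0.25)   # position 2.75
q3 = percentile(data, 0.75)   # position 8.25
print(q1, q3)  # 16.5 32.0
```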

2.1.a. The mean is 66, the median 75, and there’s no mode.

2.1.b. Because of the small outlier contained in the dataset, the median is the best choice for predicting future weekly specials. However, for examining past performances, such as gross revenue as a factor of weekly specials, the mean is still the better choice.

2.2.a. 12

2.2.b. 13

2.2.c. 8

2.3.a. 3.5

2.3.b. 3.55

2.3.c. 3.7

2.4.a. 5.94

2.4.b. 6.35

2.5.a. 17.75

2.5.b. 20.74

2.6.a. Mean is 10.1, the median is 10.5, and the mode is 11

2.6.b. 6 < 7.25 < 10.5 < 12.75 < 14

2.7. The mode and median are both zero, and the mean is 0.44.

2.8.a. 25.58

2.8.b. 22.5

2.8.c. 22

2.9.a. Q1 = 2.9825, Q3 = 3.3675

2.9.b. 3.1

2.9.c. 3.39

2.10.a. 8.54

2.10.b. 9

2.10.c. A comparison of the mean and median suggests that this is a left-skewed distribution, but this is not accurate because this isn’t a continuous unimodal dataset. A visual examination of a histogram of this dataset indicates that it is right-skewed. The positive value of skewness confirms that this is the case.

2.10.d. 2 < 6 < 9 < 10.75 < 21

2.11.a. The mean volume is 236.99, which for 100 bottles means that the volume of the entire sample is 23,699 mL, a small fraction less than the 23,700 mL implied by the advertised 237 mL per bottle.

2.11.b. The median volume is 237.

2.11.c. It’s difficult to tell the skewness from the shape of the histogram, and different class widths might visually indicate different results. The calculated skewness is 0.13, which indicates that the distribution is almost symmetric, although, being positive, it means that the distribution is slightly skewed to the right.

2.11.d. 224 < 233.25 < 237 < 241 < 249

2.12.

s^2 = 5.14

s = 2.27

2.13.

s^2 = 20.3

s = 4.50

2.14. 17.57%

2.15.a. 28.77

2.15.b. 12.70

2.15.c. 44.15%

2.16.

Stem | Leaf
1 | 2,3,4,5,7,8,9
2 | 0,1,2,3,7,9
3 | 1,3,5,8
4 | 0,2,5,9
5 | 3
6 | 5

IQR = 38 – 18 = 20

2.17.a. Trick question. The 25 given is the variance, not the standard deviation, so the standard deviation is 5, and we need k=2 so that the interval would be [75 - 2*5, 75 + 2*5]. For k=2, Chebyshev’s theorem gives us: [1 - 1/(2^2)] * 100% = 75%.
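
The Chebyshev calculation in 2.17.a can be sketched in Python as follows. The function name `chebyshev_lower_bound` is an illustrative label; the mean of 75 and variance of 25 come from the exercise.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


mean, variance = 75, 25
std_dev = variance ** 0.5          # 5.0, the square root of the variance
k = 2
interval = (mean - k * std_dev, mean + k * std_dev)

print(interval, chebyshev_lower_bound(k))  # (65.0, 85.0) 0.75
```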

2.17.b. Approximately 95% of observations are between 65 and 85.

2.18. The question says population, which implies the use of the empirical rule.

  1. Almost all observations, [230 – 3 * 20, 230 + 3 * 20]
  2. Approximately 95%, [230 – 2 * 20, 230 + 2 * 20]

2.19.a. The 68% that’s within [425, 475] + half of the additional observations within [400, 500] + half of the additional observations within [375, 525], so in total: (68 + 13.5 + 2.5)% = 84%

2.19.b. Everything within [400, 500] + half of the observations within [375, 525], so (95 + 2.5)% = 97.5%

2.19.c. Almost none.

2.20.a. Common stocks have a mean annual percentage return of 8.16%. It should be noted that the real annual growth should be calculated with a geometric mean, but this is irrelevant to this exercise. U.S. Treasury Bills have a mean annual percentage return of 5.78%. From the perspective of annual returns alone (disregarding the risk of fluctuations), common stocks are a better investment, according to past performance.

2.20.b. The standard deviation for the annual percentage return on common stocks is approximately 22.30%. The standard deviation for the annual percentage return on U.S. Treasury Bills is approximately 1.47%. It appears that the standard deviation on stocks is much higher, implying a higher risk of fluctuations, which could partly explain the higher returns. We’ll need to examine the coefficients of variation to determine which investment is more worthwhile. The coefficient of variation on stocks is 273.41%, whereas that of treasury bills is 25.43%. This reinforces our assumption that stocks are riskier.

2.21.a. 26.8

2.21.b. 8.48266

2.21.c. 8.48266

2.21.d. 8.48266

2.21.e. 31.65%

2.22.a. The range is 0.54, the variance is 0.010, and the standard deviation is 0.10. Higher accuracy is hard to reach with Google Spreadsheet because of floating point calculation errors. My standard deviation was 0.1017 and my variance was 0.01034, the STDEV output was 0.1024 and the VAR output was 0.01048.

2.22.b. The IQR is 0.13. Given that it’s significantly lower than the range, we conclude that the dataset has either very high or very low outliers.

2.22.c. The coefficient of variation is 2.67%.

2.23.a. The mean is 261.0545

2.23.b. The variance is 306.4373 and the standard deviation is 17.5053

2.23.c. The coefficient of variation is 6.7%.

2.24.a. The standard deviation is 1.0048

2.24.b. According to Chebyshev’s theorem we know that at least 75% lie within 2 standard deviations of the mean. However, because the standard deviation is very small relative to the mean, we can assume that it’s closer to the empirical rule’s 95%.

2.25. The mean is 52.64 and the standard deviation is 12.7147

2.26.a. 4.2

2.26.b. 4.5833

2.27.a. 101

2.27.b. The sample variance is 4195 and the sample standard deviation is 64.76.

2.28.

# of Hours | fi | mi | fi*mi | mi-mean | (mi-mean)^2 | fi*(mi-mean)^2
4 < 10 | 8 | 7 | 56 | -8.4 | 70.56 | 564.48
10 < 16 | 15 | 13 | 195 | -2.4 | 5.76 | 86.4
16 < 22 | 10 | 19 | 190 | 3.6 | 12.96 | 129.6
22 < 28 | 7 | 25 | 175 | 9.6 | 92.16 | 645.12

  1. Approximate mean = (56 + 195 + 190 + 175) / (8 + 15 + 10 + 7) = 15.4
  2. Approximate variance = (564.48 + 86.4 + 129.6 + 645.12) / (8 + 15 + 10 + 7 - 1) = 36.55, and the approximate standard deviation is 6.05.
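
The grouped-data calculation in 2.28 can be reproduced with a short Python sketch. It takes the class boundaries and frequencies from the table, uses class midpoints as stand-ins for the observations, and divides by n - 1 for the sample variance. The variable names are illustrative.

```python
# (lower bound, upper bound, frequency) for each class in 2.28
classes = [
    (4, 10, 8),
    (10, 16, 15),
    (16, 22, 10),
    (22, 28, 7),
]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Weighted mean of the midpoints
approx_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
# Weighted squared deviations from that mean, divided by n - 1
approx_var = sum(f * (m - approx_mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
approx_std = approx_var ** 0.5

print(round(approx_mean, 2), round(approx_var, 2), round(approx_std, 2))
```

Running a check like this is a cheap way to catch slips in the hand-filled columns of the table.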

2.29. 3.2251

2.30.a. Mean = 1.4

2.30.b. Sample variance = 23.8710 and standard deviation = 4.8857.

2.31.a. 9.36

2.31.b. 8.9063

2.32.a. 11.025

2.32.b. 0.9195

2.33. Mean = 1.654, and standard deviation = 10.6850.

2.34.a. 261.5454

2.34.b. 2735.3564

2.34.c.  The exact mean was 261.0545, and we see that the approximate mean, although very close, is not precise. The variance was 306.4373 and now it’s 2735.3564, which is indicative of the fact that the result is less precise.

2.35.a. 2.33

2.35.b. 0.9058

2.36.a. 1392.5

2.36.b. 0.9930

2.37.a. -45

2.37.b. -0.9

2.38.a. Cov = 4.2679

2.38.b. r = 0.1283

2.38.c. |r| < 2/sqrt(n), so there isn’t enough data to identify a linear correlation between the drug units and recovery times.

2.39.a. Cov = -5.5, r = -0.7760

2.39.b. |r| > 2/sqrt(n), so there is a linear correlation and it is negative, meaning that the higher level of service results in lower delivery times.
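
The covariance, correlation and rule-of-thumb check used in 2.38 and 2.39 can be sketched in Python as follows. The data below is hypothetical and is not taken from the book, and the function names are illustrative.

```python
from math import sqrt


def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_corr(xs, ys):
    """Sample correlation coefficient r = Cov(x, y) / (s_x * s_y)."""
    s_x = sqrt(sample_cov(xs, xs))
    s_y = sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [9, 8, 8, 6, 5, 5, 3, 2]

r = sample_corr(x, y)
threshold = 2 / sqrt(len(x))    # rule of thumb: |r| > 2/sqrt(n)
print(round(r, 4), abs(r) > threshold)
```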

2.40.a. Cov = -20.75

2.40.b. r = -0.9366

2.41.a. 2.9072

2.41.b. 0.9617

2.42. r = 0.9300

chart (32).png

2.43. Cov = 9.9642, r = 0.9852

2.44.a. mean = 18.1325

2.44.b. s^2 = 204.7017, s = 14.3074

2.45.a. 43.1

2.45.b. s = 10.1644

2.45.c. 20 < 35 < 45 < 50 < 60

2.46.

Location 2 variance: 52.62222222

Location 2 standard deviation: 7.254117605

Location 3 variance: 75.82222222

Location 3 standard deviation: 8.707595663

Location 4 variance: 22.27777778

Location 4 standard deviation: 4.719934086

2.47. Cov = 5.18954, r = 0.24499, there is no linear correlation.

2.48.a.

chart (33).png

2.48.b. r = 0.5602

2.49. I mistakenly assumed that population #3 would be smaller than population #1. The real variances are:

Population #1: 6

Population #2: 14

Population #3: 7.14

Population #4: 54

2.50.a. [295 – 1.59*63, 295 + 1.59*63]

2.50.b. [295 – 2.5*63, 295 + 2.5*63]

2.51.a. [9.2 – 2.5*3.5, 9.2 + 2.5*3.5]

2.51.b. [9.2 – 3.5, 9.2 + 3.5]

2.52.a. [29,000 – 2*3000, 29,000 + 2*3000]

2.52.b. [29,000 – 2*3000, 29,000 + 2*3000]

2.53.a. IQR = 21.5, so if the dataset is bell-shaped, most of the employees completed the task within the same range of ~20 seconds.

2.53.b. 222 < 249.5 < 263 < 271 < 299

2.54.a. The mean is 41.6826.

2.54.b. s^2 = 284.3546, s = 16.8628.

2.54.c. 95th percentile = 70

2.54.d. Five number summary: 18 < 25.75 < 39 < 54.25 < 73

2.54.e. CV = 40.45%

2.54.f. 100*[1 – (1/k^2)]% = 90% => k = sqrt(10) = ~3.16, so we take k = 3.17, and the result is:

[41.68 – 3.17*16.86, 41.68 + 3.17*16.86]
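
Solving Chebyshev's bound for k, as in 2.54.f, can be sketched in Python. The function name `chebyshev_k` is a hypothetical label. Note that this sketch uses the exact value of k rather than rounding it up to 3.17, so the interval it prints is marginally narrower than the one above.

```python
from math import sqrt


def chebyshev_k(coverage):
    """Smallest k with 1 - 1/k^2 >= coverage, for 0 < coverage < 1."""
    return 1 / sqrt(1 - coverage)


mean, std_dev = 41.68, 16.86
k = chebyshev_k(0.90)           # sqrt(10), about 3.1623
interval = (mean - k * std_dev, mean + k * std_dev)

print(round(k, 4), tuple(round(v, 2) for v in interval))
```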

2.55.a. Cov = 16.55

2.55.b. r = 0.8653

2.56.a. Cov = 106.9333

2.56.b. r = 0.9887

Solutions for Chapter 1 of Statistics for Business & Economics

The following are partial solutions to the first chapter in “Statistics for Business and Economics” By Newbold, Carlson & Thorne. I do not guarantee the correctness of any of the answers presented here. If you found a mistake, have a comment, or would like to ask me anything, I’m available by mail: me (at) shayacrich (dot) com.

1.1.a. Continuous numerical variable

1.1.b. Categorical variable with a nominal level of measurement

1.1.c. Categorical variable with an ordinal level of measurement

1.1.d. Discrete numerical variable

1.2.a. Categorical variable with a nominal level of measurement

1.2.b. Categorical variable with an ordinal level of measurement

1.2.c. Continuous numerical variable

1.3. Categorical with an ordinal level of measurement

1.4.a. Categorical variable with an ordinal level of measurement

1.4.b. Discrete numerical variable

1.4.c. Categorical variable with a nominal level of measurement

1.4.d. Categorical variable with a nominal level of measurement

1.5.a. Categorical variable with a nominal level of measurement

1.5.b. Discrete numerical variable

1.5.c. Categorical variable with a nominal level of measurement

1.5.d. Categorical variable with an ordinal level of measurement

1.6.a. Categorical variable with a nominal level of measurement

1.6.b. Discrete numerical variable

1.6.c. Categorical variable with a nominal level of measurement

1.6.d. Categorical variable with an ordinal level of measurement

1.7.a. Shift / Benefits?

1.7.b. Employee ID / Gender

1.7.c. Time (in seconds)

1.8.a. activity_level

1.8.b. col_grad, smoker,

1.8.c. BMI, daily_cost

1.8.d. hh_income_est, age

1.9.a.

chart (1).png

1.9.b.

chart.png

1.10.

chart (2).png

1.11.a.

chart (3).png

1.11.b.

chart (4).png

1.12.

chart (6).png

1.13.

chart (7).png

1.14.a.

chart (8).png

1.14.b.

chart (9).png

1.14.c.

chart (10).png

1.15.a.

chart (12).png

1.15.b.

chart (13).png

1.15.c.

chart (14).png

1.15.d.

chart (15).png

1.16.

chart (16).png

1.17.a.

chart (17).png

1.17.b.

chart (18).png

1.18.a.

chart (19).png

1.18.b.

chart (20).png

1.19.a.

chart (21).png

1.19.b.

chart (22).png

1.19.c. (Israel)

chart (23).png

1.20.

chart (24).png

1.21.

chart (25).png

1.22.

chart (26).png

1.23.a.

Data from: http://www2.census.gov/library/publications/2010/compendia/statab/130ed/tables/11s1002.xls

Looking at the TOC here: https://www.census.gov/eos/www/naics/2017NAICS/2017_NAICS_Manual.pdf

It’s clear that durable goods are aggregated on the “33, 321, 327” line and non-durable goods are aggregated on the “31, 32 (except 321 and 327)” line.

chart (27).png

1.23.b.

chart (28).png

Skipped a few exercises…

1.30.a. 6 (5-7)

1.30.b. 8 (7-8)

1.30.c. 8 (8-10)

1.30.d. 10 (8-10)

1.30.e. 11 (10-11)

1.31. The sample size is 110 observations, so for all subsections, we choose the number of classes to be k=8.

  1. upper((85-20)/8) = 9
  2. upper((190-30)/8) = 20
  3. upper((230-40)/8) = 24
  4. upper((500-140)/8) = 45
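
The class width rule used in 1.31, where `upper` means rounding up to the next whole number, can be sketched in Python. The function name `class_width` is an illustrative label.

```python
import math


def class_width(largest, smallest, k):
    """Class width (largest - smallest) / k, rounded up to a whole number."""
    return math.ceil((largest - smallest) / k)


# The four cases from 1.31, all with k = 8 classes
for largest, smallest in [(85, 20), (190, 30), (230, 40), (500, 140)]:
    print(class_width(largest, smallest, 8))  # 9, 20, 24, 45
```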

1.32. Sample size is 28 observations, so we choose the number of classes to be k=6.

w=upper((65-12)/6)=9

a.

Class | Frequency
10-19 | 5
20-29 | 3
30-39 | 7
40-49 | 4
50-59 | 5
60-69 | 4

b.

chart (29).png

c.

chart (30).png

d.

Stem | Leaf
1 | 2,3,5,7
2 | 1,4,8
3 | 2,5,6,7,9
4 | 0,1,4
5 | 1,4,6,9
6 | 2,4,5

1.33.

Stem | Leaf
1 | 0
2 | 3, 4, 6, 8, 9
3 | 0, 5, 6, 9
4 | 4, 5, 8
5 | 0, 2, 5
6 | 2, 7

1.34.a.

Class | Relative Frequency
0 < 10 | 16.3%
10 < 20 | 20.4%
20 < 30 | 26.5%
30 < 40 | 24.5%
40 < 50 | 12.2%

1.34.b.

Class | Cumulative Frequency
0 < 10 | 8
10 < 20 | 18
20 < 30 | 31
30 < 40 | 43
40 < 50 | 49

1.34.c.

Class | Cumulative Relative Frequency
0 < 10 | 16.3%
10 < 20 | 36.7%
20 < 30 | 63.3%
30 < 40 | 87.8%
40 < 50 | 100%
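
The three tables in 1.34 are related by simple arithmetic, which the following Python sketch makes explicit. It assumes that the counts listed under 1.34.b are running totals, so the per-class frequencies are their successive differences (8, 10, 13, 12, 6). The variable names are illustrative.

```python
from itertools import accumulate

# Per-class frequencies, assumed from the running totals in 1.34.b
freqs = [8, 10, 13, 12, 6]
n = sum(freqs)                                   # 49 observations

relative = [100 * f / n for f in freqs]
cumulative = list(accumulate(freqs))             # [8, 18, 31, 43, 49]
cumulative_relative = [100 * c / n for c in cumulative]

print([round(v, 1) for v in relative])
print(cumulative)
print([round(v, 1) for v in cumulative_relative])
```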

1.35.

chart (31).png

Skipping 1.36 – 1.74 (the rest of the chapter).

Zapier Usage Research

In July of 2013 I did some business-related research into how Zapier is being used. The research involved scraping the list of zaps that’s publicly available on the Zapier website, dumping them into a database, and running a few queries with different intersections on the dataset. The business I was planning at the time is now ancient history, so I thought I’d share my results with the world, in case anyone might find them useful.
*This article is part of a business plan I decided to edit into a series of blog posts. You can find the rest of the content here

Patents Are Going To Change

Within about two months of the publication of this post, the patent for the portable computer is going to expire. Considering the proliferation of laptops, and the number of patents involved in manufacturing one, it’s hard to predict whether this change is going to affect the industry to any extent. However, it does clearly illuminate the point in time we’re at within these early stages of the information age. It’s well known that the patentability of software was established in the U.S. during the early 90’s, and that software patents have been growing in number ever since. This is important for us, because these days the first few batches of computer-related patents are being released for free use. It is the unique character of this industry that makes this development un-newsworthy, because of a perceived pace of progress under which nothing 20 years old could be of any worth. Anyone who’s been around that long, though, knows this to be untrue. Innovation works much like a pendulum, moving from one extreme to the other. This is how Anil Dash describes the move from mainframes to the personal computer: a revolution of sorts, only to be followed by a swing of the pendulum in the opposite direction with the proliferation of the cloud. The same could be claimed with regard to IDEs, visual programming languages, imperative programming paradigms, and many other aspects of technology that we consider innovative. Today’s world of technology is without a doubt moving forward, but many of the conceptual foundations of today’s inventions were already laid out in legal speak in a previous turn of the wheel. This has the dual effect of allowing us to move forward on fronts that were blocked from extensive (as opposed to intensive) progress up to now, and of limiting the possibility of a future appropriation of these innovations.

The Opportunity

When speaking of extensive progress that wasn’t possible up to now, one can imagine a world of smaller and smaller manufacturers, much like the way every town had its own bread makers and shoe makers at some point in time. This might seem counter-intuitive given the common rationale that centralized manufacturing cuts costs, but this isn’t really the case in most industries. Nike has centralized manufacturing, but their products cost a fortune, because they have vast expenses for the purpose of brand building, and because there’s enough demand to make it worthwhile. The telecommunications equipment industry is another kind of example. A few manufacturers like Cisco have vast R&D expenses and aim for the higher end of the market. However, most of the consumers are service providers in developing nations, where infrastructure has yet to be deployed. In these markets there seems to be a clear preference for smaller manufacturers that provide cheaper goods. Another result of the same process could be better free software alternatives to existing products, and in particular a preference for maker alternatives or pretail variations that could replace the mass-manufactured goods of today’s world. The rise of 3D printing and maker culture saw an instantaneous reaction from IP aficionados, who argued for the need to implement IP protection mechanisms within printers, even for non-commercial use. But the expiration of patents for many of the technologies that are simple enough to be relevant to makers would change the balance of power in this regard.

The Future Of Patents

Today’s innovative sphere advances along two parallel trajectories. On the one hand are entrepreneurs, hackers and makers, and on the other, a narrow oligopoly of tech giants. In a recent talk with Israel Twito, CEO of New-tone Patent Search ltd., it was noted that the same balance of power between big manufacturers that prevents smaller players from being able to compete also forces these same companies to continue the arms race, with intensive R&D as well as the constant acquisition of smaller players. When criticizing an exits-focused startup culture, one has to keep this state of affairs in mind. Some industries are closed to anyone who isn’t deep-pocketed enough to join an arms race that’s already under way. On the other side of the market are big players that need sources of new IP, with little or no regard to the cost, whether financial or in damage to innovation and to clients of acquired ventures. This saddening state has an upside as well, one that will only begin to show within the next few years. The tech giants are arming themselves with patents for their own ends, but in doing so they are also making the patent system obsolete. With the realization that a patent’s strength lies in the abstractness and generality of its phrasing, and with the constant swings of the technological pendulum, we’ll see more and more areas where the field for innovation is ironically already covered by existing, expired patents. With that process underway, the tech giants of today will be forced to create products of such sophistication that their target audiences will narrow, much like Cisco’s has today.

An End Note

The grinding of the patent system by economically motivated and heavily funded players requires new and better tools to help us understand what opportunities lie before us. A survey of U.S. patents that are about to expire within the next year shows interesting things like cloaking systems, software protection mechanisms, electric car chargers, pen-based computer inputs, home networking, and a lot of the technologies around cellular communications. Crossing that information with patent lawsuit databases shows that not many of today’s expiring patents are inhibitors to innovation (or competition), but that’s going to change at some point in time, and we had best be prepared.

My Cofounder Video Marketing Endeavors

I prepared a short wanted ad for my next venture with the kind help of my beloved partner. It got me to the first page of YouTube’s search results for the keyword “find a cofounder” with just friends-and-family marketing. It didn’t get me a cofounder. This suggests two relatively obvious conclusions: (1) there’s ad arbitrage on YouTube, and (2) this doesn’t matter if your target audience isn’t looking for you over there. Either way, here it is:

PinMyScreen – Bookmarks Using Pinterest

PinMyScreen is a social bookmarking extension for Google Chrome. The extension adds a Pinterest pin button to the browser’s toolbar, and allows users to pin website screenshots onto their Pinterest boards. This works the same way as Gimme Bar, only it doesn’t impose a new system on users. Instead, PinMyScreen allows you to use your existing Pinterest account to save your favorite websites.

Known Issues

PinMyScreen uses Chrome’s built-in screenshot functionality (the same one that’s used for the ‘most visited’ page you see when opening a new tab). This, for security reasons, prevents us from screenshotting any sensitive data that might exist within secured pages or Chrome web applications. As a result, not every website can be shot, and you’ll sometimes find that the extension sends an empty image to the Pinterest ‘pin this’ form. The process of screenshotting a webpage requires a round trip to our server. You should note that we do not keep any images or any other sort of information on those servers or anywhere else. Unlike many other Chrome extensions, I’ve chosen not to use anonymous analytics either, so no data on your actions is logged at any stage. However, communication with our servers sometimes takes a bit longer than expected. If you clicked the ‘pin this’ button on your browser’s toolbar, the request was sent and the Pinterest form will appear in a few seconds’ time. This, however, doesn’t always happen fast enough, and there’s no indication that the request is being processed. If I hear a lot of complaints about speed, I’ll move the servers to Amazon’s cloud services, so if you think there’s room for improvement, just let me know.

Bug Reporting and Support

I’m always happy to hear feedback about the extension, be it positive or negative, so don’t hesitate. If there’s anything you need help with, just send an email to me at shayacrich dot com. Visit our page on the Chrome WebStore to add the extension to your browser.

Developer & Vendor Tools for Data Integration

Imagine a scenario where you launch a new project management tool, and it’s really the best one out there, but no one wants to use it. This could happen for several reasons, one of which would be the lack of integrations with other types of software your potential users are using. They might need integration with Google Calendar, or with a variety of CRMs, or ticketing systems, or bug tracking systems, or version control services, and so on. Eventually, you’re forced to start providing built-in integrations with any and every other tool out there. With each new integration offered, the potential user base of your app grows bigger, which is good, but this scenario has its disadvantages:

  • There are infinite potential integrations, so the work is unending.
  • This diverts the attention of your development staff from what they do well (project management, in the case described above) to other types of work.
  • Bugs are usually hard to control because you’re working against systems that you don’t host, and whose code you can’t see.

The need for integration tools aimed at software vendors derives from the scenario described just now, which I like to call the integration pitfall. Application providers are faced with pressure, having to offer built-in integrations with a growing subset of related products, so that their own product would be useful to new target audiences. Some tools were developed with the aim of assisting these software providers, instead of targeting their end users directly.

Elastic.io is one such tool. It lets you connect with a variety of third party services and map fields from these data sources to your own application with a simple drag and drop interface. In sum, Elastic.io is meant for non-technical users who want to extend the application that they offer with built-in integrations. A very different approach is that of Nectil, a French service provider that developed its own software for web application integrations, and that offers small businesses tailored solutions on top of that framework.

Most other tools aim at the developer crowd, trying to help them consume APIs with greater ease and speed. Temboo provides SDKs for server side API consumption. It supports a very large number of APIs and provides access to them through a unified interface. The service is SaaS based, though cheap, and meant for low levels of consumption (anything really big would require custom pricing). Webshell is an API that lets you integrate different APIs from third party services and create new functionality. It’s a JS library, which handles authentication for you and lets you construct your custom APIs using an editor. DERI Pipes is software that lets you create automated processes that transform and mash together web content (much like Yahoo! Pipes, but more technically complex).

Cumula is a PHP application framework that takes the idea even further. Based on the assumption that modern web applications no longer sit on top of a single on-premise database, but instead connect with a multiplicity of services, they’ve contrived a framework that lets developers build applications with a unified interface for local and remote data sources. It’s modular, with the use of components for describing parts of the application. Each component has its own routing and templating. A collection of DataStore and DataService packages are available as dependencies, included in the same Github repo. As of now, Github’s popularity metrics indicate very low acceptance rates, and the same goes for the associated Google Group. Nevertheless, this is a relatively new project, and they’ve already received some media coverage.

Other platforms aim to help developers find and consume APIs, without altering the way they work or requiring the utilization of a specific framework. APIHub (renamed to the anypoint portal after being acquired by Mulesoft) and the ProgrammableWeb projects are API explorers, within which one can quickly search and learn about new APIs. Mashape does the same, although focused more on developer APIs and less on connecting with third party applications. Mashape acts as a middleman between providers and consumers, relieving API consumers from having to maintain separate authentication processes with a multiplicity of remote services, and thus enabling the proliferation of usage of such services.

Edit: Since the time of writing this, a new contestant has emerged, called CloudElements, that provides a uniform API for accessing multiple cloud based applications.


*This article is part of a business plan I decided to edit into a series of blog posts. You can find the rest of the content here

Guidelines for hiring someone to build your startup

  1. You should have a very clear understanding of what you need built. Usually, mocking up the wireframes and writing a short spec that specifies the processes users go through in the system is a crucial first step. If you can’t do this yourself, the developer or team you work with will have to be experienced with doing this for entrepreneurs, but I definitely do recommend you try to do this yourself.
  2. The developer or firm you work with should issue a detailed estimation and timeline, so that you’ll have a clear view of what’s going to happen. This will allow you to change course during development if you feel the work isn’t progressing at the right pace or if you need to change the requirements once you start to see your vision realized in code.
  3. You should make sure that you’ll have the ability to track the work on a weekly basis, or sit with the developer in the same room. This means you know at any given point in time how many hours were already spent, and how many features are done. This will allow you to control the level of fine tuning of the implementation of the design and UX, as well as skip features when you fear the system might not be ready before your money runs out.
  4. If you’re burning your own money, you should have a very clear vision of how you intend to progress once the MVP is ready. Remember that an MVP is not a finished product, and there’s a limit to how much you can monetize a product at that level of maturity.

The Super Distribution of Data

A recent article on ALA described the proliferation of content aggregators, such as RSS readers or mobile apps like Flipboard, as ‘the future super-distribution of content’. Readers want control over formatting, frequency of consumption and methods of sharing, and they’re turning away from the content providers and towards specialized tools that give them the uniform experience they need.

For most of us, this is already a given truth, and looks like a rather obvious progression from previous technological advancements. However, at the same time, we also take for granted that the web applications we use for our day-to-day trade are contained silos, with their own special UIs and terminologies. Each of these contained silos holds a segment of our data, and forces us to sign in so we can view it, and to adapt to their idea of how to present our data to us.

Gradually, businesses are adopting data aggregators that simplify specific segments of their day-to-day work. For instance, a business that relies on PPC-based marketing might start using a PPC management tool to manage its entire ad spend from a single user interface. Now, it no longer has to sign in to multiple services and cope with the varied complexities of each system.

With the adoption of such tools comes a new expectation – that other aspects of the business be managed with the same simplicity. Each aspect of a business’s operation deserves a specific tool that abstracts away the complexities of the underlying systems. But then a new multiplicity of tools arises, which suffers from the same sorts of problems – multiple sign-ins, multiple UIs, and data that’s contained within specific apps and not usable anywhere else.

I propose a bigger revolution – one of a super-distribution of data at large. The principal logic behind this change is the recognition that building a true one-stop shop for any and every need of a business is unachievable. Instead, one should focus on enabling the distributed model, in which multiple complementary services are used in parallel. With this recognition in mind, I wouldn’t suggest that anyone try to replace the PPC management tool mentioned above, but rather complement it by enabling it to communicate with other tools meant for other specific aspects of a business’s operation.


* This article is part of a business plan I decided to edit into a series of blog posts. You can find the rest of the content here