Want to take part in these discussions? Sign in if you have an account, or apply for one below
Vanilla 1.1.10 is a product of Lussumo. More Information: Documentation, Community Support.
Roko Mijic said this on Google+
Now, I wonder how big the variance in death figures over a 13 week period is? Since I don’t have the time to chase up data on deaths over a 13 week period for the past 10 years, I don’t know. But there is a way to get a lower bound on it. Conveniently, they provide a table of deaths in the individual weeks 12-25. For all the weeks in 2011 and 2010 combined, the standard deviation of those weekly figures is 556 deaths. Multiplying by the square root of 13, we get an estimate of the standard deviation of the sum of 13 weeks. And the answer is: 2006 deaths.
I just read this on the blog and I’m replying here.
This is not a good way to estimate the standard deviation of the weekly figures. The paper is about differences between various 14 week periods. If you combine the data for two of them, including the allegedly unusual one, your estimate for the standard deviation will likely too high, since it will be increased by differences between periods. Even if you use the periods separately, and ignore the period after Mar 2011, there may be seasonal effects within the periods which boost the estimated standard deviation. (There certainly are such effects around Xmas.)
Like Roko Mijic, I can’t be bothered to download more data. It is not difficult to do, but it is too like my day job. Using the data from the paper, I think the best way to estimate the standard deviation, allowing for seasonal effects, is to take the two ’before’ periods (Dec 2009 - Mar 2010 and Dec 2010 - Mar 2011), subtract corresponding weeks, and estimate the standard deviation of the differences, and then divide by sqrt(2). I get 332 for the sd of the weekly figures.
Next, I simulated data from a normal with mean 11000 and sd 332, and found out how often the method in the paper produced a result which is more unusual than the value of the statistic they calculated in Appendix Table 4. This happens about 20% of the time for a two-sided result or 10% for a one-sided result.
Here is my R code for you to check.
x2010spring <- c(11010, 11097, 11075, 10712, 10940, 10549, 10637, 10389,
10491, 10352, 9894, 10781, 10178, 10290)
x2011spring <- c(12137, 11739, 12052, 10928, 10743, 10826, 11251, 11300,
11132, 10839, 9538, 10770, 10981, 10779)
x2010winter <- c(10323, 7942, 8288, 11557, 11299, 10110, 10832,
10524, 9877, 9802, 10198, 10586, 10699, 9969)
x2011winter <- c(10702, 8339, 8194, 11804, 10775, 10689, 10420,
10295, 10700, 10952, 10762, 10779, 10639, 10274)
par(mfrow=c(2,1))
maxd <- max(x2010spring, x2011spring, x2010winter, x2011winter)
mind <- min(x2010spring, x2011spring, x2010winter, x2011winter)
plot(c(x2010winter,x2010spring), ylim=c(mind, maxd))
plot(c(x2011winter,x2011spring), ylim=c(mind, maxd))
esd <- sd(x2010winter-x2011winter)/sqrt(2)
n <- 0
for (i in 1:10000) {
w0 <- sum(rnorm(14, mean=11000, sd=esd))
w1 <- sum(rnorm(14, mean=11000, sd=esd))
s0 <- sum(rnorm(14, mean=11000, sd=esd))
s1 <- sum(rnorm(14, mean=11000, sd=esd))
O <- s1/s0
E <- w1/w0
mean1 <- sqrt(s1)^-1 * O
mean2 <- sqrt(s0)^-1 * E
X <- (O-E)/sqrt(mean1^2+mean2^2)
if (X > 5.7) { n <- n+1 }
cat (X, "\n")
}
n
Hi! Thanks a million, Graham! I’ll copy this information over to the blog as soon as the strike ends.
Maybe you saw the Scientific American blog article pointing out other flaws in the Mangano-Sherman paper. It’s a pretty lame paper.
1 to 2 of 2