Effect sizes were computed from t-scores and sample sizes following Johnson (1989), a procedure equivalent to the formula presented above. Table 7 presents cloze and reading comprehension test effect sizes for the individual studies. All effect sizes are positive, confirming that extensive readers outperformed comparison students in every comparison. The mean effect size for the entire sample was 0.749; recalculating with sample sizes taken into account gave an adjusted mean effect size of d = 0.730 (see Wolf, 1986, p. 41, for procedures). In other words, the average student in the extensive reading treatment scored 0.73 standard deviations above the average student in the comparison group.
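The two steps described above (converting a t-score to an effect size, then adjusting the mean for sample size) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the standard conversion d = t * sqrt(1/n1 + 1/n2) for independent groups and a simple weighting by total sample size, which may differ in detail from Johnson (1989) and Wolf (1986). The function names and the input values are hypothetical and are not taken from Table 7.

```python
import math


def d_from_t(t, n1, n2):
    """Effect size d from an independent-samples t-score and the two group sizes.

    Assumed conversion: d = t * sqrt(1/n1 + 1/n2).
    """
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def weighted_mean_d(ds, ns):
    """Mean effect size with each d weighted by its total sample size (assumed weighting)."""
    return sum(d * n for d, n in zip(ds, ns)) / sum(ns)


# Hypothetical (t, n_treatment, n_comparison) triples, NOT the values in Table 7.
studies = [(2.5, 20, 20), (3.0, 40, 35), (1.2, 15, 18)]

ds = [d_from_t(t, n1, n2) for t, n1, n2 in studies]
ns = [n1 + n2 for _, n1, n2 in studies]

unweighted_mean = sum(ds) / len(ds)
adjusted_mean = weighted_mean_d(ds, ns)
print(f"unweighted mean d = {unweighted_mean:.3f}, adjusted mean d = {adjusted_mean:.3f}")
```

Weighting by sample size gives larger studies more influence on the mean, which is why the adjusted figure can differ from the simple average.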
Similar calculations were carried out for only those studies using the cloze test as the dependent variable, so that each group was counted only once and only gain scores were utilized in the effect size calculation. The mean effect size was 0.831, and the adjusted mean effect size was d = 0.813.
Inspection of the effect sizes in Table 7, however, reveals clear variability. This was confirmed by application of the Test of Homogeneity described in Wolf (1986). For both the entire sample and the subset of comparisons utilizing cloze tests, there was significant heterogeneity (full sample: chi square = 17.10, df = 6, p < 0.001; reduced sample: chi square = 15.35, df = 4, p < 0.01). The causes of this heterogeneity are not fully clear. One could hypothesize that the large effect size in the second study was due to the longer treatment. This would explain why the effect sizes in the second study are larger than those in the first, but it does not explain why the effect sizes in the third study are not larger still, especially since a wider selection was available to the students. It is, however, interesting that the comparison group in Experiment 3 made better gains than did the comparison group in Experiment 2, suggesting that they did more reading or that the cloze exercises contributed some comprehensible input.
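A homogeneity test of this kind can be sketched as below. This is an assumption-laden illustration rather than the exact procedure in Wolf (1986): it uses the common inverse-variance form, in which the statistic is the weighted sum of squared deviations of each d from the weighted mean, referred to a chi-square distribution with k - 1 degrees of freedom for k comparisons. The variance approximation, the function names, and the example values are all hypothetical and do not come from Table 7.

```python
def d_variance(d, n1, n2):
    """Approximate sampling variance of d (a standard large-sample approximation, assumed here)."""
    return (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))


def homogeneity_q(ds, group_sizes):
    """Return (Q, df) for a chi-square test of homogeneity of effect sizes.

    ds          -- list of effect sizes, one per comparison
    group_sizes -- list of (n_treatment, n_comparison) pairs
    """
    weights = [1.0 / d_variance(d, n1, n2) for d, (n1, n2) in zip(ds, group_sizes)]
    d_bar = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    q = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, ds))
    return q, len(ds) - 1


# Hypothetical effect sizes and group sizes, NOT the values in Table 7.
example_ds = [0.40, 1.10, 0.70]
example_sizes = [(20, 20), (40, 35), (15, 18)]

q, df = homogeneity_q(example_ds, example_sizes)
print(f"Q = {q:.2f}, df = {df}")
```

A Q value exceeding the chi-square critical value for the given degrees of freedom indicates that the effect sizes vary more than sampling error alone would predict.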
Despite this variation, the results are remarkably consistent in direction: every comparison favored the extensive readers. The reliability of this advantage, together with the results of previous studies, helps reduce the potential harm caused by the use of intact classes, which was necessary because it was not possible to randomize subjects. Repeated experimentation with different classes, however, provides quasi-randomization.