Many functions require the calculation of the sum of squared deviations about a mean. To do this accurately, Microsoft Office Excel 2003 and later versions of Excel use a two-pass procedure that finds the mean on the first pass, and then calculates squared deviations about the mean on the second pass.
In precise arithmetic, the same result occurs in earlier versions of Excel that use the "calculator formula." This formula is so named because it was in widespread use when statisticians used calculators instead of computers. With the calculator formula, earlier versions of Excel sum the squares of the observations, and then subtract from this total the following quantity:
((sum of observations)^2) / number of observations
This calculation occurs in a single pass through the data.
In finite precision arithmetic, the calculator formula is subject to roundoff errors in extreme cases. Excel 2002 and earlier versions of Excel use the calculator formula for most functions that require a sum of squared deviations about a mean (such as VAR, STDEV, SLOPE, and PEARSON). However, these versions of Excel also use the more numerically robust two-pass procedure for the CORREL, COVAR, and DEVSQ functions.
Experts in statistical computing recommend that you do not use the calculator formula. The calculator formula is presented as "how not to do it" in texts about statistical computing. Unfortunately, all three of the Analysis ToolPak (ATP) ANOVA tools make widespread use of the calculator formula or an equivalent single-pass approach in Excel 2002 and in earlier versions of Excel.
Excel 2003 and later versions of Excel use the two-pass procedure for all three ATP ANOVA models. This article discusses the following computational improvements in ATP's three ANOVA models:
• Single Factor
• Two-Factor with Replication
• Two-Factor without Replication
Single Factor
A simple example with data is as follows.
ANOVA 1 BASIC MODEL:
1 2 3
2 4 4
3 6 5
4 8 6
5 --7
6 --8
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Column 1 6 ---21 --- 3.5 ----- 3.5
Column 2 4 --- 20 --- 5 ------- 6.666667
Column 3 6 --- 33 --- 5.5 ----- 3.5
ANOVA
Source of Variation SS df MS ----F --------P-value F crit
Between Groups 12.75 2 6.375 1.506818 0.257897 3.805567
Within Groups 55 13 4.230769
Total 67.75 15
Excel 2002 and earlier versions of Excel use the following pseudocode to calculate the sums of squares:
GrandSum = 0;
GrandSumOfSqs = 0;
GrandSampleMeanSqrd = 0;
GrandMeanSqrd = 0;
GrandSampleSize = 0;
For s = 1 to Number_of_Samples do
GrandSum = GrandSum + sum of observations in s-th sample;
GrandSumOfSqs = GrandSumOfSqs + sum of squared observations in s-th sample;
GrandSampleMeanSqrd = GrandSampleMeanSqrd +
(sum of observations in s-th sample^2)/size of s-th sample;
GrandSampleSize = GrandSampleSize + size of s-th sample
Endfor;
GrandMeanSqrd = (GrandSum^2) / GrandSampleSize;
TotalSS = GrandSumOfSqs – GrandMeanSqrd;
BetweenGroupsSS = GrandSampleMeanSqrd – GrandMeanSqrd;
WithinGroupsSS = GrandSumOfSqs – GrandSampleMeanSqrd;
This approach is essentially the calculator formula. This approach computes the sums of squares of observations, and then subtracts a quantity from them, just as VAR computes the sum of squares of the observations, and then subtracts sum of observations^2/sample size. Similar pseudocode for the model 2 and model 3 has been omitted.
Again, for model 2 and model 3, sums of squares are calculated and a quantity is subtracted from the sum of squares as in the calculator formula. Unfortunately, basic statistics texts frequently suggest approaches for ANOVA such as the one that is shown earlier in this article.
Excel 2003 and later versions of Excel use a different approach to calculate the various entries in the SS column of the ANOVA table. For illustration, this article assumes that the numeric data in the earlier example appear in cells A2:C7 with missing data in cells B6 and B7.
• Total SS is just DEVSQ applied to all the data, such as DEVSQ(A2:C7). DEVSQ works correctly even though data is missing.
• Between Groups SS is Total SS minus the sum of DEVSQ applied to each column, such as DEVSQ(A2:A7) + DEVSQ(B2:B7) + DEVSQ(C2:C7).
• Within Groups SS is Total SS minus Between Groups SS.
In precise arithmetic, the same result occurs in earlier versions of Excel that use the "calculator formula." This formula is so named because it was in widespread use when statisticians used calculators instead of computers. With the calculator formula, earlier versions of Excel sum the squares of the observations, and then subtract from this total the following quantity:
((sum of observations)^2) / number of observations
This calculation occurs in a single pass through the data.
In finite precision arithmetic, the calculator formula is subject to roundoff errors in extreme cases. Excel 2002 and earlier versions of Excel use the calculator formula for most functions that require a sum of squared deviations about a mean (such as VAR, STDEV, SLOPE, and PEARSON). However, these versions of Excel also use the more numerically robust two-pass procedure for the CORREL, COVAR, and DEVSQ functions.
Experts in statistical computing recommend that you do not use the calculator formula. The calculator formula is presented as "how not to do it" in texts about statistical computing. Unfortunately, all three of the Analysis ToolPak (ATP) ANOVA tools make widespread use of the calculator formula or an equivalent single-pass approach in Excel 2002 and in earlier versions of Excel.
Excel 2003 and later versions of Excel use the two-pass procedure for all three ATP ANOVA models. This article discusses the following computational improvements in ATP's three ANOVA models:
• Single Factor
• Two-Factor with Replication
• Two-Factor without Replication
Single Factor
A simple example with data is as follows.
ANOVA 1 BASIC MODEL:
1 2 3
2 4 4
3 6 5
4 8 6
5 --7
6 --8
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Column 1 6 ---21 --- 3.5 ----- 3.5
Column 2 4 --- 20 --- 5 ------- 6.666667
Column 3 6 --- 33 --- 5.5 ----- 3.5
ANOVA
Source of Variation SS df MS ----F --------P-value F crit
Between Groups 12.75 2 6.375 1.506818 0.257897 3.805567
Within Groups 55 13 4.230769
Total 67.75 15
Excel 2002 and earlier versions of Excel use the following pseudocode to calculate the sums of squares:
GrandSum = 0;
GrandSumOfSqs = 0;
GrandSampleMeanSqrd = 0;
GrandMeanSqrd = 0;
GrandSampleSize = 0;
For s = 1 to Number_of_Samples do
GrandSum = GrandSum + sum of observations in s-th sample;
GrandSumOfSqs = GrandSumOfSqs + sum of squared observations in s-th sample;
GrandSampleMeanSqrd = GrandSampleMeanSqrd +
(sum of observations in s-th sample^2)/size of s-th sample;
GrandSampleSize = GrandSampleSize + size of s-th sample
Endfor;
GrandMeanSqrd = (GrandSum^2) / GrandSampleSize;
TotalSS = GrandSumOfSqs – GrandMeanSqrd;
BetweenGroupsSS = GrandSampleMeanSqrd – GrandMeanSqrd;
WithinGroupsSS = GrandSumOfSqs – GrandSampleMeanSqrd;
This approach is essentially the calculator formula. This approach computes the sums of squares of observations, and then subtracts a quantity from them, just as VAR computes the sum of squares of the observations, and then subtracts sum of observations^2/sample size. Similar pseudocode for the model 2 and model 3 has been omitted.
Again, for model 2 and model 3, sums of squares are calculated and a quantity is subtracted from the sum of squares as in the calculator formula. Unfortunately, basic statistics texts frequently suggest approaches for ANOVA such as the one that is shown earlier in this article.
Excel 2003 and later versions of Excel use a different approach to calculate the various entries in the SS column of the ANOVA table. For illustration, this article assumes that the numeric data in the earlier example appear in cells A2:C7 with missing data in cells B6 and B7.
• Total SS is just DEVSQ applied to all the data, such as DEVSQ(A2:C7). DEVSQ works correctly even though data is missing.
• Between Groups SS is Total SS minus the sum of DEVSQ applied to each column, such as DEVSQ(A2:A7) + DEVSQ(B2:B7) + DEVSQ(C2:C7).
• Within Groups SS is Total SS minus Between Groups SS.