Type I error rates and test power for some variance components estimation methods : one-way random effect model

This study was conducted to compare Type I error and test power of ANOVA, REML and ML methods by Monte Carlo simulation technique under different experimental conditions. Simulation results indicated that the variance ratios, sample size and number of groups were important factors in determining appropriate methods which were used to estimate variance components. The ML method was found slightly superior when compared to ANOVA and REML methods. On the other hand, ANOVA and REML methods generated similar results in general. As a results, regardless of distribution shapes and number of groups and if n<15; ML, REML methods might be preferred to the ANOVA. However, when either number of groups or sample size was increased (n≥15) ANOVA method may also be used along with ML and REML.


Introduction
Estimation of the amount of variation between treatments in total variation is an important issue for animal breeding as well as other fields of science (Falconer 1989).Variance components estimation methods such as Analysis of Variance (ANOVA), Maximum Likelihood (ML), Restricted Maximum Likelihood (REML), Bayesian, Henderson I, II, III are the ways to evaluate the amount of variation in response variable which is related to one or more random-effects factors (Saahai & Ojeda 2004, 2005).ANOVA method has been exploited by many years in estimating variance components (Searle et al. 1992, Yolcu et al. 2004).However, even if the assumptions of ANOVA (normality and homogeneity of variance) are satisfied, it tends to predict negative variance component when mean square treatment is smaller than mean square error (MSA<MSE) (Verdooren 1982, Smith & Sawage 1992, Kelly & Mathew 1993, 1994, Rao 2001, Searle et al. 2006).Searle (1971) reported that when MSA<MSE, there is no mechanism used in ANOVA that prevents a negative estimate.It is possible, however, to avoid getting negative estimates by using some other methods such as ML, REML, and Bayesian method instead of ANOVA.Searle et al. (2006) reported that there is nothing in the ANOVA method that will prevent a negative estimate.Another alternative would be to use a method of estimation that explicitly excludes the possibility of negative estimates such as ML, REML and Bayesian method.
There are many studies about comparing variance component methods with respect to their performance on both a real data set and an artificial data set generated by simulation techniques (Swallow & Monaham 1984, Rao & Kleffe 1988, Westfall 1987, Chaloner 1987, Khattree & Gill 1988, Smith & Sawage 1992, Westfall & Bremer 1994, Kelly & Mathew 1994, Belzile & Angers 1995, Rao 2001, Shugart et al. 2002).However, these methods were compared to each other only based on one data set.It does not provide efficient comparison of the performances of these methods based on one data set.Comparison of the performance of these methods with respect to Type I error rate and test power under different experimental conditions made by Monte Carlo simulation studies, might provide more detailed information.
The main purpose of this study was to compare ANOVA, ML and REML methods with respect to Type I error rates and test power under different experimental conditions using Monte Carlo simulation studies.

Material and methods
Monte Carlo simulation techniques were used to compare ANOVA, ML and REML methods with respect to Type I error rates and test power under different experimental conditions.Type I error rates and test power of these methods were assessed under different situations.The following variables were manipulated in this simulation study: a) type of population b) number of groups c) sample sizes d) variance homogeneity and heterogeneity e) effect sizes.For k=3 and k=5, random numbers from the N (0,1), χ 2 (3), t (10) and β (5,2) distributions were generated by a Microsoft Fortran Developer Studio (Microsoft Corporation, Redmond, WA, USA) using generators from IMSL library (functions RNNOA, RNSTT, RNCHI and RNBET) (Anonymous 1994).The variance ratios were chosen as 1:1:1, 1:1:4, 1:1:9 and 1:1:25 for k=3 and 1:1:1:1:1, 1:1:1:1:4, 1:1:1:1:9 and 1:1:1:1:25 for k=5.In order to form heterogeneity among the population variances, random numbers in the samples were multiplied by specific constant numbers (σ=1, 2, 3, 5).The populations were standardised because they have different means and variances.Shape of distributions was not changed while the means were changed to 0 and the standard deviations were changed to 1.For k=3 and k=5, 30 000 data sets were generated from mentioned populations employing equal sample sizes of 5, 10, 15 and 20.
For each experimental condition, the ANOVA, ML and REML methods were performed to estimate variance components (σ 2 e and σ 2 α ), respectively.Then, the numbers of predicted σ 2 α s whose values fall outside of the confidence interval was calculated as follows and counted afterwards.And then, dividing the value of σ 2 α whose values fall outside of the confidence interval by the total number of 30 000 trials, type I error rates were found.Since there is a solution only for σ 2 α >0 in terms of ML and REML methods, the negative predicted values of σ 2 α s were assumed to be zero (Sahai & Ojeda 2005): In order to compare the methods with regard to test power, firstly, specific constant numbers with standard deviation form (effect sizes of 0.50, 0.75 and 1.25) were added to the random numbers of the last population.Thus, the predicted values of σ 2 α s were provided to fall outside of the confidence interval in question.Then, the number of σ 2 α s falling outside of the intervals which were originally falling outside was determined.Finally, having divided that number by 30 000 which is the total number of trial, the power of the test was obtained.

Statistical methods
If there are k treatment groups and each group has the same number of observations (balanced design), the general form of the random-effects one-way model can be written as below: where y ij is the observed value of experimental unit j in group i, μ is the general population mean, α i is the effect of group i, e ij is the random error terms with (0, σ 2 e ).The e ij 's and α i 's are mutually exclusive and α i ~ N(0, σ 2 α ), e ij ~ N (0, σ 2 e ) and Cov (α i , e ij )=0 (Rasch et al. 1999, Kaps & Lamberson 2004, Rasch & Masata 2006).
In random-effects model, the expected value and variance of y ij is μ and σ 2 α +σ 2 e ), respectively.Since α i and e ij are mutually exclusive, therefore the distribution of The two components σ 2 α and σ 2 e of y ij are called variance components.Different methods such as ANOVA, ML, REML, Bayesian, Henderson I, II, III, and MINQUE etc. have been developed in order to estimate the components σ 2 α and σ 2 e .ANOVA, ML and REML methods were taken into consideration in this study.
Under normality assumption, the probability of negative estimate for σ2 α can be computed for various numbers of treatments (k) and sample sizes (n) as:

Maximum likelihood method (ML)
In order to avoid negative estimates, instead of ANOVA, one of the alternative methods which is used to estimate variance components, is maximum likelihood (ML).ML estimators of σ 2 e and σ 2 α are: where k is the number of group, n is the number of observation, SST is the total sum of squares, MSE is the mean square error and MSA is the mean square treatment.Since ML estimators must satisfy σ 2 e and σ 2 α ≥0, If MSA<MSE then σ2 α = 0 and σ2 e = SST (Hartley & Rao 1967, Harville 1977, Kaps & Lamberson 2004).

REML method
In order to avoid negative estimates another alternative method is REML.Under the normality assumption, for balanced one-way random model the REML estimators of σ 2 e and σ 2 α are:

Results
Type I error rates: three groups Empirical Type I error results of 30.000 simulation runs are given in Table 1-4.When k=3, the distribution was normal (0, 1) and variances were equal (1:1:1), REML and ML methods had similar and acceptable Type I error rates (around 5.0 %).Type I error rates for ANOVA method were between 2.29-4.62 % under the same conditions.The ANOVA appeared to be the most affected method by the sample size.Under small deviations from homogeneity of variances (1:1:4) the most reliable results were obtained from the ANOVA method.Increasing the degrees of the heterogeneity of the variances (1:1:9 and 1:1:25) caused the Type-I error rates in all methods that biased more than 5.0 %.

5) N−k n
When distributions were not normal but the variances were homogeneous, the Type I error rates for all methods were generally similar to the Type-I error rates realised under normal distribution.It may be suggested that as long as the variances are homogeneous the distribution shape does not affect Type-I error rates of all methods.However it is not possible to reach the same conclusions when the variances are not homogeneous.The Type I error rates for all methods were affected negatively when the distributions were not normal and variances were not homogeneous.Moreover, this negative effect was pronounced more when samples were taken from χ 2 (3)-distribution.
If we make a general evaluation for k=3, when the variances are equal, regardless of the shape of the distribution, as the number of observed values in groups increases, the Type-I error rates regarding all methods, have a tendency to tend to gather around 5.0 %.

Five groups
Type-I error rates concerning all methods were displayed to be affected from an increase the number of groups to be compared from 3 to 5.This effect appeared slightly positive when the variances were homogeneous.Because under this experimental conditions, Type-I error rates with regard to all methods were gathered generally around 5.0 %.In case of the variances becoming heterogeneous, however, the Type I error rates regarding all methods, have a tendency to deviate from 5.0 %.Especially, when the variance ratios were 1:1:25, the Type I error rates have raised even over 20.0 %.
Under the same conditions, when the distributions were t (10) and β (5,2), the Type I error rates for all methods were generally around 5.0 % under homogeneity of variances.When the distributions were χ 2 (3), it has been observed that Type I error rates were generally a bit less than 5.0 %.However, in parallel to the increase in sample size it shows a trend approaching to 5.0 %.
When the variances were heterogeneous, all methods tended to deviate from nominal alpha level (5.0 %).These deviations appeared to be more obvious especially in the skew distributions (χ 2 and β).Therefore, all methods were adversely affected when both homogeneity of variance and normality assumptions were not satisfied.At the same time, increasing the number of groups to be compared, had seriously affected the type I error rates especially when the variances were heterogeneous.This effect was observed to be more obvious when the variance ratios were 1:1:25.

Test power
Empirical test power results of 30 000 simulation runs are given in Table 5-8.When samples were taken from normally distributed populations (Table 5), the ML method was slightly more powerful when compared to ANOVA and REML regardless of the number of groups, sample sizes, mean differences and variance ratios in general.When variances were homogeneous (1:1:1), the test power values for ANOVA, REML and ML methods were estimated between 5.94-96.69%, 6.84-96.69% and 7.63-97.05% for k=3.Under these conditions, the test power estimates for ANOVA and REML methods are almost the same.All methods were affected from heterogeneity of the variances negatively and this effect was more obvious especially when variance ratios were 1:1:9 and 1:1:25.When the variances were heterogeneous, increasing the number of groups from 3 to 5 has increased test power estimates of all methods regardless of distribution shapes (especially when distributions were Normal, t-and Beta) and sample sizes (especially when n≥15).Empirical test powers (%) when distributions were N (0,1) n=5:5:5 / 5:5:5:5:5
The test power estimates for all methods under β (5,2) distributions were similar to those of the estimates under normal distributions especially when the variances were homogenous (1:1:1) or the deviations from homogeneity of variances were small (1:1:4).
However, the effect of the distribution shape on test power became more apparent as the variance ratios were increased.Under these experimental conditions, the ML method was found to be slightly superior when compared to the ANOVA and REML when distributions were normal (0, 1) and t (10).On the other hand, in parallel to the increase in sample sizes, the test powers get close to each other.

Discussion
The estimation of variance components methods are commonly used in population genetics and applied in animal breeding to estimate the amount of variation between treatments in total variation and to estimate some breeding values (Rash & Masata 2006).For this purpose different methods were developed.In this study the ANOVA, ML and REML methods commonly used in practice, were compared to each other according to their performances (Type I error rate and test power) under 384 different experimental conditions.Results of this study showed that the ML method showed a slightly higher performance when compared to REML and ANOVA in general.On the other hand, ANOVA and REML estimates were found similar when variances were homogeneous.However, in parallel to the increase in the sample size (especially n≥15), both the Type-I error rates and the power values of the test of all the methods have gradually approached to each other.Searle (1971) and Searle et al. (2006) reported that the REML estimates were usually less biased when compared to ML estimates.For this reason, the REML estimates were generally preferred to ML and ANOVA in practice.Yang & Su (2005) reported that the performances of the ML, REML and ANOVA estimators became very similar when sample sizes were large enough.On the other hand, they could all perform poorly when there was an unbalanced design with large variances.Results of this study also showed that all methods were affected adversely from heterogeneity of variances.But, the existence of lower level (1:1:4 and 1:1:1:1:4) deviations from the homogeneity of variances did not affect the Type I error and test power estimates.When a general evaluation was made, it is possible to assert that regardless of the shape of the distribution, if the variances were homogeneous, ML and REML methods were superior to ANOVA.Under the same conditions, on the other hand, if the variances were not homogenous, the Type I error rate and test power of the ANOVA, ML, and REML were at very low levels.Therefore, ANOVA, ML, and REML estimates would be biased under these conditions.Sample size and number of groups are also important factors in determining appropriate methods which were used to estimate variance components.Past studies indicated that the probability of negative variance component estimate decreases as either k or n increases (Searle et al. 2006).Regardless of distribution shapes and number of groups, if n<15 ML and REML methods might be preferred to the ANOVA.However, when either the number of groups or sample size was increased (n≥15), the ANOVA method may be used along with ML and REML.Because as the sample sizes were increased, the numbers of negative component estimates were decreased.
the REML estimates provided being nonnegative.Note that REML estimators are the same as the ANOVAs when studied with balanced design.Likewise ML estimators, REML estimators must also satisfy σ 2 e and σ 2 α ≥0, If MSA<MSE then σ2 α = 0 and σ 2 e = STT (Patterson & Thompson 1974).