Indeed, trials in which we are interested only in post-treatment scores, and where change is not of interest, are rather rare, being primarily confined to iatrogenic symptoms such as post-operative pain or chemotherapy vomiting. There are two implications for methodologic research on the relative value of parametric and non-parametric techniques.
First, we should worry about the distribution of change scores. It seems likely that change from baseline would approximate more closely to a normal distribution than the post-treatment score.
This is because change scores are a linear combination and the Central Limit Theorem therefore applies. As a simple example, imagine that baseline and post-treatment score were each represented by a single throw of a die. The post-treatment score has a flat, uniform distribution, with each possible value having an equal probability (Figure 1a). The change score has a more normal distribution: there is a peak in the middle at zero (the chance of a zero change score is the same as the chance of throwing the same number twice, that is, 1 in 6), with rarer events at the extremes (there is only a 1 in 18 chance of increasing or decreasing the score by 5) (Figure 1b).
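The die example can be checked by enumerating all 36 equally likely pairs of throws. The short Python sketch below is an illustration added for that purpose; the variable names are arbitrary and do not come from the study.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Post-treatment score: a single throw, so every face has probability 1/6.
post = {face: Fraction(1, 6) for face in faces}

# Change score: follow-up throw minus baseline throw,
# counted over all 36 equally likely (baseline, follow-up) pairs.
counts = Counter(follow_up - baseline
                 for baseline, follow_up in product(faces, repeat=2))
change = {diff: Fraction(n, 36) for diff, n in sorted(counts.items())}

for diff, prob in change.items():
    print(f"{diff:+d}: {prob}")
```

The enumeration gives a probability of 1/6 for a change of zero and 1/36 each for changes of +5 and -5, which together make the 1 in 18 quoted in the text.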
Figure 1. Distribution of scores for a single die roll and the difference between two die rolls. The change score tends towards a more normal distribution.
Second, where an endpoint is measured at baseline and again at follow-up, the t-test is not the recommended parametric method. Analysis of covariance (ANCOVA), where baseline score is added as a covariate in a linear regression, has been shown to be more powerful than the t-test [9-11].
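To make the idea of adding baseline as a covariate concrete, the sketch below computes a baseline-adjusted difference between two groups. It assumes a single baseline covariate and a common slope in both groups; under those assumptions the coefficient for group in a regression of follow-up on group and baseline equals the raw difference in follow-up means minus the pooled within-group slope multiplied by the difference in baseline means. The function name `ancova_effect` and the toy data are hypothetical, and no standard error or p-value is computed.

```python
from statistics import mean

def ancova_effect(baseline, follow_up, group):
    """Baseline-adjusted difference (group 1 minus group 0), common slope assumed.

    Returns (adjusted_difference, pooled_slope).
    """
    arms = {0: [], 1: []}
    for x, y, g in zip(baseline, follow_up, group):
        arms[g].append((x, y))

    sxx = sxy = 0.0
    means = {}
    for g, pairs in arms.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        means[g] = (mx, my)
        # Pool the within-group sums of squares and cross-products.
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)

    slope = sxy / sxx                              # pooled within-group slope
    raw_diff = means[1][1] - means[0][1]           # unadjusted difference
    baseline_imbalance = means[1][0] - means[0][0]
    return raw_diff - slope * baseline_imbalance, slope

# Toy data (made up): follow-up = 2 + 3 * group + 0.5 * baseline, with the
# treated arm starting from higher baseline scores.
group = [0, 0, 0, 0, 1, 1, 1, 1]
baseline = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0]
follow_up = [2 + 3 * g + 0.5 * x for g, x in zip(group, baseline)]

effect, slope = ancova_effect(baseline, follow_up, group)
print(effect, slope)
```

In this toy example the unadjusted difference in follow-up means is 4.25, which overstates the built-in effect of 3 because of the baseline imbalance; the adjusted estimate recovers the built-in value.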
ANCOVA has several additional advantages: it adjusts for any chance baseline imbalances; it can be extended to incorporate randomization strata as covariates, which has been shown to increase power [12]; and it can be extended to incorporate time effects where measures are repeated.
Such a comparison does not appear to have been reported previously. I aimed to compare the relative power of the two methods under a variety of distributions. As a secondary objective, I aimed to determine whether ANCOVA provided an unbiased estimate of the difference between groups where data did not follow a normal distribution.
A third, overarching aim was to investigate the distribution of change scores between repeat assessments of a non-normally distributed variable.
The starting point for this study was to obtain archetypal data sets for analysis. I followed Bridge [7] in choosing empirical rather than theoretical distributions. I examined the distribution of a large number of empirical data sets and cross-referenced these with those described by Micceri, who systematically obtained data sets from the psychological and educational domains [8].
The most common distribution appeared to be one with moderate positive skew. This distribution was also used with scores reversed, to create a distribution with moderate negative skew. A second pain data set, this time from a trial on athletes with shoulder pain [14], provides an example of a more uniform distribution (Figure 3).
Data on Ki67, an antigen that is a marker for cell proliferation, were obtained from a randomized comparison of two hormonal treatments for breast cancer [15]. The distribution for Ki67 is comparable to Micceri's "extreme asymmetry distribution" (Figure 4). For extreme negative skew, I used data from the physical functioning scale of the SF36 (Figure 5), again taken from the headache trial. As a comparison group, data were also drawn from a normal distribution with a mean of 5 and a standard deviation of 1.
Figure 2. Distribution of post-treatment and change scores from original and simulated data for headache severity ("moderate positive skew" distribution).
Figure 3. Distribution of post-treatment and change scores from original and simulated data for shoulder pain ("uniform" distribution).
Figure 4. Distribution of post-treatment and change scores from original and simulated data for Ki67, a biomarker of cell proliferation ("extreme asymmetry" distribution).
Figure 5. Distribution of post-treatment and change scores from original and simulated data for the physical functioning scale of the SF36 ("extreme negative skew" distribution).
For each of the distributions, I created a polynomial that converted normal data to a distribution with an approximately similar shape. For example, the distribution with moderate positive skew in Figure 2 was simulated by sampling x from the normal and creating a new variable as a polynomial function of x. The simulation distributions were compared to the empirical distributions by visual inspection and by comparison of the standard deviation, skewness and kurtosis.
To run the simulations, a bivariate normal distribution (mean 0, standard deviation 1) with a specified correlation was created for a trial of a given sample size equally divided into two groups. The polynomial was applied and a treatment effect introduced. The t-test and Mann-Whitney used the follow-up score if correlation was less than 0. This maximizes the power of these tests [11] and might be seen as favoring unadjusted tests, on the grounds that the correlation between baseline and follow-up scores is not known when the protocol for statistical analysis is written.
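The data-generating steps described above (correlated normal pairs, a shape-changing polynomial, then a treatment effect) can be sketched as follows. This is an illustration only, not the study's code: `example_polynomial` and its coefficient are invented stand-ins for the study's polynomials, the function and parameter names are hypothetical, and the treatment effect is assumed to be a simple shift added to the follow-up score in one arm.

```python
import math
import random

def example_polynomial(z):
    """Hypothetical polynomial (not the study's): pulls the right tail out,
    giving a positively skewed variable from a standard normal one."""
    return z + 0.3 * z * z

def simulate_trial(n_total, rho, shift, rng, transform=example_polynomial):
    """Return a list of (group, baseline, follow_up) for one two-arm trial."""
    rows = []
    for i in range(n_total):
        group = i % 2  # alternating allocation: equal arms when n_total is even
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Standard bivariate normal pair with correlation rho.
        base_z = z1
        follow_z = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        baseline = transform(base_z)
        # Treatment effect modelled as a shift in the treated arm (group 1).
        follow_up = transform(follow_z) + (shift if group == 1 else 0.0)
        rows.append((group, baseline, follow_up))
    return rows

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
trial = simulate_trial(n_total=40, rho=0.5, shift=0.5, rng=rng)
print(trial[:3])
```

Note that `rho` here is the correlation on the underlying normal scale; once a non-linear polynomial is applied, the correlation between the transformed baseline and follow-up scores will generally differ from it.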
Note that the correlation cited in the results is the correlation between baseline and follow-up in the control group. Some previous workers have used the overall correlation using both groups when investigating the properties of ANCOVA [ 11 ]. The difference between these two values was small in the context of our simulations, for example, a correlation of 0. Simulations were repeated times for each combination of sample size 10, 20, 30, 40, 60, , , , and correlation 0. The exception was extreme asymmetry data for the Ki67 biomarker.
The baseline and post-treatment distributions had quite different shapes and different polynomials were used to model each. This constrained the range of possible correlations, hence only the empirical correlation observed in the original study was used, 0.
Results were compared between different methods using the "relative efficiency" (RE) measure. This gives the relative number of patients required for a study analyzed using parametric methods to have power equivalent to the non-parametric alternative.
Hence an RE of 1. Note that, although it is arguable that the null hypotheses for different tests, say the t-test and Mann-Whitney, are technically different, the conclusions drawn by investigators of a randomized trial given a particular p-value will be the same, regardless of the analytic method used.
Hence direct comparison of the power of different tests is justified in this setting.
The figures show the distributions of post-treatment and change scores from the original data and associated simulations. Visual comparison of subfigures a with b, and c with d, suggests that the polynomials used for the simulations produce distributions that are reasonably similar to the related empirical distribution. Comparing subfigures a with c, and b with d, it is apparent that, as hypothesized, the change between baseline and follow-up scores tends towards the normal distribution.
These visual impressions are confirmed in Table 1, which shows estimates of the shape parameters for the distributions. The shape parameters for the empirical and simulated data are similar, and skewness is much closer to zero for the change score than for the follow-up score.
As a second check on the simulations, Table 2 compares the power of the t-test and Mann-Whitney. The data for post-treatment scores were obtained by combining all data from simulations where correlation was less than 0. These results broadly replicate those of previous workers and therefore provide support for the methods of the current study.
In particular, the increase in relative efficiency of the t-test under a normal or uniform distribution is trivial compared to its loss in relative power under asymmetry. Two aspects of Table 2 have not been reported previously.
First, RE can vary depending on whether the treatment effect is a shift or a ratio change. Second, the power of Mann-Whitney and the t-test is more similar (RE closer to 1) for change scores, presumably because change scores are more normally distributed. An exception is extreme asymmetry, where Mann-Whitney has extremely poor power for change scores.
Table 3 gives RE for each combination of sample size and correlation for the moderate positive skew data, where the treatment effect was a shift. Table 4 shows the RE for each of the different distributions combining data for correlations between 0. Mann-Whitney is superior for some very small sample sizes, but RE is non-trivially larger than 1 across sample sizes only for the extreme negative skew distribution with a ratio treatment effect.
In Table 5, data are given by correlation, combining sample sizes. The table has one particularly notable feature: for some distributions, REs drop dramatically between correlation of 0. This is apparently because the endpoint analyzed changed from the post-treatment score to the change score at correlations of 0.
This was to maximize power, following previous work on the power of unadjusted tests based on the normal [9, 11]. As it seems possible that the relative power of analyzing change and post-treatment scores may differ between the normal and asymmetric case, the data were reanalyzed using post-treatment scores only (see Table 6).
However, other considerations often play a role because parametric tests can often handle nonnormal data. Finally, if you have a very small sample size, you might be stuck using a nonparametric test.
Please, collect more data next time if it is at all possible! Your chance of detecting a significant effect when one exists can be very small when you have a small sample size and also need to use a less efficient nonparametric test!
Minitab Blog.
Hypothesis Tests of the Mean and Median
Nonparametric tests are like a parallel universe to parametric tests.
Parametric analyses: sample size guidelines for nonnormal data
1-sample t test: greater than 20
2-sample t test: each group should be greater than 15
One-Way ANOVA: If you have groups, each group should be greater than If you have groups, each group should be greater than
Reason 3: Statistical power
Parametric tests usually have more statistical power than nonparametric tests. If the mean accurately represents the center of your distribution and your sample size is large enough, consider a parametric test because it is more powerful.
If the median better represents the center of your distribution, consider the nonparametric test even when you have a large sample.
It should be noted that checking normality of data produced by smaller samples can be difficult. Sometimes with a small sample, the data displayed in a histogram will be obviously asymmetrical, but there are certainly occasions in which it is impossible to tell.
This is because with a small sample, the histogram may not be smooth even if the data are normal. There might not be any clear evidence of symmetry or asymmetry, which can make it difficult to determine whether the data are normal or not. However, one way to get around this obstacle is to draw on cases in which the same measurements were taken from a larger sample in an earlier study.
If your data is not normal, there are a few steps you can take prior to performing a nonparametric test.
If your data has a generally skewed distribution, you could consider a transformation of the data. When data is strongly skewed in one direction or the other, there are sometimes patterns that can be observed. By choosing a transformation that accounts for these patterns (for example, taking logarithms of right-skewed positive values), you can often obtain a histogram that looks closer to normal.
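As a rough illustration of what a transformation can do, the sketch below measures skewness before and after taking logarithms. The data are made up (log-normal draws chosen because they are positive and right-skewed), and the helper `skewness` is a simple hand-written moment formula rather than a library routine, so treat it as one possible way to check the effect on your own data.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    values = list(values)
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

rng = random.Random(1)  # fixed seed so the example is reproducible

# Made-up right-skewed sample: log-normal draws are always positive.
raw = [rng.lognormvariate(0.0, 0.8) for _ in range(2000)]
logged = [math.log(v) for v in raw]

print(f"skewness before transformation: {skewness(raw):.2f}")
print(f"skewness after log transformation: {skewness(logged):.2f}")
```

A skewness near zero after the transformation suggests the transformed values are roughly symmetric, although symmetry alone does not establish normality, and a log transformation only applies to strictly positive values.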
Another option is to perform your analysis twice, once excluding the outliers and once including them. If the normality of your data is clearly in doubt, parametric tests can lead to seriously misleading conclusions.
The following statistical analyses can be applied to data that is assumed to have a normal distribution: