Power/sample size calculations

Power calculations were performed for outcome measures that are often targeted in pregnancy studies: neural tube defects, stillbirth, neonatal death, infant mortality, congenital anomalies, low birth weight, preterm birth, and miscarriage. The estimates given in the Pregnancy Toolkit Power Calculations spreadsheet are provided to serve as examples only. Study investigators should conduct their own power/sample size calculations when designing a study.

Three probabilities (low end, middle, high end) are considered for each outcome type, based on prevalence estimates obtained from a literature search (see Outcome measures / Endpoint prevalence ranges). The assumed probabilities of the outcomes are displayed in Table 1.

Table 1: Reference Group Event Probabilities (P2) used for Power Calculations

Probabilities for Power Calculations

| Outcome | Low end | Middle | High end |
|---|---|---|---|
| Neural tube defects | 0.0008 | 0.0012 | 0.005 |
| Stillbirth* | 0.005 | 0.02 | 0.025 |
| Neonatal death (through 28 days) | 0.017 | 0.023 | 0.032 |
| Infant mortality (through 1 year of life) | 0.026 | 0.028 | 0.031 |
| Maternal mortality (during pregnancy or within 42 days) | 0.01 | 0.03 | 0.05 |
| Congenital anomalies | 0.01 | 0.03 | 0.043 |
| Low birth weight (<2,500 g; live born) | 0.11 | 0.12 | 0.14 |
| Prematurity (PTB; <37 weeks’ gestation) | 0.06 | 0.12 | 0.22 |
| Small for gestational age (SGA; <10th percentile) | 0.17 | 0.20 | 0.30 |
| Miscarriage (<20 weeks’ gestation) | 0.15 | 0.20 | 0.35 |

*Definition ranged from ≥20 to ≥28 weeks’ gestation.

A composite outcome including stillbirth (SB), preterm birth (PTB), small for gestational age (SGA) and neonatal death (ND) was calculated using the mid-range probability estimates for each outcome (SB, 0.02; PTB, 0.12; SGA, 0.2; ND, 0.028), as follows: 

Composite = SB + PTB*(1-SB) + SGA*(1-PTB)*(1-SB) + ND*(1-PTB)*(1-SGA)*(1-SB)*0.5

For the purpose of this exercise, preterm birth, small for gestational age and neonatal death are assumed to be independent and are conditional upon live birth. The value for neonatal death is adjusted (*0.5) to account for the lower proportion of neonatal deaths among infants born at term and of appropriate weight for gestational age.
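As a worked check of the composite formula, the short Python sketch below plugs in the mid-range values quoted in the text (SB, 0.02; PTB, 0.12; SGA, 0.2; ND, 0.028). The function name `composite_probability` and the `nd_adjustment` parameter are illustrative names chosen here, not part of the toolkit spreadsheet.

```python
def composite_probability(sb, ptb, sga, nd, nd_adjustment=0.5):
    """Composite event probability, term by term as in the formula above.

    sb, ptb, sga, nd are the stillbirth, preterm birth, small for
    gestational age and neonatal death probabilities. nd_adjustment is
    the 0.5 factor applied to neonatal death (hypothetical parameter name).
    """
    live_birth = 1.0 - sb  # PTB, SGA and ND are conditional on live birth
    return (
        sb
        + ptb * live_birth
        + sga * (1.0 - ptb) * live_birth
        + nd * (1.0 - ptb) * (1.0 - sga) * live_birth * nd_adjustment
    )


composite = composite_probability(sb=0.02, ptb=0.12, sga=0.2, nd=0.028)
print(round(composite, 4))  # prints 0.3197
```

With these inputs, roughly 32% of pregnancies would be expected to experience at least one component of the composite outcome.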

Power was calculated using a two-sided z-test (normal approximation) for a difference in proportions with unpooled variance at alpha = 0.05. The sample sizes of the two groups were assumed to be equal (N1 = N2; N = N1 + N2). No adjustment was made for missing data or loss to follow-up.

The reference group event probability is denoted P2 and was set using the values shown in Table 1. P1 is the event probability for the experimental group. The difference (P1 - P2) is the alternative hypothesis (working hypothesis). The risk ratio is denoted R1 and is calculated as P1/P2; it is displayed to help make comparisons across the various outcome measures.

P1 was set using two different strategies. The first strategy (‘By Sample Size’ tab in the Excel file) solved for P1 to obtain the desired level of power at a given sample size: N = 500, 1000, 1500, or 2000. For this strategy, only the middle scenario in Table 1 was considered. The second strategy (tabs other than ‘By Sample Size’ in the Excel file) first set R1 (e.g., 1.25, 1.5) to compute P1 and then solved for N to obtain the desired level of power.
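The second strategy can be sketched in Python using only the standard library. This is a minimal illustration of the textbook normal-approximation formulas for a two-sided z-test with unpooled variance and equal group sizes; it is not the PASS implementation and may not reproduce the spreadsheet values exactly. The function names are hypothetical, and the example inputs (P2 = 0.12, the middle preterm birth value, with R1 = 1.5) are chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def two_sample_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for P1 - P2 (unpooled variance)."""
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p2) / se
    # Probability of rejecting in either tail under the alternative.
    return _STD_NORMAL.cdf(z_effect - z_crit) + _STD_NORMAL.cdf(-z_effect - z_crit)


def n_per_group_for_power(p2, risk_ratio, power=0.80, alpha=0.05):
    """Set P1 = R1 * P2, then solve for the per-group sample size.

    Uses the usual closed form, which ignores the negligible opposite tail.
    """
    p1 = risk_ratio * p2
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


p2, r1 = 0.12, 1.5  # illustrative inputs: middle PTB probability, R1 = 1.5
n_group = n_per_group_for_power(p2, r1, power=0.80)
print("N1 = N2 =", n_group, "; N =", 2 * n_group)
print("achieved power:", round(two_sample_power(r1 * p2, p2, n_group), 3))
```

The first strategy (solving for P1 at a fixed N) could reuse `two_sample_power` inside a simple root search over P1; that step is omitted here.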

In addition to the two-sample tests, power for one-sample tests of proportions with normal approximation was calculated for congenital anomalies and neural tube defects. Unlike the two-sample tests, these do not account for uncertainty in the reference group; they assume that the observed proportion will be compared with a known probability. The Excel sheets that include the one-sample tests are named “Congenital Anomalies One Group” and “Neural Tube Defect One Group”.
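A one-sample calculation of this kind can be sketched as follows. The sketch assumes a two-sided test whose critical value uses the standard error under the known (null) probability, while the sampling distribution under the alternative uses its own variance; PASS offers several variants, so this choice is an assumption rather than the documented method. The function name and the example inputs (congenital anomalies middle value 0.03 compared with 0.06 at n = 1000) are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def one_sample_power(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test of a proportion.

    p0 is the known reference probability (treated as fixed, with no
    sampling uncertainty); p1 is the true probability under the alternative.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    sd_null = sqrt(p0 * (1 - p0))  # used for the rejection threshold
    sd_alt = sqrt(p1 * (1 - p1))   # spread of the estimate under p1
    shift = abs(p1 - p0) * sqrt(n)
    upper = _STD_NORMAL.cdf((shift - z_crit * sd_null) / sd_alt)
    lower = _STD_NORMAL.cdf((-shift - z_crit * sd_null) / sd_alt)
    return upper + lower


# Illustrative inputs only: reference 0.03, alternative 0.06, 1000 pregnancies.
print(round(one_sample_power(p0=0.03, p1=0.06, n=1000), 3))
```

Because the reference probability is treated as known, the same difference is detected with fewer participants than in the two-sample setting, which is why the one-sample results should be read with that assumption in mind.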

Three levels of power were considered: 0.80 (standard), 0.90 (standard), and 0.975 (decision-theoretic). When considering power for a safety endpoint, it is important to remember that power is 1 minus the type II error probability. If power is set to 0.80, there is a probability of 0.20 of failing to reject the null hypothesis of no difference when the alternative hypothesis is in fact true. Higher power is generally preferred to minimize type II errors. Power of 0.975 was also presented because it has two important properties: 1) if the null hypothesis is rejected, the resulting 95% confidence interval will contain the alternative hypothesis, and 2) if the null hypothesis is not rejected, the 95% confidence interval will not contain the alternative hypothesis. As a result, the 0.975 power level allows the rejection (acceptance) of either the null hypothesis or the alternative hypothesis, but not both.
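The two properties of 0.975 power can be illustrated numerically. The sketch below works in standard-error units and makes simplifying assumptions that are not stated in the source: the standard error is treated as known and identical under both hypotheses, and the observed estimates lie between the null value (0) and the alternative. Under those assumptions, a design with 0.975 power places the alternative at twice the 95% critical value, so the confidence interval covers the alternative exactly when the null is rejected.

```python
from statistics import NormalDist

ALPHA = 0.05
POWER = 0.975

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)  # about 1.96
z_power = std_normal.inv_cdf(POWER)         # also about 1.96

# Difference (in standard-error units) detectable with 0.975 power,
# ignoring the negligible opposite tail.
alternative = z_crit + z_power              # about 3.92


def summarize(estimate):
    """Return (null rejected?, 95% CI contains the alternative?)."""
    rejects_null = abs(estimate) > z_crit
    ci_low, ci_high = estimate - z_crit, estimate + z_crit
    contains_alternative = ci_low <= alternative <= ci_high
    return rejects_null, contains_alternative


# Hypothetical observed estimates, in standard-error units.
results = {est: summarize(est) for est in (0.5, 1.5, 2.5, 3.5)}
for est, (rejects, contains) in results.items():
    print(f"estimate={est}: reject null={rejects}, CI contains alternative={contains}")
```

For each of these estimates the two flags agree: either the null is rejected and the interval covers the alternative, or neither happens.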

Power Analysis and Sample Size (PASS) software, version 15, was used for all power calculations.

Disclaimer


The provision of study materials and links from the toolkit to other websites is provided for your convenience and does not indicate endorsement of those materials or sites by WHO. WHO accepts no responsibility for the validity or accuracy of their content. The mention of specific companies or of certain manufacturers' products does not imply that they are endorsed or recommended by WHO in preference to others of a similar nature that are not mentioned. Errors and omissions excepted, the names of proprietary products are distinguished by initial capital letters.