Introduction

Clefts of the lip and palate create problems in feeding, speech, hearing, dental development and facial growth. Despite surgical and multidisciplinary advances in cleft care, associated speech difficulties and facial appearance represent serious barriers to social integration. As a consequence of the multifaceted nature of the condition, numerous decisions have to be made about the possible clinical, surgical, audiological, orthodontic and speech interventions at different times, extending from the birth of the child to adolescence and beyond. For certain individuals, the specific intervention chosen at a given time may be non-controversial and firmly evidence-based, whereas at other times there may be options available that have not yet been rigorously evaluated. In general, there is a paucity of evidence from randomised controlled trials (RCTs) for many aspects of the management choices to be made for patients with cleft lip and palate. Indeed, Berkowitz argues for rigorous evaluation in the context of nasoalveolar moulding,1 as did Lee2 much earlier in the wider setting of patients with cleft lip and palate.

Despite the plethora of potential questions arising from the complexities of care in patients with cleft lip and palate, a review by Hardwicke and colleagues3 identified only 62 published RCTs from 1 January 2004 to 31 December 2013. Among the 62 RCTs identified, only 10 concerned surgical techniques, with a median trial size of 86 (range 47–376). The largest of these, with 467 infants, was conducted by Williams and colleagues,4 who compared two different lip repairs with two types of palate repair conducted at two different ages in a so-called factorial design. Lu and colleagues5 commented on the advantages and difficulties associated with this design. The other RCT that evaluated repair at different ages was that of Ysunza and colleagues6 who compared results following surgery at two different ages in 76 infants.

We review information from the RCTs concerning palatal surgery at different ages in infants and discuss some of the logistical and statistical problems associated with the design and conduct of such trials. For the purpose of this article, we focus on speech outcomes with respect to velopharyngeal competence (VPC). Finally, we propose a format for an international multicentre trial of a very flexible design to investigate the appropriate timing of surgery in this context.

Method

A 2017 review by Hardwicke and colleagues3 listed all the RCTs in cleft lip and palate published in English over a 10-year period from 1 January 2004 to 31 December 2013, using the Cochrane Central Register of Controlled Trials, MEDLINE® and EMBASE with the key words ‘cleft lip’ or ‘cleft palate’. From this review we were able to identify two RCTs that compared different ages at the time of surgery.4,6 A literature search from 1 January 2014 to 29 February 2020 identified four further RCTs, two of which compared surgical timings.7,8 Shaffer and colleagues9 investigated surgical timings in a retrospective (hence non-randomised) study of the experience of a single craniofacial clinic, which was excluded. A survey by Slator and colleagues of 18 cleft centres in the United Kingdom, which concluded ‘there remains considerable variation in both the sequence and timing of surgical repair of cleft lip and palate in infancy’, was also excluded.10 See Figure 1.

Fig 1. PRISMA flowchart

Results

Clinical outcomes

Hardwicke and colleagues’ review3 identified RCTs by Ysunza and colleagues6 and Williams and colleagues4 that compared the timing of palatal surgery. Since that review, the Scandcleft Consortium11 has reported on three surgical trials (Scandcleft 1, 2 and 3) conducted in parallel in 163, 162 and 154 infants respectively. Only Scandcleft 1, reported by Willadsen and colleagues,12 compared palatal surgery at different ages. In 2019 Yeow and colleagues8 compared timings of palatal surgery in 76 infants with isolated cleft palate. Table 1 gives the details of these four RCTs, which investigated surgery-related outcomes following different timings of palatal repair, together with the corresponding velopharyngeal insufficiency (VPI) rates quoted at a variety of ages.

Table 1. Summary of VPI rates from published RCTs comparing palatal surgery conducted at different ages
Lip repair | Palate repair | Age at surgery (months) | Initially randomised (n) | VPI present (n) | VPI absent (n) | Total analysed (n) | VPI present (%)
Williams and colleagues4: speech assessed at age four or more years
Spina | von Langenbeck | 9–12 | 51 | 14 | 37 | 51 | 27.5
Millard | von Langenbeck | 9–12 | 52 | 15 | 37 | 52 | 28.8
Spina | von Langenbeck | 15–18 | 46 | 12 | 34 | 46 | 26.1
Millard | von Langenbeck | 15–18 | 54 | 18 | 36 | 54 | 33.3
Spina | Furlow | 9–12 | 35 | 2 | 33 | 35 | 5.7
Millard | Furlow | 9–12 | 43 | 7 | 36 | 43 | 16.3
Spina | Furlow | 15–18 | 48 | 12 | 36 | 48 | 25.0
Millard | Furlow | 15–18 | 47 | 11 | 36 | 47 | 23.4
Total | – | – | 376 | 91 | 285 | 376 | 24.2
Ysunza and colleagues6: speech assessed at age four years
– | San Venero Roselli pharyngoplasty§ | 6 | 35 | 6 | 29 | 35 | 17.1
– | San Venero Roselli pharyngoplasty§ | 12 | 41 | 8 | 33 | 41 | 19.5
Total | – | – | 76 | 14 | 62 | 76 | 18.4
Willadsen and colleagues7: speech assessed at age five years
– | Gothenburg and Vomer flap§§ | 12 | 83 | 13 | 59 | 72 | 18.1
– | Gothenburg and Vomer flap§§ | 36 | 80 | 16 | 55 | 71 | 22.5
Total | – | – | 163 | 29 | 114 | 143 | 20.2
Yeow and colleagues8: speech assessed at age three years
– | VWK | 6 | 18* | 1 | 15 | 16 | 6.3
– | VWK | 12 | 20** | 3 | 11 | 14 | 21.4
– | 2F-IVV | 6 | 18* | 2 | 14 | 16 | 12.5
– | 2F-IVV | 12 | 20** | 1 | 13 | 14 | 7.1
Total | – | – | 76 | 7 | 53 | 60 | 11.7

2F-IVV=2-flap palatoplasty with intra-velar veloplasty; RCT=randomised controlled trial; VPI=velopharyngeal insufficiency; VWK=Veau-Wardill-Kilner type palatoplasty. §Details are given by Trigos and colleagues13 §§Details are given by Rautio and colleagues14 *One from each group randomised to timing only. **One randomised to type of palatal surgery only

All patients in Ysunza and colleagues’ trial received San Venero Roselli (SVR) pharyngoplasty and all those in Willadsen and colleagues’ trial received Gothenburg and Vomer flap hard palate closure. In contrast, Williams and colleagues’ trial compared von Langenbeck and Furlow palatoplasties, while Yeow and colleagues’ trial compared Veau-Wardill-Kilner (VWK) palatoplasty with two-flap (2F) palatoplasty in conjunction with intra-velar veloplasty (IVV), denoted 2F-IVV.

All trials compared ‘early’ and ‘late’ age at palatal surgery (which we classify later into four age categories as ‘very early’, ‘early’, ‘late’ and ‘very late’). Two trials compared outcomes following surgery at six months and 12 months,6,8 and one at 12 months versus 36 months,7 while the fourth considered two ranges of timings of nine to 12 months versus 15–18 months.4 Figure 2 shows the VPI rates by early and late surgery by type of palatal repair for the four RCTs. Apart from 2F-IVV in Yeow and colleagues’ trial,8 late repair was associated with higher rates of VPI.

Fig 2. VPI rates reported from the four published RCTs comparing ‘early’ and ‘late’ palatal repair

2F-IVV=2-flap palatoplasty with intra-velar veloplasty; RCT=randomised controlled trial; VPI=velopharyngeal insufficiency; VWK=Veau-Wardill-Kilner type palatoplasty.

The differences in VPI rates between the broad categories ‘early’ and ‘late’, together with the associated 95 per cent confidence intervals (CIs), are summarised in Table 2. The different ages at which VPC was assessed in the four trials prevent a reliable overall synthesis. Even so, all but one (Furlow technique in Williams and colleagues’ trial)4 of the wide CIs in the final column of Table 2 include zero difference (no effect), so there remains much uncertainty with respect to the influence of surgical timing.

Table 2. Differences in VPI rates between types of palatal surgery conducted at ‘early’ and ‘late’ ages within each trial
RCT | Surgical technique | Age at speech assessment (years) | Early VPI (%) | Late VPI (%) | Late–Early (%) | 95% CI (%)
Williams and colleagues4 | Furlow | >4 | 11.5 | 24.2 | 12.7 | +1.8 to +24.4
Williams and colleagues4 | von Langenbeck | >4 | 28.2 | 30.0 | 1.8 | -10.5 to +14.2
Ysunza and colleagues6 | San Venero Roselli pharyngoplasty | 4 | 17.1 | 19.5 | 2.4 | -14.7 to +20.5
Willadsen and colleagues7 and Semb and colleagues11 | Gothenburg and Vomer flap | 5 | 18.1 | 22.5 | 4.5 | -8.7 to +17.7
Yeow and colleagues8 | VWK | 3 | 6.3 | 21.4 | 15.2 | -11.5 to +41.3
Yeow and colleagues8 | 2F-IVV | 3 | 12.5 | 7.1 | -5.4 | -31.3 to +18.9

2F-IVV=2-flap palatoplasty with intra-velar veloplasty; RCT=randomised controlled trial; VP=velopharyngeal; VPI=velopharyngeal insufficiency; VWK=Veau-Wardill-Kilner type palatoplasty.
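The arithmetic behind a late minus early difference and its interval can be sketched as follows. This is a minimal illustration in Python, assuming a simple Wald (normal approximation) interval; the article does not state which interval method was used for Table 2, so the limits produced here need not match the published ones exactly. The function name `risk_difference_ci` is hypothetical. The counts come from Table 1, with the two lip-repair groups of Williams and colleagues summed for the Furlow comparison (9/78 early, 23/95 late).

```python
from math import sqrt


def risk_difference_ci(events_early, n_early, events_late, n_late, z=1.96):
    """Late minus early difference in proportions with a Wald 95% CI."""
    p_early = events_early / n_early
    p_late = events_late / n_late
    diff = p_late - p_early
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_early * (1 - p_early) / n_early + p_late * (1 - p_late) / n_late)
    return diff, diff - z * se, diff + z * se


# VPI present / total analysed, taken from Table 1 (assumed Wald method).
comparisons = {
    "Ysunza (6 vs 12 months)": (6, 35, 8, 41),
    "Williams, Furlow (9-12 vs 15-18 months)": (9, 78, 23, 95),
}

for label, counts in comparisons.items():
    diff, lower, upper = risk_difference_ci(*counts)
    print(f"{label}: {100 * diff:+.1f}% ({100 * lower:+.1f} to {100 * upper:+.1f})")
```

Under this approximation the Ysunza interval spans zero while the Furlow interval does not, which agrees with the pattern described for Table 2.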

Timing (age at palatal surgery)

As we have indicated, the four RCTs used different definitions for ‘early’ and ‘late’ palatal surgery. In practice, despite a specification of six months and 12 months in Yeow and colleagues’ trial,8 Figure 3 shows that the actual age at surgery varied considerably within each timing group. Thus, the median age at ‘very early’ surgery was close to six months (6.13 m; range 5.32–7.59 m), whereas for ‘late’ surgery the median was 11.43 months (range 10.11–12.87 m), with 82 per cent of 33 infants receiving scheduled surgery before the protocol stipulation of age 12 months.

Fig 3. Age of infants (months) at the time of surgery by randomised allocation* (*unpublished data from Yeow and colleagues8)

2F-IVV06=two-flap palatoplasty with intra-velar veloplasty at six months; 2F-IVV12=two-flap palatoplasty with intra-velar veloplasty at 12 months; VWK06=Veau-Wardill-Kilner type palatoplasty at six months; VWK12=Veau-Wardill-Kilner type palatoplasty at 12 months.

Williams and colleagues state that ‘Palatal repairs were performed between the ninth and 30th month, with a mean age of 12.85 m (SD=3.3)’. The summary data suggest a distribution skewed towards the lower ages at surgery, implying that relatively few children were actually operated on at, or close to, 30 months. However, Ysunza and colleagues give no indication of departure from six months (very early) and 12 months (late) scheduling,6 and neither do Rautio and colleagues with respect to 12 months (late) and 36 months (very late) in Scandcleft.14 Figure 4 suggests that the VPI rate does not rise as the age at palatal surgery increases. We note again that the age when VPI assessments were made differs between the four RCTs concerned.

Fig 4. Reported VPI rates categorised into four age-at-palatal-surgery groups.

VPI=velopharyngeal insufficiency.

Randomisation

In a clinical trial the usual method is to allocate equal numbers of patients at random to the respective alternatives, ensuring balance in the patients recruited by the end of the trial. In general, equal numbers in each group are statistically the most efficient. Only Yeow and colleagues give details of their randomisation process and, although the trial closed prematurely, the numbers in the four groups are close to equal.8 This appears to be the case for the two groups in Rautio and colleagues’ trial,14 less so for Ysunza and colleagues’ trial6 and far from the case for Williams and colleagues’ trial.4 No explanation is provided for the large disparity (ranging from 35 to 54 in Table 1) between infant numbers in the eight groups of Williams and colleagues’ trial.4 As Rautio and colleagues14 used dice to generate the randomisation (and opening of sealed envelopes to reveal the allocation), the close proximity of the numbers in each group (83 and 80) seems fortuitous.

Current standards would tend to proscribe the use of dice for generating the randomisation list and the use of sealed envelopes for implementation. As stated by Suresh, ‘it is better to use […] computer programming to do the randomization’15 particularly for large trials and those of a complex design. Additionally, some regulatory bodies overseeing RCTs insist that the randomisation sequence must be reproducible.
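A computer-generated and reproducible allocation list could, for example, be produced along the following lines. This is a sketch only, assuming permuted blocks and a fixed seed; the arm labels, block structure, seed value and the function name `permuted_block_list` are illustrative and are not taken from any of the four trials.

```python
import random


def permuted_block_list(arms, n_blocks, seed, per_arm_in_block=1):
    """Build an allocation list from randomly permuted, balanced blocks."""
    rng = random.Random(seed)  # dedicated generator, so the list depends only on the seed
    allocations = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm_in_block  # each arm appears equally often
        rng.shuffle(block)
        allocations.extend(block)
    return allocations


# Hypothetical labels for a two-technique by two-timing design.
ARMS = ["VWK-06", "VWK-12", "2F-IVV-06", "2F-IVV-12"]

schedule = permuted_block_list(ARMS, n_blocks=5, seed=2020)
for number, arm in enumerate(schedule, start=1):
    print(number, arm)
```

Because every block contains each arm equally often, group sizes can differ by at most one block's worth of allocations if recruitment stops early, and rerunning with the same seed (on the same Python version) regenerates the same list for audit.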

Although numbers are small, randomising the infants at six months of age to all four groups in Yeow and colleagues’ trial had one notable consequence. All 36 infants allocated to palatal surgery at six months received their surgery close to that time, whereas among the 40 allocated to the ‘late’ 12 month group, seven (17.5%) withdrew from the trial before the scheduled surgery could take place. Furthermore, as Figure 3 indicates, palatal surgery was conducted as early as 10 months of age in this 12 month group. It cannot be deduced whether comparable delays and losses occurred in the other three RCTs.

Trial size

The period covered by this review extends over 20 years, during which 782 infants were recruited to address the particular role of age at palatal surgery with respect to VPC at a later age. Of these patients, results from 655 (84%) have been reported. Nevertheless, it appears that no firm inferences can be drawn from the resulting data, so a truly evidence-based conclusion remains elusive. If we take the results from the four RCTs considered for VPI with ‘early’ as opposed to ‘late’ palatal surgery, Table 3 gives the corresponding sample sizes required for a randomised parallel group trial if these findings were to be anticipated in future trials.

Table 3. Sample sizes required, assuming two-sided test size 5 per cent and power 80 per cent, for confirmatory trials of two palatal surgery timings using the outcomes from the four RCTs as planning values
RCT | Actual trial recruitment (VPI assessed) | Early VPI (%) | Late VPI (%) | Observed effect size, Late–Early (%) | Odds ratio (OR) | Confirmatory trial size
Ysunza and colleagues6 | 76 (76) | 17.1 | 19.5 | 2.4 | 0.8515 | 8,200
Williams and colleagues4* | 467 (376) | 21.0 | 27.2 | 6.2 | 0.7115 | 1,500
Willadsen and colleagues7 and Semb and colleagues11 | 163 (143) | 18.1 | 22.5 | 4.4 | 0.7612 | 2,600
Yeow and colleagues8* | 76 (60) | 9.4 | 14.3 | 4.9 | 0.6218 | 1,400

RCT=randomised controlled trial; VPI=velopharyngeal insufficiency. *VPI calculated from both surgical techniques of Table 2 combined.

Critically, the size of trial depends on the anticipated effect size (the larger the effect size, the smaller the trial) and on the prevalence of VPI (the binary variable).16 Although the effect size from Yeow and colleagues’ trial8 is smaller than that of Williams and colleagues’ trial,4 the possible confirmatory trial is smaller as the VPI rate for ‘early’ is lower (9.4% as compared to 21%). It is clear from Table 3 that the trial sizes are too large to permit the possibility of any of these confirmatory trials being conducted.
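The dependence of trial size on both the effect size and the underlying VPI rate can be illustrated with a short calculation. This is a sketch under stated assumptions: the article cites its sample size methods16 without giving a formula, so the standard normal-approximation formula for comparing two proportions with 1:1 allocation is assumed here, and the function names are hypothetical. The planning values are the early and late VPI rates of Table 3.

```python
from math import ceil, sqrt
from statistics import NormalDist


def odds_ratio(p_early, p_late):
    """Odds of VPI with early surgery relative to late surgery."""
    return (p_early / (1 - p_early)) / (p_late / (1 - p_late))


def two_proportion_total_size(p_early, p_late, alpha=0.05, power=0.80):
    """Total size of a 1:1 two-group trial (assumed normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test size
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_early + p_late) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_early * (1 - p_early) + p_late * (1 - p_late))
    ) ** 2
    per_group = numerator / (p_late - p_early) ** 2
    return 2 * ceil(per_group)


# Early and late VPI proportions used as planning values (from Table 3).
planning = {
    "Ysunza": (0.171, 0.195),
    "Williams": (0.210, 0.272),
    "Willadsen": (0.181, 0.225),
    "Yeow": (0.094, 0.143),
}

for trial, (p_early, p_late) in planning.items():
    size = two_proportion_total_size(p_early, p_late)
    print(f"{trial}: OR={odds_ratio(p_early, p_late):.4f}, total size={size}")
```

Under these assumptions the totals come out close to the rounded figures in Table 3, and the Yeow planning values give a smaller trial than the Williams values despite the smaller absolute difference, because the variance of a proportion shrinks as the rate moves away from 50 per cent.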

However, Williams and colleagues4 assessed VPC on an 11 point scale graded from 0 to 10, with scores ranging from three to 10 interpreted as being indicative of VPI. Alternatively, Lohmander and colleagues17 have suggested a seven point categorical scale, which they subsequently collapsed to a three point classification in their Table 4. In general, if a trial endpoint is defined as an ordered categorical variable, the corresponding sample sizes tend to be smaller than if a binary endpoint is used.

Table 4. Sample sizes required, assuming two-sided test size 5 per cent and power 80 per cent, for a possible trial of two infant age-at-palatal-surgery options using VPI measures on a binary (A) and a categorical three point (B) scale
A: Binary (two point) scale
Surgery | Late | Early | Early | Early
Planning difference (late–early) | – | 0.05 | 0.075 | 0.10
Planning odds ratio (OR) | – | 0.75 | 0.64 | 0.53
VPI proportion: incompetent | 0.25 | 0.20 | 0.175 | 0.15
VPI proportion: marginal/competent | 0.75 | 0.80 | 0.825 | 0.85
Trial size | – | 2188 | 986 | 500
B: Categorical (three point) scale
Surgery | Late | Early | Early | Early
Planning odds ratio (OR) | – | 0.75 | 0.64 | 0.53
VPI proportion: incompetent | 0.25 | 0.200 | 0.175 | 0.150
VPI proportion: marginal | 0.35 | 0.329 | 0.314 | 0.293
VPI proportion: competent | 0.40 | 0.471 | 0.510 | 0.557
Trial size | – | 1314 | 552 | 276

VPI=velopharyngeal insufficiency

Lohmander and colleagues17 reported VPC rates categorised on a three point scale as ‘incompetent’ (VPI) (23.9%), ‘marginally incompetent’ (34.8%) and ‘competent’ (41.3%) in 339 five-year-old children with repaired cleft palate. Suppose an RCT is planned with the aim of reducing the VPI rate in infants who have ‘late’ surgery (25%) to a lower rate with ‘early’ surgery. For a binary endpoint, with two-sided test size 5 per cent and power 80 per cent, the sample sizes corresponding to different (reduced) VPI rates for early surgery (20.0%, 17.5% and 15.0%) are given in Table 4. These calculations suggest the RCT will range in size from 500 to more than 2000 patients depending on the planning assumption made.

However, the sample size methods16 for a three point ordered categorical variable lead to smaller trials. Table 4 shows that, with an assumed planning OR=0.75, the proportions of ‘incompetent’, ‘marginal’ and ‘competent’ for ‘late’ surgery (25%, 35% and 40% respectively) would potentially improve to 20%, 33% and 47% with ‘early’ surgery. What is more, the required sample size is roughly 55 to 60 per cent of that for the corresponding binary endpoint (for example, 1314 rather than 2188). This implies that regarding VPC as a categorical rather than a binary variable would reduce the size of any future trial considerably.
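The ordered categorical calculation can be sketched in the same way. The article does not name the formula behind its cited methods,16 so this example assumes Whitehead's sample size formula for ordered categorical data under proportional odds with 1:1 allocation; the function names are hypothetical. The ‘late’ proportions (25%, 35% and 40%) and the planning odds ratios are those of Table 4.

```python
from math import ceil, log
from statistics import NormalDist


def shift_by_odds_ratio(late_props, odds_ratio):
    """Apply one common odds ratio to every cumulative proportion (proportional odds)."""
    shifted, cumulative, previous = [], 0.0, 0.0
    for p in late_props[:-1]:
        cumulative += p
        odds = odds_ratio * cumulative / (1 - cumulative)
        new_cumulative = odds / (1 + odds)
        shifted.append(new_cumulative - previous)
        previous = new_cumulative
    shifted.append(1 - previous)  # last category takes the remainder
    return shifted


def ordinal_total_size(late_props, odds_ratio, alpha=0.05, power=0.80):
    """Total size of a 1:1 trial with an ordinal endpoint (assumed Whitehead formula)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    early_props = shift_by_odds_ratio(late_props, odds_ratio)
    # Sum of cubed category proportions averaged over the two groups.
    mean_cubed = sum(((a + b) / 2) ** 3 for a, b in zip(late_props, early_props))
    per_group = 6 * z ** 2 / (log(odds_ratio) ** 2 * (1 - mean_cubed))
    return 2 * ceil(per_group)


late = [0.25, 0.35, 0.40]  # incompetent, marginal, competent
for planning_or in (0.75, 0.64, 0.53):
    early = [round(p, 3) for p in shift_by_odds_ratio(late, planning_or)]
    print(planning_or, early, ordinal_total_size(late, planning_or))
```

Applying the odds ratio to the cumulative proportions keeps the ‘early’ proportions summing to one, and the term involving the cubed mean proportions is what rewards spreading patients across more than two categories.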

Proposed structure of a collaborative RCT with a pragmatic design followed by a prospective individual patient data meta-analysis

On the basis that the issue of the timing of infant age at palatal surgery remains an open question, it is clear that further trials are required to answer this question, although it is important not to underestimate the challenges that this presents. Indeed, the International Confederation for Cleft Lip and Palate and Related Craniofacial Anomalies Task Force recommended ‘that a prospective international controlled trial is conducted’.18

In the knowledge that there are many individual centres and collaborative groups capable of recruiting substantial numbers of infants with cleft palate/lip anomalies, we propose a pragmatic way forward.

Rather than a single trial, we propose that groups capable of recruiting, for example, 50 eligible infants within two years each conduct their own RCT with a view to a future international collaboration to organise a prospective individual patient data meta-analysis of these many trials. The possible framework of such a collaboration is summarised in Table 5, which allows individual centres to make specific choices concerning the eligibility of the infants to be included and the surgical options, but with some provisos imposed by the overall design, such as an agreed endpoint and how it is to be assessed.

Table 5. Proposed structure of a pragmatic trial to compare surgical timings with VPC as the primary concern and assessed by means of an ordered categorical variable
Surgical timing: early versus delayed
To avoid some of the problems associated with the ‘late’ classification, rather than having two fixed surgical time options, the proposal is to randomise to ‘early’ or ‘delayed’ surgery. Early is defined as current practice within the centre concerned. When surgery is delayed, its timing will be left to the responsible clinical team (including the parents) to decide, but the delay should be as long as possible after the early timing and not beyond (for example) one year of age.
Eligibility
Each group is to make its own choice but it is likely to include the ranges of non-syndromic cleft infants such as those covered by the four trials considered in this review. If lip repair is also required, early palate surgery would be conducted as soon as is practical after lip surgery.
Surgical techniques
Each group is to make its own choice of the surgical technique(s) to use. This could be, for example, a randomised option between two surgical approaches.
Clinical endpoint
VPC assessed using a standard approach (such as that of Lohmander and colleagues17), but with the individual variable scores recorded and reported rather than merely the transfer values suggested.
VPC to be assessed as close as possible (± four weeks) to the birthdays at three and five years.
Minimal data to be recorded and retained by the group
Design features: centre name and contact; surgical option(s); general eligibility.
On-study: infant birthdate; cleft lip (if relevant); date of lip surgery (if relevant); cleft palate; date of randomisation; surgical timing (immediate or delayed); date of palatal surgery.
Follow-up: VPC at age three years; VPC at age five years.
Data exchange
To monitor overall progress of the multi-group trial, groups would send their anonymised data annually (at a fixed date) to be checked for completeness by the coordinating centre, which would need to be identified.
Meta-analysis
Although the minimum duration of recruitment to this trial may be set at six years, interim analysis of VPC at age three years could be published after (for example) five years from commencement of the international collaboration, provided sufficient data have been accumulated.

VPC=velopharyngeal competence

The proposal envisages that the ongoing Timing Of Primary Surgery for cleft palate (TOPS) trial by Shaw and colleagues would eventually form part of the prospective meta-analysis.19 Their trial relates to participants with non-syndromic isolated cleft palate who have received the Sommerlad surgical technique at either six or 12 months.20 The main outcome variable is VPC at five years, but an assessment at three years is also included. Their trial is closed for recruitment with the final follow-up assessments due in July 2020.

Although this article has been written in the context of the unresolved question with regard to surgical timings, the general structure of the proposal allows for a similar approach to be adapted to accommodate other unanswered aspects of cleft management which require RCTs to be conducted. As Bekisz and colleagues conclude, following their review of RCTs in cleft and craniofacial surgery, ‘Our community should consider methods by which more RCTs can be performed’.21

Conclusion

The objective of this article was to review the available evidence from RCTs concerning the age at which palatal repair is best conducted in infants. Our review suggests no firm conclusions can yet be drawn with respect to the rates of later VPC. As a consequence, we outline the structure of a pragmatic RCT as a basis for further investigation of the optimal age at surgery (or other relevant research questions).


Conflict of interest

The authors have no conflicts of interest to disclose.

Financial declaration

The authors received no financial support for the research, authorship, and/or publication of this article.

Revised: 23 May 2022