On randomised trials of surgical timings for cleft palate repair

Background: In this article we review randomised controlled trials (RCTs) comparing palatal surgery at different ages to examine their design features and quantify their conclusions. Method: A literature search of RCTs comparing surgical timings for cleft palate and/or lip repair from 1 January 2014 to 29 February 2020 was undertaken. This supplements an earlier systematic review of 62 RCTs in cleft lip and palate from 1 January 2004 to 31 December 2013 in English using the Cochrane Central Register of Controlled Trials, MEDLINE ® and EMBASE with key words ‘cleft lip’ or ‘cleft palate’. Results: Four RCTs were identified, each comparing palate surgery at different lower (six, nine and 12 months) and upper (12, 18 and 36 months) ages, with velopharyngeal competence (VPC) also assessed at different ages (three, four, more than four or five years). These surgical and assessment age differences prevent a synthesis using meta-analysis techniques. Nevertheless, three RCTs indicate that VP function is more impaired with later surgery. Two ask questions about the type of surgery; one suggests that VPC is greater with Furlow palatoplasty than von Langenbeck surgery and the other that one technique appears to have better VPC at six months and the other at 12 months. Conclusion: The role of the timing of palatal surgery with respect to VPC remains unclear. We propose an international strategy that is designed to establish the optimal age for palatal surgery in infants requiring palate and/or lip repair.


Introduction
Clefts of the lip and palate create problems in feeding, speech, hearing, dental development and facial growth. Despite surgical and multidisciplinary advances in cleft care, associated speech difficulties and facial appearance represent serious barriers to social integration. As a consequence of the multifaceted nature of the condition, there are numerous decisions with respect to the possible clinical, surgical, audiological, orthodontic and speech interventions that have to be made at different times extending from the birth of the child to adolescence and beyond. For certain individuals, the specific intervention chosen at a given time may be noncontroversial and firmly evidence-based, whereas at other times there may be options available that have not yet been rigorously evaluated. In general, there is a paucity of evidence from RCTs for many by Williams and colleagues, 4 who compared two different lip repairs with two types of palate repair conducted at two different ages in a so-called factorial design. Lu and colleagues 5 commented on the advantages and difficulties associated with this design. The other RCT that evaluated repair at different ages was that of Ysunza and colleagues 6 who compared results following surgery at two different ages in 76 infants.
We review information from the RCTs concerning palatal surgery at different ages in infants and discuss some of the logistical and statistical problems associated with the design and conduct of such trials. For the purpose of this article, we focus on speech outcomes with respect to VPC. Finally, we propose a format for an international multicentre trial of a very flexible design to investigate the appropriate timing of surgery in this context.

Method
A 2017 review by Hardwicke and colleagues 3 listed all the RCTs conducted in cleft lip and palate over a 10-year period from 1 January 2004 to 31 December 2013 in English using the Cochrane Central Register of Controlled Trials, MEDLINE ® and EMBASE with key words 'cleft lip' or 'cleft palate'. From this review we were able to identify two RCTs that compared different ages at the time of surgery. 4,6 A literature search from 1 January 2014 to 29 February 2020 identified four further RCTs, two of which compare surgical timings. 7,8 The investigation of surgical timings by Shaffer and colleagues 9 was a retrospective (hence non-randomised) study of the experience from a single craniofacial clinic and was excluded. A survey by Slator and colleagues of 18 cleft centres in the United Kingdom concluded 'there remains considerable variation in both the sequence and timing of surgical repair of cleft lip and palate in infancy' and was also excluded. 10 See Figure 1.

Clinical outcomes
Hardwicke and colleagues' review 3 identified RCTs comparing times of palatal surgery conducted by Ysunza and colleagues 6 and Williams and colleagues. 4 Since that review, the Scandcleft Consortium 11 has reported on three surgical trials (Scandcleft 1, 2 and 3) conducted in parallel in 163, 162 and 154 infants respectively. Only Scandcleft 1, reported by Willadsen and colleagues, 12 compared palatal surgery at different ages. In 2019 Yeow and colleagues 8 compared timings of palatal surgery in 76 infants with isolated cleft palate. The details of these four RCTs, which investigated surgical-related outcomes following different surgical timings for palatal repair, and the corresponding VP insufficiency (VPI) rates quoted at a variety of ages, are given in Table 1.
All trials compared 'early' and 'late' age at palatal surgery (which we classify later into four age categories as 'very early', 'early', 'late' and 'very late'). Two trials compared outcomes following surgery at six months and 12 months, 6,8 and one at 12 months versus 36 months, 7 while the fourth considered two ranges of timings of nine to 12 months versus 15-18 months. 4 Figure 2 shows the VPI rates by early and late surgery by type of palatal repair for the four RCTs. Apart from 2F-IVV in Yeow and colleagues' trial, 8 late repair was associated with higher rates of VPI.
The differences in VPI rates between the broad categories 'early' and 'late' timings, together with the associated 95 per cent confidence intervals (CI), are summarised in Table 2. However, the different ages at which VPC was assessed in the four trials prevent a reliable overall synthesis. Moreover, all but one (Furlow technique in Williams and colleagues' trial) 4 of the wide CIs in the final column of Table 2 include zero difference (no effect), so there remains much uncertainty with respect to the influence of surgical timing.
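To illustrate how a difference in VPI rates and its 95 per cent CI of this kind can be obtained, the following is a minimal sketch using the simple Wald (normal approximation) interval. The method is an assumption, since the calculation used for Table 2 is not stated, and the counts shown are hypothetical rather than taken from any of the four trials.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_early, n_early, events_late, n_late, level=0.95):
    """Wald confidence interval for the VPI rate difference (late minus early)."""
    p_early = events_early / n_early
    p_late = events_late / n_late
    diff = p_late - p_early
    # Standard error of a difference between two independent proportions
    se = sqrt(p_early * (1 - p_early) / n_early + p_late * (1 - p_late) / n_late)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts (NOT from Table 2): 4/40 VPI with early, 8/40 with late surgery
diff, lower, upper = risk_difference_ci(4, 40, 8, 40)
print(f"difference = {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With groups of this size the interval spans zero even though the observed rate doubles, which is the pattern of uncertainty described above.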

Timing (age at palatal surgery)
As we have indicated, the four RCTs used different definitions for 'early' and 'late' palatal surgery. In practice, despite a specification of six months and 12 months in Yeow and colleagues' trial, 8 Figure 3 shows that the variation in actual age at surgery was quite considerable within each timing group. Thus, the median age of 'very early' surgery was close to six months (6.13 m; range 5.32-7.59 m), whereas for 'late' the median was 11.43 months (range 10.11-12.87 m) with 82 per cent of 33 infants receiving scheduled surgery before the protocol stipulation of age 12 months.
Williams and colleagues state that 'Palatal repairs were performed between the ninth and 30th month, with a mean age of 12.85 m (SD=3.3)'. The summary data suggest a skewed distribution towards the lower age at surgery, implying that relatively few children were actually operated at, or close to, 30 months. However, Ysunza Figure 4 suggests that the VPI rate does not rise as the age at palatal surgery increases. We note again that the age when VPI assessments were made differs between the four RCTs concerned.

Randomisation
In a clinical trial the usual method is to allocate equal numbers of patients at random to the respective alternatives ensuring balance in the patients recruited by the end of the trial. In general, equal numbers in each group are statistically the most efficient. Only Yeow and colleagues give details of their randomisation process and, although the trial closed prematurely, the numbers in the four groups are close to equal. 8 This appears to be the case for the two groups in Rautio and colleagues' trial, 14 less so for Ysunza and colleagues' trial 6 and far from the case for Williams and colleagues' trial. 4 No explanation is provided for the large disparity (ranging from 35 to 51) between infant numbers in the eight groups of Williams and colleagues' trial. 4 As Rautio and colleagues 14 used dice to generate the randomisation (and opening of sealed envelopes to reveal the allocation), the close proximity of the numbers in each group (83 and 80) seems fortuitous.
Current standards would tend to proscribe the use of dice for generating the randomisation list and the use of sealed envelopes for implementation. As stated by Suresh, 'it is better to use […] computer programming to do the randomization' 15 particularly for large trials and those of a complex design. Additionally, some regulatory bodies overseeing RCTs insist that the randomisation sequence must be reproducible.
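As an illustration of a computer-generated and reproducible randomisation sequence, the following is a minimal sketch of permuted-block randomisation with a fixed seed. The arm labels, block size and seed are hypothetical choices for illustration and are not taken from any of the trials reviewed.

```python
import random


def block_randomisation(n_participants, arms=("early", "late"), block_size=4, seed=2020):
    """Return a reproducible allocation list built from permuted blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed makes the sequence reproducible
    allocations = []
    while len(allocations) < n_participants:
        # Each block contains every arm equally often, in random order
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_participants]


sequence = block_randomisation(80)
print(sequence[:8], sequence.count("early"), sequence.count("late"))
```

Because each complete block is balanced, the group sizes stay close to equal throughout recruitment, and rerunning the function with the same seed regenerates the identical list for audit.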
Although numbers are small, randomising infants to all four groups at six months of age in Yeow and colleagues' trial had one notable consequence: all 36 infants allocated to palatal surgery at six months received their surgery close to that time, whereas seven (17.5%) of the 40 allocated to the 'late' 12 month group withdrew from the trial before the scheduled surgery could take place. Furthermore, as Figure 3 indicates, palatal surgery was conducted as early as 10 months of age in this 12 month group. It cannot be deduced whether comparable delays and losses occurred in the other three RCTs.

Trial size
The period covered by this review extends over 20 years, during which 782 infants were recruited to address the particular role of age at palatal surgery with respect to VPC at a later age. Of these patients, results from 655 (84%) have been reported. Nevertheless, it appears that no firm inferences can be drawn from the resulting data so a truly evidence-based conclusion remains elusive. If we take the results from the four RCTs considered for VPI with 'early' as opposed to 'late' palatal surgery, Table 3 gives the corresponding sample sizes required for a randomised parallel group trial if these findings were to be anticipated in future trials.
Critically, the size of trial depends on the anticipated effect size (the larger the effect size, the smaller the trial) and on the prevalence of VPI (the binary variable). 16 Although the effect size from Yeow and colleagues' trial 8 is smaller than that of Williams and colleagues' trial, 4 the possible confirmatory trial is smaller as the VPI rate for 'early' is lower (9.4% as compared to 21%). It is clear from Table 3 that the trial sizes are too large to permit the possibility of any of these confirmatory trials being conducted.
However, Williams and colleagues' trial 4 indicates that VPC is assessed on an 11 point scale graded from 0 to 10 with scores ranging from three to 10 interpreted as being indicative of VPI. Alternatively, Lohmander and colleagues 17 have suggested a seven point categorical scale which they subsequently collapsed to a three point classification.

Table 4: Sample sizes, with two-sided test size 5 per cent and power 80 per cent, for a possible trial of two infant age-at-palatal surgery options using VPI measures on a binary (A) and a categorical three point (B) scale. VPI=velopharyngeal insufficiency.

We assume an RCT is planned with the aim of reducing VPI levels in infants who have 'late' surgery (25%) to a lower rate with early surgery. Then, with a binary endpoint with two-sided test size 5 per cent and power 80 per cent for different (reduced) rates for early (20.0%, 17.5% and 15.0% VPI), the corresponding sample sizes are given in Table 4. These calculations suggest the RCT will range in size from 500 to more than 2000 patients depending on the planning assumption made.
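The binary planning scenarios can be approximated with the standard normal-approximation formula for comparing two independent proportions. This is a sketch under that assumption; the methods of reference 16 may use a slightly different variant, so the figures need not match Table 4 exactly.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_binary(p_late, p_early, alpha=0.05, power=0.80):
    """Patients per group to detect p_late versus p_early (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances in the two groups
    variance = p_late * (1 - p_late) + p_early * (1 - p_early)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_late - p_early) ** 2)


# Planning assumptions from the text: 25% VPI with 'late' surgery
for p_early in (0.20, 0.175, 0.15):
    total = 2 * n_per_group_binary(0.25, p_early)
    print(f"early VPI {p_early:.1%}: total trial size about {total}")
```

Under this formula the totals run from roughly 500 for the largest anticipated reduction to more than 2000 for the smallest, in line with the range quoted above.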
However, using the sample size methods 16 concerning a three point categorical scale variable, Table 4 shows for VPI that the proportions of 'incompetent', 'marginal' and 'competent' (25%, 35% and 40% respectively) for 'late' surgery, with an assumed planning OR=0.75, would potentially improve (to 20%, 33% and 47%) with 'early'. What is more, the required sample size is approximately half that for the corresponding binary endpoint. This implies that regarding VPC as a categorical rather than a binary variable would reduce the size of any future trial considerably.
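The categorical calculation can be sketched in the same way. The code below assumes a proportional odds model and Whitehead's sample size formula for ordered categorical data with equal allocation; whether this is exactly the method of reference 16 is an assumption. It first derives the 'early' proportions implied by the planning OR and then computes the total trial size.

```python
from math import ceil, log
from statistics import NormalDist


def shift_by_odds_ratio(proportions, odds_ratio):
    """Apply a common odds ratio to the cumulative proportions (proportional odds)."""
    shifted, cumulative, previous = [], 0.0, 0.0
    for p in proportions[:-1]:
        cumulative += p
        odds = odds_ratio * cumulative / (1 - cumulative)
        new_cumulative = odds / (1 + odds)
        shifted.append(new_cumulative - previous)
        previous = new_cumulative
    shifted.append(1 - previous)
    return shifted


def total_n_ordinal(late, odds_ratio, alpha=0.05, power=0.80):
    """Total trial size (both groups) for an ordered categorical endpoint."""
    early = shift_by_odds_ratio(late, odds_ratio)
    # Sum of cubed category proportions averaged over the two groups
    mean_cubed = sum(((a + b) / 2) ** 3 for a, b in zip(late, early))
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(12 * z ** 2 / (log(odds_ratio) ** 2 * (1 - mean_cubed)))


# 'Late' surgery: incompetent, marginal, competent (planning values from the text)
late = [0.25, 0.35, 0.40]
print([round(p, 2) for p in shift_by_odds_ratio(late, 0.75)])
print(total_n_ordinal(late, 0.75))
```

The derived 'early' proportions round to the 20%, 33% and 47% quoted in the text, and the resulting total is markedly smaller than that obtained when the same planning assumptions are treated as a binary endpoint.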

Proposed structure of a collaborative RCT with a pragmatic design followed by a prospective individual patient data metaanalysis
On the basis that the issue of the timing of infant age at palatal surgery remains an open question, it is clear that further trials are required to answer this question, although it is important not to underestimate the challenges that this presents. Indeed, the International Confederation for Cleft Lip and Palate and Related Craniofacial Anomalies Task Force recommended 'that a prospective international controlled trial is conducted'. 18 In the knowledge that there are many individual centres and collaborative groups capable of recruiting substantial numbers of infants with cleft palate/lip anomalies, we propose a pragmatic way forward.
Rather than a single trial, we propose that groups capable of recruiting, for example, 50 eligible infants within a framework of two years, each conduct their own RCT with a view to a future international collaboration to organise a prospective, individual data meta-analysis of these many trials. The possible framework of such a collaboration is summarised in Table 5 which allows individual centres to make specific choices concerning the eligibility of the infants to be included and the surgical options, but with some provisos imposed by the overall design such as an agreed endpoint and how it is to be assessed.

Table 5: Proposed structure of a pragmatic trial to compare surgical timings with VPC as the primary concern and assessed by means of an ordered categorical variable. VPC=velopharyngeal competence.
Surgical timing (early versus delayed): To avoid some of the problems associated with the 'late' classification, rather than having two fixed surgical time options, the proposal is to randomise to 'early' or 'delayed' surgery. Early is defined as current practice within the centre concerned. The timing of delayed surgery will be left to the responsible clinical team (including the parents) to decide, but the delay should be as long as possible after the early timing and not beyond (for example) one year of age.
Eligibility: Each group is to make its own choice but it is likely to include the ranges of non-syndromic cleft infants such as those covered by the four trials considered in this review. If lip repair is also required, early palate surgery would be conducted as soon as is practical after lip surgery.
Surgical techniques: Each group is to make its own choice of the surgical technique(s) to use. This could be, for example, a randomised option between two surgical approaches.
Clinical endpoint: VPC assessed using a standard approach (such as by Lohmander and colleagues 17 ) but recording and reporting the individual variable scores rather than merely the transfer values suggested. VPC to be assessed as close as possible (± four weeks) to the birthdays at three and five years.
Data exchange: To monitor overall progress of the multi-group trial, groups would send their anonymised data annually (at a fixed date) to be checked for completeness by the coordinating centre, which would need to be identified.
Meta-analysis: Although the minimum duration of recruitment to this trial may be set at six years, interim analysis of VPC at age three years could be published after (for example) five years from commencement of the international collaboration, provided sufficient data have been accumulated.
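One simple way to combine the results of several such group-level RCTs is a two-stage approach: estimate the early versus late odds ratio for VPI within each trial, then pool the log odds ratios using inverse-variance weights. The sketch below assumes a fixed-effect model and a binary VPI endpoint for simplicity; the proposal itself envisages an ordered categorical endpoint, and the counts are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_odds_ratio(events_early, n_early, events_late, n_late):
    """Log odds ratio (early versus late) and its variance from one trial."""
    a, b = events_early, n_early - events_early
    c, d = events_late, n_late - events_late
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed_effect(trials, level=0.95):
    """Inverse-variance pooled odds ratio with confidence interval."""
    total_weight, weighted_sum = 0.0, 0.0
    for trial in trials:
        estimate, variance = log_odds_ratio(*trial)
        total_weight += 1 / variance
        weighted_sum += estimate / variance
    pooled = weighted_sum / total_weight
    se = sqrt(1 / total_weight)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(pooled), exp(pooled - z * se), exp(pooled + z * se)


# Hypothetical trials: (VPI events early, n early, VPI events late, n late)
trials = [(5, 25, 8, 25), (4, 26, 7, 24), (6, 25, 6, 25)]
print(pool_fixed_effect(trials))
```

Each contributing trial of about 50 infants is individually imprecise, and the pooling step is what narrows the interval; a full individual patient data analysis would instead fit a single model stratified by trial.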
The proposal envisages that the ongoing timing of primary surgery (TOPS) for cleft palate trial 19 by Shaw and colleagues would eventually form part of the prospective meta-analysis. Their trial relates to non-syndromic isolated cleft palate participants who have received the Sommerlad surgical technique 20 either at six or 12 months. The main outcome variable is VPC at five years but also includes an assessment at three years. Their trial is closed for recruitment with the final follow-up assessments due in July 2020.
Although this article has been written in the context of the unresolved question with regard to surgical timings, the general structure of the proposal allows for a similar approach to be adapted to accommodate other unanswered aspects of cleft management which require RCTs to be conducted. As Bekisz and colleagues conclude, following their review of RCTs in cleft and craniofacial surgery, 'Our community should consider methods by which more RCTs can be performed'. 21

Conclusion
The objective of this article was to review the available evidence from RCTs concerning the age at which palatal repair is best conducted in infants. Our review suggests no firm conclusions can yet be drawn with respect to the rates of later VPC. As a consequence, we outline the structure of a pragmatic RCT as a basis for further investigation of the optimal age at surgery (or other relevant research questions).