Validation of pharyngeal findings on sleep nasopharyngoscopy in children with snoring/sleep disordered breathing
Journal of Otolaryngology - Head & Neck Surgery volume 43, Article number: 13 (2014)
To validate the pharyngeal findings in sleep nasopharyngoscopy (SNP) of children with snoring/sleep disordered breathing (S-SDB).
Prospective agreement diagnostic study on retrospective data.
We conducted an inter- and intra-rater agreement study on video documentation of SNP performed on children who presented with S-SDB and who were not syndromic, complex, or previously operated upon. The videos featured various pharyngeal findings (normal, collapse, mixed or obstruction). Three ‘non-expert’ raters at various stages of their otolaryngology careers rated the videos independently on two separate occasions, following an instructional session. We calculated both non-weighted and linear weighted kappa.
Each observer independently rated sixty-one videos on two occasions, 2 weeks apart. Intra-observer agreement (non-weighted kappa) was 0.64 ± 0.08 (95% CI 0.48-0.81), 0.74 ± 0.07 (95% CI 0.60-0.88), and 0.59 ± 0.08 (95% CI 0.43-0.74) for raters one, two and three, respectively. Weighted kappa was 0.6 ± 0.1 (95% CI 0.41-0.79), 0.8 ± 0.06 (95% CI 0.7-0.92), and 0.7 ± 0.07 (95% CI 0.57-0.83), respectively. Inter-rater agreements between raters one and two, two and three, and one and three were 0.83 ± 0.06 (95% CI 0.71-0.95), 0.52 ± 0.08 (95% CI 0.36-0.70), and 0.53 ± 0.08 (95% CI 0.37-0.69), respectively. Weighted kappa was 0.83 ± 0.073 (95% CI 0.69-0.98), 0.68 ± 0.06 (95% CI 0.56-0.79), and 0.64 ± 0.07 (95% CI 0.49-0.78), respectively.
This is the first validation of pharyngeal findings on SNP in children. It is based on a four-type classification. Overall, reproducibility within the three raters and agreement between them were moderate to good. Further work should include phase four trials investigating the impact on outcome.
Sleep disordered breathing (SDB) is commonly diagnosed in the pediatric population. It is defined as a disorder of breathing characterized by ‘prolonged partial upper airway obstruction and/or intermittent complete obstruction that disrupts normal ventilation during sleep’. Obstructive sleep apnea (OSA), the most severe category of SDB, affects approximately 1-4% of all children. If left untreated, OSA can lead to significant impairment in quality of life and to physical health sequelae.
Tonsillar and adenoid hypertrophy have been recognized as the commonest obstructive pathologies leading to pediatric OSA, and as a result adenotonsillectomy (T&A) is recommended as the first-line surgical treatment. However, complete resolution of symptoms after this intervention is far from universal; a recent meta-analysis estimated it at 66.3%. Furthermore, residual disease has been noted to be more prevalent in obese children [3–5].
Sleep endoscopy or nasopharyngoscopy (SNP) has recently gained interest among pediatric otolaryngologists for its potential to identify anatomical sites amenable to surgical correction. Durr et al. evaluated post-operative T&A patients with residual symptoms, using drug-induced sleep endoscopy. As expected, their study revealed multi-level obstruction along the upper airway related to the tongue base, adenoid re-growth and inferior turbinate hypertrophy. Although this study used a standardized, site-specific scale to assess SNP findings, the scale has been neither accepted nor validated in children. Myatt and Beckenham were the first to describe a specific scoring for levels of obstruction using SNP. They described 4 levels of obstruction, namely the velopharynx, tonsils/lateral pharyngeal wall, tongue base and supraglottis.
In the adult literature, two studies conducted by the same research team evaluated test-retest reliability and inter-rater reliability of SNP in patients with SDB. The authors found that intra-rater and inter-rater agreement in both studies ranged from moderate to substantial. However, their population was heterogeneous with a predominance of abnormal findings, and the assessors were both experienced sleep surgeons [8, 9].
The aims of our study are (a) to introduce a specific scoring system to evaluate the pharyngeal findings of SNP in patients with SDB, and (b) to validate this scoring system using three raters of unequal experience who are not experts in SNP.
Materials and methods
We conducted an intra- and inter-rater agreement study at a tertiary referral center (the Stollery Children’s Hospital, Edmonton, Alberta, Canada) after obtaining approval from the institutional Health Research Ethics Board (Pro00024340). Digital videos of patients undergoing SNP were accessed for this study. The videos had been recorded in a standardized manner with a pediatric flexible bronchoscope and a digital 3-chip camera with integrated digital data archiving. The senior author has collected such recordings for all patients undergoing SNP since August 2003.
SNP was used in all children (<18 years of age) who presented with new or recurrent symptoms of S-SDB, and were potential candidates for surgical management or required exclusion of that option prior to minimally invasive ventilation. The children presented with persistent snoring (witnessed by their parents or caregivers for ≥12 months on a nightly basis). Children also presented with other nocturnal and diurnal symptoms. The senior author’s practice uses a modified version of the Pediatric Sleep Questionnaire. In addition to the standard items, we inquired about perinatal risk factors for S-SDB, atopy and other lung conditions, prior surgery, body weight, developmental history, neuropsychiatric conditions, esophagitis, aspiration, and smoking habits of caregivers. All children underwent overnight pulse oximetry. The results were graded according to Nixon et al. A full polysomnography was reserved in this practice for syndromic children or those with a complex medical history, patients whose diagnosis was in doubt, or those whose symptoms were not in concordance with sleep oximetry results.
All SNP were performed with a uniform sedation protocol in the operating room, using remifentanil 2–2.5 mcg/ml and propofol infusion rates varying from 200–350 mcg/kg/min, titrated to the response to stimulation. The patients were kept breathing spontaneously throughout the assessment. If inhalational induction had been used, the endoscopy was performed only when end-tidal sevoflurane was zero. The nasal mucosa was topicalized with 1% lignocaine (to a maximum of 3 mg/kg body weight). A flexible bronchoscope was used to assess the airway systematically, from the nose to the larynx.
Sixty-one videos were chosen for the study by the senior author. To allow for an earlier period of growing experience and the use of analogue (non-digital) capture equipment, the records of the first 4 years were avoided. The inclusion criteria were: (1) non-edited, high quality recordings; (2) representative of one of the encountered types of pharyngeal findings (normal, obstruction, collapse, mixed); and (3) performed in patients who had not previously been operated upon. Aside from ensuring a non-skewed proportion of the four types, a random folder of digital videos was chosen from the 6th year (2010), and the videos were included consecutively. There were ultimately thirteen obstruction videos, thirteen collapse, nineteen mixed, and sixteen normal. None of the children whose videos were included were syndromic or complex.
Three “non-expert” raters at various stages of their otolaryngology careers were recruited. None had been involved in the inception of this scoring system, and none performed SNP routinely. Throughout their training they had been exposed to SNP for a total of three months. At the time of the experiment, two were starting their third and fourth years of residency, respectively, and the third had been in a staff position for one year after finishing a year of post-graduate clinical fellowship training (head and neck reconstructive and esthetic surgery). The scoring process was explained during an hour-long instructional session. The raters were blinded to the identity of the children, their demographics, clinical details, and eventual or prior management. Videos were compiled, coded, and organized into two software presentation documents whose linked videos were de-identified. This process was carried out by one of the authors, who alone kept the code to the videos. Each document contained the same videos, but in a different random order. Each rater was given the first document and asked to score the videos independently. Two weeks later, the rater was given the second document and asked to score the videos again.
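The coding and ordering step described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `build_rating_sets`, the code format `V001`, and the seed value are hypothetical and are not taken from the study, which used presentation software rather than a script.

```python
import random


def build_rating_sets(n_videos, seed):
    """Return (code_key, first_order, second_order) for two rating sessions.

    Hypothetical helper: the names and the code format are illustrative only.
    """
    if n_videos < 2:
        raise ValueError("need at least two videos to produce two different orders")
    rng = random.Random(seed)  # seeded so the coordinator can reproduce the key

    # Give every video an opaque code so raters never see the original identifier.
    video_ids = list(range(1, n_videos + 1))
    codes = [f"V{i:03d}" for i in range(1, n_videos + 1)]
    rng.shuffle(codes)
    code_key = dict(zip(video_ids, codes))  # kept only by the coordinator

    # First session: one random order of the coded videos.
    first_order = list(code_key.values())
    rng.shuffle(first_order)

    # Second session: same videos, reshuffled until the order differs.
    second_order = first_order[:]
    while second_order == first_order:
        rng.shuffle(second_order)

    return code_key, first_order, second_order


code_key, first_order, second_order = build_rating_sets(61, seed=2010)
```

Both sessions contain the same 61 coded videos; only the presentation order changes, which is what allows intra-rater agreement to be computed video by video.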
Each video represented either a normal pharynx, a collapse of the pharyngeal walls affecting over 50% of the cross-sectional area during inspiration, an obstruction of the pharynx affecting over 50% of the cross-sectional area at both phases of respiration, or a mixed (collapse and obstruction) presentation (Table 1). The raters were required to decide if the type was present or absent. They were not required to rate any nasal, nasopharyngeal or velopharyngeal findings (i.e. they started scoring from the level of the oropharyngeal tonsils downwards). The main objective was to rate the oro- and hypopharynx, as these were the regions deemed most likely to be affected by pharmacologically induced sleep.
The kappa statistic was used to measure agreement. Non-weighted kappa was calculated first. We then postulated that, since the normal and collapse states do not require pharyngeal surgery (assuming no other variable interferes with the decision), a rater should be penalized more heavily for rating them as obstruction or mixed states. Accordingly, the linear weighted kappa calculation was based on an unequal (doubled) imputed distance between the first two categories and the third and fourth. The kappa values, standard errors, maximum possible kappa, proportions and 95% confidence intervals were provided.
A total of 61 videos were analyzed. There were thirteen obstruction videos, thirteen collapse, nineteen mixed, and sixteen normal. The mean duration was 52 ± 26.99 seconds (range 15–180 seconds). The mean age was 7.43 ± 2.37 (4.3-6.25) years. Thirty-one were males. The mean BMI for age and sex was 20.9 ± 2.5 kg/m². Median pulse oximetry grade was 1.
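A minimal sketch of the two statistics follows, assuming the standard definitions of Cohen's kappa and of linear weighting by distance between category positions. The positions `[0, 1, 3, 4]` (normal, collapse, obstruction, mixed) are our assumption of how a doubled distance between the second and third categories might be encoded; the order of the last two categories and the toy table are likewise assumptions and are not study data.

```python
def cohen_kappa(table, positions=None):
    """Cohen's kappa from a square agreement table (rows: rating A, columns: rating B).

    positions=None gives the non-weighted kappa. Otherwise agreement weights
    fall off linearly with the distance between category positions.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    if positions is None:
        # Full credit on the diagonal, none elsewhere.
        w = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    else:
        span = max(positions) - min(positions)
        w = [[1.0 - abs(positions[i] - positions[j]) / span for j in range(k)]
             for i in range(k)]

    p_obs = sum(w[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(w[i][j] * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


# Assumed positions: normal=0, collapse=1, obstruction=3, mixed=4.
ASSUMED_POSITIONS = [0, 1, 3, 4]

# Invented toy table (NOT study data): the only disagreements are
# normal versus collapse, which carry the smallest penalty.
toy = [
    [5, 2, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 5],
]
unweighted = cohen_kappa(toy)
weighted = cohen_kappa(toy, ASSUMED_POSITIONS)
```

In the toy table the weighted value exceeds the non-weighted one, because every disagreement lies between two categories that sit close together on the assumed scale.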
Three raters scored the videos as described in the Materials and methods section. The intra-rater agreement ranged from moderate to good for non-weighted kappa values (Table 2). The values were 0.64 ± 0.078 for rater 1, 0.73 ± 0.071 for rater 2, and 0.58 ± 0.0776 for rater 3. The 95% confidence interval (CI) spanned two categories of agreement (i.e. moderate to good or good to very good). The proportions of agreement were 0.77, 0.82, and 0.69 for raters 1, 2, and 3 respectively. They were all higher than expected for chance alone.
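To illustrate how a confidence interval can span two categories of agreement, the sketch below applies the usual normal approximation (kappa ± 1.96 × standard error) to the rater 1 values quoted above. The category cut-offs in `BANDS` are an assumption: they are the commonly used 0.20-wide bands that match the labels used in this text (moderate, good, very good), not limits stated by the authors.

```python
def kappa_ci(kappa, se, z=1.96):
    """Normal-approximation confidence interval for kappa."""
    return kappa - z * se, kappa + z * se


# Assumed cut-offs (upper limit of each band, label).
BANDS = [
    (0.20, "poor"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "good"),
    (1.00, "very good"),
]


def agreement_band(value):
    """Return the label of the band that contains the kappa value."""
    for upper, label in BANDS:
        if value <= upper:
            return label
    return BANDS[-1][1]


# Rater 1, non-weighted kappa and standard error as reported in the text.
lower, upper = kappa_ci(0.64, 0.078)
span = (agreement_band(lower), agreement_band(upper))
```

Under these assumptions the interval for rater 1 runs from roughly 0.49 to 0.79, so its limits fall in the moderate and good bands respectively.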
Linear weighted kappa values were generally slightly higher, and also ranged from moderate to good. The values were 0.60 ± 0.1 for rater 1, 0.80 ± 0.06 for rater 2, and 0.7 ± 0.07 for rater 3, and the lower limits of their 95% CIs were in the moderate range.
Table 3 displays inter-rater agreement. Agreement between raters one and two was the highest (very good). The non-weighted kappa was 0.85 ± 0.0569, the weighted value was 0.83 ± 0.07, and the observed proportion of agreement was 0.9 (Table 2). The next two sets of agreements were moderate on non-weighted kappa (0.53 ± 0.08 and 0.53 ± 0.08 for raters 2 & 3 and 1 & 3, respectively). Both improved to good when linear weighted kappa was calculated (0.68 ± 0.07 and 0.64 ± 0.07 for raters 2 & 3 and 1 & 3, respectively). The observed proportions of agreement were similar for both pairs (0.66) and above that expected by chance.
According to accepted categories of kappa values, the intra- and inter-rater agreements in this study are good. Generally speaking, by rejecting the null hypothesis (k is not zero, and above 0.5) we are assured that the agreement achieved is above chance, but its interpretation in individual situations will vary.
In this study, these results were achieved by non-expert raters, in order to demonstrate that the method is easy to learn and reproducible. A conscious attempt was made to test the most contentious issue related to SNP: the oro- and hypopharyngeal findings. Although the same technique is used on a daily basis for diagnosing and managing dynamic laryngeal conditions, concerns exist that pharmacologically induced sleep would distort the findings. Such concerns do not apply to the nose and nasopharynx, where changes might only be affected by posture and use of decongestants. We also sought to cater for one of the most important proposed functions of SNP: seeking surgical targets and avoiding unnecessary operations. By calculating linear weighted kappa, the ratings incurred a heavier penalty for disagreement where surgery may be useful (normal/collapse versus obstruction/mixed), and not just for misclassification of the mutually exclusive types of finding.
Although the videos used were not recorded for the purpose of the study, the conditions of the endoscopies were standardized, and the design of the experiment was conceived prospectively. Another caveat that we circumvented in this study is spectrum bias. In contrast to other studies (Durr et al., Truong et al.), we included neither children who had been operated upon before nor complex or syndromic children, and a broad range of findings was included [6, 17]. This lends more credibility to the findings, and leaves less room for learning effect and chance agreement.
All the endoscopies were performed while the patients were breathing spontaneously under the same intravenous agents, although it is conceivable that in a fully prospective experiment some endoscopies might have been excluded due to protocol deviations. There is some debate, however, regarding the ideal pharmacological agent for achieving the closest state to physiologic sleep. Current literature suggests that a clinical target of loss of responsiveness can be used to achieve airway conditions that mimic findings seen in normal sleep, using either propofol or midazolam infusion. Further, there is evidence from the literature in favor of propofol, based on polysomnography findings comparable to those of physiological sleep, its effect on the genioglossus muscle, the critical closing pressure of the pharynx, and its titratable effects [20–28]. The caveat is that these citations are all from the adult literature.
One plausible criticism is the conspicuous absence of PSG, the reference standard for the diagnosis. We would argue that we have used a validated score based on pulse oximetry, and the patients were screened with a standard questionnaire. Further, the agreement study in its own right would not have been affected, and correlating the findings to PSG was not our set objective.
There are two further limitations to this work, namely the absence of test-retest reliability and of site-specific testing. As to the former, there are ethical considerations that probably would have made that step impossible. These relate to concerns regarding the consequences of repeated general anesthesia on the health of children, despite the evidence being controversial. With respect to individual sites (e.g. scoring for laryngomalacia, lingual tonsil enlargement, lateral versus circumferential collapse), this work did not aim beyond testing the agreement on discriminating normal, collapse and obstruction. The latter aspect is an important step towards evaluating this diagnostic tool. Ultimately, after external validation, the community should put to the test whether surgery directed by SNP would achieve better results than traditional planning. This could also unravel the reasons for the current success figures of adenotonsillectomy in the treatment of SDB.
The findings of this study have implications for the management of SDB in children. To date, no such validation exercise has been done for assessment of SNP in pediatric patients. Kezirian et al. published two prospective studies assessing test-retest reliability and inter-rater reliability of drug-induced sleep endoscopy (DISE) in adults with SDB. They found both test-retest and inter-rater reliability to be moderate to good [13, 31]. Our work supports the notion that SNP in general is a promising tool in SDB.
We have published and unpublished data from one cross-sectional and three case-control studies [27, 33, 34] demonstrating that SNP findings in children presenting with SDB, and in particular risk groups, are unique and different from those of comparison groups. In comparisons of the proportions of collapse, obstruction and mixed findings, the comparison groups consistently demonstrated a predominance of mixed and obstructive findings over collapse, whereas the high-risk groups demonstrated more collapse and mixed findings. These may be viewed as phase one trials demonstrating that the findings of SNP are distinct in high-risk groups with SDB.
A final point that would emphasize the potential impact of this practice on changing management decisions comes from the difference between SNP findings and those of traditional clinical examination in the awake child. In an agreement analysis of obstructive and non-obstructive findings in our first 248 children, kappa was 0.44 (95% CI 0.33-0.55). The clinic findings missed 58 obstructions, and misdiagnosed 13 non-obstructions (considering the SNP as the reference standard). Ostensibly, many useful surgeries would have been missed, and some unnecessary operations performed.
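The figures above can be related to one another with a short worked calculation. It assumes that all 248 children had both a clinic and an SNP assessment and that the 58 missed and 13 misdiagnosed cases are the only discordant pairs; the chance agreement is not reported in the text and is merely back-calculated here by rearranging the definition of kappa.

```python
def observed_agreement(n_total, n_discordant):
    """Proportion of cases on which the two assessments agree."""
    return (n_total - n_discordant) / n_total


def implied_chance_agreement(p_obs, kappa):
    """Chance agreement implied by kappa = (p_obs - p_exp) / (1 - p_exp)."""
    return (p_obs - kappa) / (1.0 - kappa)


# 58 missed obstructions + 13 misdiagnosed non-obstructions out of 248 children.
p_obs = observed_agreement(248, 58 + 13)
p_exp = implied_chance_agreement(p_obs, 0.44)
```

Under these assumptions the clinic examination and SNP agreed in about 71% of children, against an implied chance agreement of about 49%, which is how a seemingly high raw agreement translates into only a moderate kappa.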
We have demonstrated a moderate to good agreement on a proposed scoring of the pharyngeal findings of SNP in children with snoring/SDB. External validation and phase four trials are recommended for future work.
Marcus CL, Brooks LJ, Draper KJ, Gozal D, Halbower AC, Jones J, Schechter MS, Ward SD, Sheldon SH, Shiffman RN, Lehmann C, Spruyt K, American Academy of Pediatrics: Diagnosis and management of childhood obstructive sleep apnea syndrome. Pediatrics. 2012, 130 (3): e714-e755. 10.1542/peds.2012-1672.
Lumeng JC, Chervin RD: Epidemiology of pediatric obstructive sleep apnea. Proc Am Thorac Soc. 2008, 5 (2): 242-252. 10.1513/pats.200708-135MG.
Mitchell RB, Kelly J: Adenotonsillectomy for obstructive sleep apnea in obese children. Otolaryngol Head Neck Surg. 2004, 131 (1): 104-108. 10.1016/j.otohns.2004.02.024.
Mitchell RB, Kelly J: Outcome of adenotonsillectomy for severe obstructive sleep apnea in children. Int J Pediatr Otorhinolaryngol. 2004, 68 (11): 1375-1379. 10.1016/j.ijporl.2004.04.026.
Friedman M, Wilson M, Lin HC, Chang HW: Updated systematic review of tonsillectomy and adenoidectomy for treatment of pediatric obstructive sleep apnea/hypopnea syndrome. Otolaryngol Head Neck Surg. 2009, 140 (6): 800-808. 10.1016/j.otohns.2009.01.043.
Durr ML, Meyer AK, Kezirian EJ, Rosbe KW: Drug-induced sleep endoscopy in persistent pediatric sleep-disordered breathing after adenotonsillectomy. Arch Otolaryngol Head Neck Surg. 2012, 138 (7): 638-643.
Myatt HM, Beckenham EJ: The use of diagnostic sleep nasoendoscopy in the management of children with complex upper airway obstruction. Clin Otolaryngol Allied Sci. 2000, 25 (3): 200. 10.1046/j.1365-2273.2000.00323.x.
Kezirian EJ, White DP, Malhotra A, Ma W, McCulloch CE, Goldberg AN: Interrater reliability of drug induced sleep endoscopy. Arch Otolaryngol Head Neck Surg. 2010, 136 (4): 393. 10.1001/archoto.2010.26.
Rodriguez-Bruno K, Goldberg AN, McCulloch C, Kezirian EJ: Test-retest reliability of drug-induced sleep endoscopy. Otolaryngol Head Neck Surg. 2009, 140: 646-651. 10.1016/j.otohns.2009.01.012.
Chervin RD, Hedger K, Dillon JE, Pituch KJ: Pediatric sleep questionnaire (PSQ): validity and reliability of scales for sleep-disordered breathing, snoring, sleepiness, and behavioral problems. Sleep Med. 2000, 1 (1): 21-32. 10.1016/S1389-9457(99)00009-X.
Nixon AS, Kermack GM, Davis M, Manoukian J, Brown KA, Brouillette RT: Planning adenotonsillectomy in children with obstructive sleep apnea: the role of overnight oximetry. Pediatrics. 2004, 113 (1 Pt 1): e19-e25.
Lowry R: Kappa As a Measure Of Concordance In Categorical Sorting. http://www.vassarstats.net/kappa.html. Updated 2013.
Landis JR, Koch GG: An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. 1977, 33 (2): 363-374. 10.2307/2529786.
University of York Department of Health Sciences: Measurement in Health and Disease. Cohen’s Kappa: Percentage Agreement - A Misleading Approach. http://www-users.york.ac.uk/~mb55/msc/clinimet/week4/kappash2. Accessed 03/08, 2013.
Parikh S, Coronel M, Lee JJ, Brown SM: Validation of a new grading system for endoscopic examination of adenoid hypertrophy. Otolaryngol Head Neck Surg. 2006, 135: 684-687. 10.1016/j.otohns.2006.05.003.
Knottnerus CW, Muris J: Evidence base of clinical diagnosis: evaluation of diagnostic procedures. Br Med J. 2002, 324: 477-480. 10.1136/bmj.324.7335.477.
Truong MT, Woo VG, Koltai PJ: Sleep endoscopy as a diagnostic tool in pediatric obstructive sleep apnea. Int J Pediatr Otorhinolaryngol. 2012, 76 (5): 722-727. 10.1016/j.ijporl.2012.02.028.
Ramji M, Witmans M, Cave D, El-Hakim H: Sleep nasopharyngoscopy in children with Snoring/Sleep disordered breathing: purpose and validity. Curr Otorhinolaryngol Rep. 2013, 1 (1): 8-15. 10.1007/s40136-012-0006-1.
Rabelo FAW, Braga A, Kupper DS, De Oliveira JA, Lopes FM, de Lima Mattos PL, Barreto SG, Sander HH, Fernandes RM, Valera FC: Propofol-induced sleep: polysomnographic evaluation of patients with obstructive sleep apnea and controls. Otolaryngol Head Neck Surg. 2010, 142 (2): 218. 10.1016/j.otohns.2009.11.002.
Eastwood PR, Platt PR, Shepherd K, Maddison K, Hillman DR: Collapsibility of the upper airway at different concentrations of propofol anesthesia. Anesthesiology. 2005, 103 (3): 470-477. 10.1097/00000542-200509000-00007.
Eikermann M, Grosse-Sundrup M, Zaremba S, Henry ME, Bittner EA, Hoffmann U, Chamberlin NL: Ketamine activates breathing and abolishes the coupling between loss of consciousness and upper airway dilator muscle dysfunction. Anesthesiology. 2012, 116 (1): 35-46. 10.1097/ALN.0b013e31823d010a.
Crawford MW, Arrica M, Macgowan CK, Yoo S: Extent and localization of changes in upper airway caliber with varying concentrations of sevoflurane in children. Anesthesiology. 2006, 105 (6): 1147-1152. 10.1097/00000542-200612000-00014.
Litman RS, McDonough JM, Marcus CL, Schwartz AR, Ward DS: Upper airway collapsibility in anesthetized children. Anesth Analg. 2006, 102 (3): 750-754. 10.1213/01.ane.0000197695.24281.df.
Mahmoud M, Gunter J, Donnelly LF, Wang Y, Nick TG, Sadhasivam S: A comparison of dexmedetomidine with propofol for magnetic resonance imaging sleep studies in children. Anesth Analg. 2009, 109: 745-753. 10.1213/ane.0b013e3181adc506.
Chan DK, Truong MT, Koltai PJ: Supraglottoplasty for occult laryngomalacia to improve obstructive sleep apnea syndrome. Arch Otolaryngol Head Neck Surg. 2012, 138 (1): 50-54. 10.1001/archoto.2011.233.
Lin AC, Koltai PJ: Persistent pediatric obstructive sleep apnea and lingual tonsillectomy. Otolaryngol Head Neck Surg. 2009, 141 (1): 81-85. 10.1016/j.otohns.2009.03.011.
Fung E, Cave D, Witmans M, Gan K, El-Hakim H: Postoperative respiratory complications and recovery in obese children following adenotonsillectomy for sleep-disordered breathing: A case–control study. Otolaryngol Head Neck Surg. 2010, 142 (6): 898-905. 10.1016/j.otohns.2010.02.012.
Koutsourelakis I, Saffirudin F, Ravesloot M, Zakynthinos S, de Vries N: Surgery for obstructive sleep apnea: Sleep endoscopy determinants of outcome. Laryngoscope. 2012, 122 (11): 2587-2591. 10.1002/lary.23462.
Loepke A, McGowan F, Soriano SG: CON: the toxic effects of anesthetics in the developing brain: the clinical perspective. Anesth Analg. 2008, 106 (6): 1664-1669. 10.1213/ane.0b013e3181733ef8.
Sackett DL, Haynes RB: Evidence base of clinical diagnosis: the architecture of diagnostic research. BMJ. 2002, 324 (7336): 539-541.
Kezirian EJ, Hohenhorst W, de Vries N: Drug-induced sleep endoscopy: the VOTE classification. Eur Arch Otorhinolaryngol. 2011, 268 (8): 1233-1236. 10.1007/s00405-011-1633-8.
Thevasagayam M, Rodger K, Cave D, Witmans M, El-Hakim H: Prevalence of laryngomalacia in children presenting with sleep-disordered breathing. Laryngoscope. 2010, 120 (8): 1662-1666. 10.1002/lary.21025.
Fung E, Witmans M, Ghosh M, Cave D, El-Hakim H: Upper airway findings in children with Down syndrome on sleep nasopharyngoscopy: case–control study. J Otolaryngol Head Neck Surg. 2012, 41 (2): 138-144.
Lyons M, Cave D, Witmans M, El-Hakim H: Causes of pharyngeal dysfunction associated with early versus late onset sleep disordered breathing in children. In press
The authors declare that they have no competing interests.
MR: drafted the original protocol, submitted the ethics approval application, helped create the final format of the digitized videos, anonymized them, and created their random sequence. She collected the scoring sheets, wrote the original draft of the manuscript, and incorporated the revisions. VB: scored the videos, assisted in critical appraisal of design and statistical analysis, and revised the manuscript. CCJ: scored the videos, assisted in critical appraisal of design, and revised the manuscript. DWC: scored the videos, assisted in critical appraisal of design and statistical analysis, and revised the manuscript. HE: conceived the idea, drafted the original protocol, and reviewed and selected the digitized videos. He collated the data from the scoring sheets and performed the statistical analysis and interpretation of data. He revised the original draft of the manuscript and provided the final version. All authors read and approved the final manuscript.
Cite this article
Ramji, M., Biron, V.L., Jeffery, C.C. et al. Validation of pharyngeal findings on sleep nasopharyngoscopy in children with snoring/sleep disordered breathing. J of Otolaryngol - Head & Neck Surg 43, 13 (2014). https://doi.org/10.1186/1916-0216-43-13