Vocal fold oscillation pattern changes related to loudness in patients with vocal fold mass lesions

Introduction: Vocal fold mass lesions can affect vocal fold oscillation patterns and therefore voice production. It has been previously observed that perturbation values from audio signals were lower with increased loudness. However, how much the oscillation patterns change with gradual alteration of loudness is not yet fully understood.
Material and methods: Eight patients with vocal fold mass lesions were asked to perform a glide from minimum to maximum loudness on the vowel /i/, at an ƒo of 125 Hz for male or 250 Hz for female voices. During phonation the subjects were simultaneously recorded with transnasal high-speed videoendoscopy (HSV, 20,000 fps), electroglottography (EGG), and an audio recording. Based on the HSV material the Glottal Area Waveform (GAW) was segmented and GAW parameters were computed.
Results: The greatest vocal fold irregularities were observed at different values between minimum and maximum sound pressure level. There was a relevant discrepancy between the HSV and EGG derived open quotients. Furthermore, the EGG derived sample entropy and GAW values also evidenced different behavior.
Conclusions: The amount of vocal fold irregularity changes with varying loudness. Therefore, any evaluation of the voice should be performed under different loudness conditions. The discrepancy between EGG and GAW values appears to be much stronger in patients with vocal fold mass lesions than in those with normal physiological conditions.
Level of evidence: 4.


Introduction
Vocal fold mass lesions are a main cause of dysphonia [1], and many histopathological entities such as polyps, nodes, cysts or oedemas frequently require medical therapy [1]. In some cases, traditional treatment such as pharmacotherapy or voice therapy might be considered helpful. For others, however, phonomicrosurgery is often recommended [1].
Vocal fold mass lesions might induce changes to vocal fold stiffness and mass, which alter the oscillatory eigenmodes and spatiotemporal regularity [2]. Consequently, the entrainment of the oscillation patterns of the two vocal folds, which is influenced mainly by vertical vocal fold deflection [3], might be impaired, resulting in a disturbed structure of glottal air pulse generation. Furthermore, asymmetries might arise which influence the strength of the intraglottal vortices and, in turn, vocal efficiency [4]. In addition, some vocal fold mass lesions might block the closure of the membranous part of the vocal folds, resulting in persistent gaps and high glottal area waveform derived open quotients, which cause increased transglottic airflow, even during the most closed phase. On the one hand, this increases noise; on the other hand, it decreases the intensity of the voice source overtones because the airflow is interrupted less abruptly [5-7]. Although vocal fold mass lesions might frequently cause dysphonia [8], not all mass lesions are necessarily associated with voice disorders. Some entities, such as swellings on the free edge of the vocal fold (frequently categorised as nodes), might develop as a consequence of vocal overuse, but do not necessarily result in a dysphonic voice [9]. Such swellings do not necessarily influence vocal fold oscillation patterns or voice source production, and they are sometimes denoted as "functional" [9]. They have been observed in many professional singers without any impairment of vocal function [10,11]. Thus, as long as there is no suspicion that these swellings are malignant, any indication for surgery should be based on functional aspects rather than on the visual mass lesion itself.
The impairment of vocal function stemming from mass lesions is sometimes not easy to detect because the voice, apart from any evaluation of rough or breathy vocal quality, can be evaluated using a number of different dimensions of vocal capacity [12,13]. Besides vocal loading capacity, the dimensions of fundamental frequency (ƒo) range and dynamic range have been considered important and are established elements of the voice range profile [14]. Concerning the ƒo range, voice production should not be considered a homogeneous entity. At some points in the ƒo range, biomechanical properties change abruptly, leading to changes in vocal quality [15,16]. Such circumstances can contribute to the definition of vocal registers [17]. Registration events usually occur, according to different biodynamics, in critical regions. Therefore, vocal fold mass lesions frequently impair voice production to a larger extent in these critical regions than in the usual speaking voice ƒo range, i.e. the modal or chest register [18].
Because of the changes in vocal fold stiffness and mass, it can be speculated that oscillation patterns would change, not only with regard to the ƒo range, but also under different loudness conditions. In this context, it has been shown that the phonation threshold pressure is increased in patients with vocal fold mass lesions and decreases after phonomicrosurgery [19,20]. However, greater loudness could itself have an effect on vocal fold oscillation patterns. For healthy voices, increasing loudness is associated with a greater maximum flow declination rate [7], which depends on the maximum glottal area declination rate and skewing of the glottal area function [21]. It could be assumed that a longer duration of collision results in better entrainment of the oscillating systems, leading to stabilization of the voice source. However, such stabilization does not appear only in healthy voices. It has been shown by Brockmann-Bauser et al. that jitter values decreased with increasing loudness in patients with vocal fold mass lesions [22]. The influence of different loudness conditions on vocal fold oscillation patterns in patients with vocal fold mass lesions has, however, not yet been clarified.
This study aims to analyze the effect of gradual changes in vocal loudness on vocal fold oscillation patterns. Consistent with the quoted studies, it was hypothesized that (1)

Material and methods
After approval from the local ethical committee (Medical Ethics Committee of the University of Munich, 18/769), eight adult patients were included in the study. In order to achieve the greatest contrast between the two vocal folds, patients with predominantly unilateral vocal fold mass lesions were included. Only mass lesions expected to extend to the epithelium and superficial lamina propria were included. After multidimensional voice evaluation by an experienced phoniatrician, non-surgical therapy (i.e. voice therapy and/or pharmacotherapy) was not considered helpful for any of these patients, and consequently phonomicrosurgery was recommended. This criterion was chosen because, on the one hand, it indicates that the mass lesion was accompanied by dysphonia and, on the other hand, it could offer data on whether non-surgical therapy could, contrary to the expectation implied by the decision for surgery, be meaningful. Table 1 shows age, gender, pathology, Voice Handicap Index (VHI) in the German translation [24] and the Dysphonia Severity Index (DSI) [23]. Fig. 1 displays laryngoscopic images for each subject.
The subjects were asked to perform an increase of vocal loudness from softest to loudest on the vowel /i/, at an ƒo of approximately 250 Hz for the female and 125 Hz for the male voices. During phonation the subjects were simultaneously recorded with transnasal high-speed videoendoscopy (HSV), electroglottography and audio recording.
In a similar manner to previous investigations [25,26], high-speed videoendoscopy (HSV) (Fastcam SA-X2; Photron, Tokyo, Japan) was performed transnasally using a flexible endoscope (ENF GP; Fa. Olympus, Hamburg, Germany) with a frame rate of 20,000 frames per second and a spatial resolution of 386 × 320 pixels. Simultaneously with the HSV recording, the audio signal was recorded using an IMK SC 4061 microphone (DPA microphones, Alleroed, Denmark) or a Sennheiser ME 62 microphone (Sennheiser, Wedemark, Germany), and electroglottographic (EGG) signals (EG2-PCX2; Glottal Enterprises, Syracuse, NY) were captured. No anesthetic medication was applied for the transnasal endoscopic approach. The audio recording was calibrated with a sound level meter (Voltcraft, Hong Kong, China) using the Sopran software (Svante Granqvist, Karolinska, Stockholm, Sweden). The HSV videos were post-processed by means of rotation, fast Fourier transform filtering to remove the comb structure of the endoscope, and cropping, as previously described [25]. Calculations of the glottal area waveform (GAW) and phonovibrograms from the HSV films were performed as previously described [27,28].
For comparison, the signals were divided into consecutive 100 ms time windows. Mean values for the glottal area derived open quotient (OQ GAW), the electroglottographic open quotient (OQ EGG), sound pressure level (SPL), Closing Quotient (closing phase/period, CiQ), Speed Quotient (opening phase/closing phase, SQ), and fundamental frequency (ƒo) were calculated for each window using Multi Signal Analyzer (Schäfer/Schlegel, FAU Erlangen-Nürnberg, Germany), as shown in Table 2.
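The windowing step can be illustrated with a minimal sketch. It is not the Multi Signal Analyzer implementation; the function name `window_means`, the non-overlapping layout of the windows and the dropping of an incomplete trailing window are assumptions made for illustration only.

```python
from statistics import fmean


def window_means(signal, sample_rate_hz, window_s=0.1):
    """Mean of `signal` in consecutive, non-overlapping windows.

    Assumption: windows do not overlap and an incomplete trailing
    window is discarded; the paper does not state either detail.
    """
    samples_per_window = round(sample_rate_hz * window_s)
    n_windows = len(signal) // samples_per_window
    return [
        fmean(signal[k * samples_per_window:(k + 1) * samples_per_window])
        for k in range(n_windows)
    ]


# Hypothetical SPL track (dB) sampled at 20,000 Hz: three full windows
# plus 500 trailing samples that do not fill a fourth window.
spl_track = [60.0] * 2000 + [66.0] * 2000 + [72.0] * 2500
print(window_means(spl_track, sample_rate_hz=20_000))
```

Any per-sample parameter track (SPL, ƒo, open quotient) could be passed through the same function so that all measures share one time grid.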
In order to detect OQ GAW, a tolerance threshold of 5% was set, i.e. the glottis was denoted as open when the GAW signal exceeded the baseline by more than 5%. The electroglottographic open quotient was calculated according to the Howard criterion [29]. With regard to frequency perturbation, jitter for all three voice signals (GAW, EGG, and audio) and the harmonics-to-noise ratio (HNR) from the audio signal were measured.
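As a rough illustration of the 5% tolerance threshold and of a frequency perturbation measure, consider the following sketch. Several points are assumptions not stated in the text: the baseline is taken as the minimum of the GAW, the 5% is taken relative to the peak-to-peak amplitude, the open quotient is approximated as the fraction of open samples over a stretch covering whole cycles, and jitter is computed as the common "local" jitter (mean absolute difference of consecutive periods relative to the mean period). The function names are hypothetical.

```python
from statistics import fmean


def open_quotient(gaw, tolerance=0.05):
    """Fraction of samples in which the glottis counts as open.

    Assumption: baseline = minimum of the GAW, threshold = baseline
    plus `tolerance` times the peak-to-peak amplitude.
    """
    baseline = min(gaw)
    amplitude = max(gaw) - baseline
    if amplitude == 0:
        return float("nan")  # no oscillation, open quotient undefined
    threshold = baseline + tolerance * amplitude
    open_samples = sum(1 for area in gaw if area > threshold)
    return open_samples / len(gaw)


def local_jitter_percent(periods):
    """Local jitter in percent from a list of cycle durations."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * fmean(diffs) / fmean(periods)


# Hypothetical GAW: 10 cycles of 100 frames, closed for 40 and open for 60.
gaw = ([0.0] * 40 + [1.0] * 60) * 10
print(open_quotient(gaw))

# Hypothetical cycle durations in ms, alternating between 4 and 5 ms.
print(local_jitter_percent([4.0, 5.0, 4.0, 5.0]))
```

A per-cycle version would first segment the signal into glottal cycles; the sample-fraction form is used here only because it keeps the threshold logic visible.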
In order to compare values at a lower and a higher SPL, the same SPL difference was used for all subjects, determined in the following way: the smallest SPL increase during the experiment was found in subject 2, with an increase of 6 dB. Therefore, for all subjects, the 100 ms window with the greatest SPL (SPL max) and the 100 ms window with an SPL approximately 6 dB below it (SPL max-6) were compared.
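The selection of the two comparison windows might be sketched as follows. The text only says "approximately 6 dB", so picking the window whose SPL is closest to the target level, and searching all windows rather than only those before the maximum, are assumptions; `select_comparison_windows` is a hypothetical helper name.

```python
def select_comparison_windows(spl_db, offset_db=6.0):
    """Return (index of SPL max window, index of SPL max-6 window).

    Assumption: the reference window is the one whose mean SPL is
    closest to the maximum SPL minus `offset_db`.
    """
    indices = range(len(spl_db))
    i_max = max(indices, key=lambda i: spl_db[i])
    target = spl_db[i_max] - offset_db
    i_ref = min(indices, key=lambda i: abs(spl_db[i] - target))
    return i_max, i_ref


# Hypothetical per-window mean SPL values (dB) of one loudness glide.
spl_per_window = [60.0, 62.0, 65.0, 67.0, 71.0, 70.0]
i_max, i_ref = select_comparison_windows(spl_per_window)
print(i_max, spl_per_window[i_max], i_ref, spl_per_window[i_ref])
```

Windows that are excluded for other reasons would need to be removed from the list before the selection.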
In many subjects, the greatest aperiodicity of vocal fold oscillation was found in a window in between the minimum (SPL min) and maximum SPL (SPL max). Therefore, the electroglottographic (EGG) sample entropy [30,31] was used to detect the greatest changes in the EGG signals. In this respect, the window exhibiting the greatest sample entropy was denoted window 0. The 100 ms windows −2, −1, 0, +1, +2 relative to window 0 were analysed.
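For readers unfamiliar with sample entropy, the following sketch shows the generic definition, SampEn = −ln(A/B), where B and A count pairs of matching templates of length m and m + 1. It is not the specific EGG-based procedure of [30,31]; the embedding dimension m = 2, the tolerance of 0.2 standard deviations, the "less than or equal" matching rule and both function names are assumptions chosen for illustration.

```python
import math
from statistics import pstdev


def sample_entropy(x, m=2, r_factor=0.2):
    """Generic sample entropy of a sequence `x`.

    Assumption: tolerance r = r_factor * standard deviation of x,
    Chebyshev distance, self-matches excluded.
    """
    n = len(x)
    r = r_factor * pstdev(x)

    def count_matches(length):
        # Both template lengths use the same n - m starting points.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matches found, entropy not finite
    return -math.log(a / b)


def windows_around_peak(entropy_per_window, half_width=2):
    """Index of window 0 (greatest entropy) and its neighbours -2..+2."""
    i0 = max(range(len(entropy_per_window)), key=lambda i: entropy_per_window[i])
    first = max(0, i0 - half_width)
    last = min(len(entropy_per_window), i0 + half_width + 1)
    return i0, list(range(first, last))


# Hypothetical per-window sample entropy values.
print(windows_around_peak([0.1, 0.2, 0.9, 0.3, 0.1, 0.05]))
```

A regular signal such as a pure sine yields a value near zero, whereas an irregular signal yields a clearly larger value, which is why the peak of this measure is used to locate the least stable window.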
The Pearson correlation test was used, but due to the small sample size comparative statistics were not considered meaningful.

Results
All subjects were able to perform the task with the different loudness conditions. However, the increase in SPL differed among the subjects. The difference between SPL min and SPL max varied from 6 dB (subject 2) to 22 dB (subject 8). Figure 2 shows the traces of SPL, ƒo, OQ GAW, OQ EGG and the sample entropy for all subjects over the duration of the recording. In subject 8, OQ GAW dropped to zero in 100 ms window 6, which was caused by a near total ventricular fold adduction. This window was excluded from the subsequent comparison of SPL max and SPL max-6 and from the analysis of the windows around the greatest sample entropy. For the 100 ms window exhibiting SPL max, the GAW related measures (OQ GAW, SQ, CiQ) showed no large difference from SPL max-6 (Fig. 3); in contrast, OQ EGG was greater for SPL max. Jitter GAW showed greater values for SPL max, whereas Jitter Audio and Jitter EGG showed no large difference from SPL max-6. The HNR was higher for SPL max than for SPL max-6. Figure 4 shows phonovibrograms for a 25 ms time interval at the mid-point of the 100 ms windows for SPL max and SPL max-6, respectively.
The expected ƒo, i.e. 125 Hz for male and 250 Hz for female voices, was not achieved by many of the subjects. Some subjects showed greater deviations from the required ƒo (Fig. 2): subjects 4, 6 and 8 increased ƒo during the experiment, whereas subject 7 decreased ƒo. During the experiment, the greatest vocal instability was found between SPL min and SPL max for all but one subject. In the windows where the greatest sample entropy occurred, irregularities of the EGG signal and an increase in OQ EGG were also found (Fig. 5). However, in the same windows, there were no large changes in the GAW; in addition, neither OQ GAW nor the Closing Quotient showed large changes in window 0, in which the greatest EGG based sample entropy occurred.

Discussion
This study analyzed the effect of gradual loudness changes on vocal fold oscillation patterns. For most subjects, the greatest irregularity was not found at the lowest SPL, but in between the minimum and maximum SPL. Consequently, the data presented here were not able to support the general assumption that the voice is stabilized with increasing SPL. Finally, there were strong differences between GAW derived and EGG derived measures.
Vocal performance depends heavily on both frequency and dynamic range [1,14]. These vocal dimensions are not only important for non-dysphonic voices but also for subjects with vocal impairments arising from vocal fold mass lesions. It has previously been shown that ƒo might affect vocal performance in professional singers with vocal fold mass lesions [18]. In contrast to the previous study, no professional singers were examined in the present study, and this could be considered the main reason why the required ƒo was frequently not achieved. However, the increase in loudness was found to be accompanied by an increase in SPL for all of the subjects. It should be noted, however, that the subjects failed to reach the same dynamic range as they did during the clinical testing of the voice range profile. There are many potential reasons for this. One is that the experiment was limited to a recording time of 9 s, producing 32 GB of HSV data, whereas during the voice range profile it was possible to make many repetitions. Another reason is that the transnasal laryngoscope might have influenced voice production through increased tension. The present study hypothesized that the regularity of vocal fold oscillations would increase with increasing loudness. In this respect, Brockmann-Bauser et al. [22] observed lower perturbation values derived from audio signals for higher SPL in patients with vocal fold mass lesions as well as in subjects without dysphonia. The data presented here, however, failed to support these findings: jitter Audio and jitter EGG were almost unchanged between SPL max and SPL max-6. Furthermore, for SPL max, jitter GAW was increased. There are many possible influencing factors which could contribute to the differences between the findings presented here and the observations made by Brockmann-Bauser et al. [22].
One is that, as noted previously, the dynamic range was lower during the experiment than in the clinical voice evaluation. Furthermore, the data presented refer to a dynamic range of 6 dB, which was the lowest observed difference between the minimum and maximum SPL (subject 2). On the one hand, this provides comparability among the subjects. On the other hand, the difference of 6 dB could be considered too small to reveal greater differences in patients who exhibited a larger dynamic range. Finally, Brockmann-Bauser et al. [22] analyzed audio signals in female voices only. In the present study, a greater number of additional signals were analyzed simultaneously, which prevented the inclusion of a larger number of subjects. Lastly, in the present study two subjects (subjects 6 and 8) had a greater rise of ƒo during the experiment. Using sinusoidal tones, it has been shown before that a rise of ƒo could be associated with changes in jitter measurements [32]. At least for subject 6, this could in part explain greater jitter values for greater SPL. However, for subject 8 this tendency was present only for jitter Audio but not for jitter EGG and jitter GAW. The greatest irregularities were found in between minimum and maximum SPL. With regard to changes in ƒo, previous investigations [18] observed regions, i.e. the passaggio regions, where subjects with vocal fold mass lesions showed greater irregularity of vocal fold oscillations. In the present study, however, there were no clear criteria or regions where irregularity appeared more likely for changes in loudness and the physical value SPL.
HSV derived vocal fold oscillation patterns did not differ greatly between SPL max and SPL max-6 with respect to OQ GAW, SQ and CiQ. Furthermore, as is seen in the phonovibrograms, there was no lateralization effect, i.e. the pathologic vocal fold did not behave differently from the healthy one. It is interesting that, in contrast, OQ EGG showed greater values for SPL max. It should be noted that OQ GAW and OQ EGG are not equivalent. OQ GAW is derived from a superior laryngoscopic two-dimensional view, whereas OQ EGG represents the changes in impedance due to the three-dimensional vocal fold contact. It has been shown that, in physiologic voices, the concordance of EGG and GAW signals is greater for the 'decontacting' than for the 'contacting' phase [32]. Furthermore, for OQ GAW lower than 0.7, the agreement of OQ GAW and OQ EGG is high, but for values above 0.7 this agreement is rather low [26]. The data presented here show that, for patients with vocal fold mass lesions, the disagreement between the two OQs is much stronger. It could therefore be speculated that impedance changes show an earlier contact of the vocal folds due to the contact of the mass lesion, although the laryngoscopic closure still reveals open parts alongside the mass lesion.
Consequently, OQ EGG has to be interpreted with caution in patients with vocal fold mass lesions. Furthermore, the EGG based sample entropy was used as a criterion to describe the greatest instability in the vocal fold oscillation patterns. This measure was first introduced by Selamtzis and Ternström for the analysis of physiologic voices [30]. It has been shown in non-pathologic voices that registration events can be detected using this measure [31,33]. However, the data presented showed that the GAW derived irregularities behave differently from the EGG derived data in the time domain. Therefore, doubts are justified as to whether the EGG based sample entropy can be used for voice evaluation in patients with vocal fold mass lesions.
There are a number of key limitations of this study. The first limitation stems from the variety of different mass lesion entities which are present. In this study, patients with polyps, cysts, nodes and edema were included. Since the histopathology of the Reinke space differs between these entities, the effect on stiffness and vocal fold closure could be expected to vary. However, it should be noted that for most subjects the greatest sample entropy was not found at the limits of the dynamic range. Also in this respect, only patients with an indication for phonomicrosurgery were included. It remains unclear whether results would be comparable in patients with vocal fold mass lesions that have a lesser impact on vocal function and, therefore, no indication for surgery. Also in this context, the study included only patients with predominantly unilateral vocal fold mass lesions. It cannot be excluded that bilateral mass lesions would exhibit different results. As previously noted, the patients were not vocally trained and, therefore, they were not able to achieve the ƒo required in each case. Rising ƒo is frequently associated with greater SPL [7,17]. Therefore, for subjects exhibiting greater ƒo changes throughout the experiment, part of the differences observed could be related not only to SPL but also to differences in ƒo. Different loudness conditions frequently show different vocal tract shapes [34]; as such, vocal tract/voice source interactions [35-37] could have influenced the observed vocal fold irregularities in different ways. Also in this respect, SPL max and SPL max-6 were used to compare differences for the various measures. The reason for not using the minimal SPL was that the minimum SPL was frequently found at voice onset, which could have a greater impact on the GAW related measures. Furthermore, the signal-to-noise ratio is lower for lower SPL.
However, it cannot be ignored that softer loudness might exhibit a different sensitivity to the measures used.
A further important limitation is that the increase in loudness was not standardized, i.e. the increase in loudness did not have to be performed over a specific time interval. It could be assumed that coordination and stabilization of the voice might be easier over a longer duration and would therefore exhibit smaller irregularity. How much the different durations in such experiments influence any irregularity should be analyzed in future investigations. Furthermore, due to the extended recording and analysis setup, only eight subjects could be included in this study, which prevented any statistical analysis. It is hoped that greater numbers of subjects can be included in future investigations in order to statistically verify any observed tendencies.

Conclusions
The amount of vocal fold irregularity changes with varying loudness. Therefore, an evaluation of voice under different loudness conditions should be recommended in patients with vocal fold mass lesions. With respect to perturbation values, this study failed to verify lower jitter values for greater SPL. The measures from electroglottographic signals and the glottal area waveform, and therefore the OQ, differed to a larger extent in patients with vocal fold mass lesions compared to physiologic voices.