Vol 1 (2015)

Normal and Abnormal Vocal Folds Kinematics

High-Speed Digital Phonoscopy (HSDP), Optical Coherence Tomography (OCT) & Narrow Band Imaging (NBI®)

Volume I: Technology

Krzysztof Izdebski, Yuling Yan, Ronald R. Ward, Brian J.F. Wong & Raul M. Cruz

PUBLISHED: 2015-05-11


Krzysztof Izdebski & Amy M. Matson

Harry Hollien & W.S. Brown, Jr.

Hans von Leden

Abstract: In this chapter, Hans von Leden discusses with Amy M. Matson the path that led him and Paul Moore to use the ultra-high-speed camera to study laryngeal actions.

Amy M. Matson & Krzysztof Izdebski

Abstract: This is an interview we conducted with Hans von Leden on June 28, 2013. The purpose was to highlight the ideas behind initiating this research more than 50 years ago.

Krzysztof Izdebski

Abstract: The evolution of high-speed recording technology for the vocal folds is discussed here. The nomenclature used to describe this process is also reviewed, and the term High-Speed Digital Phonoscopy (HSDP) is proposed for referring to visualization of the glottis with high-speed technology. The two other terms commonly used for this process are High-Speed Digital Imaging (HSDI) and High-Speed Video (HSV).

Keywords: HSDP, HSDI, HSV, vocal folds, UHSC, VKG, SVKG, VSK

Harry Hollien

Abstract: This chapter is designed to provide certain baseline information and perspectives for how physiological structure of the vocal folds (VF) was addressed in the literature. This information will take the form of 1) a brief history of some of the concepts and relationships which have led to modern HSDI research, 2) a short review of phonatory theory, and 3) baseline data for a model of VF activity and laryngeal functions. In turn, the model can be used to assist the HSDI investigators in interpreting their observations and findings. The review will start with data/concepts from the 20th century. It will do so because much of the baseline research to be discussed was carried out during that period.

Keywords: VF, neurochronaxic, myoelastic and aerodynamic theories, lateral soft-tissue x-rays, VF length, VF thickness, laminagraphy, Stroboscopic Laminagraphy (STROL), F0, Ps


Krzysztof Izdebski

Abstract: This chapter describes the KayPENTAX Model 9710 system used by this author in the clinical setting and for research purposes. All HSDP data presented in this publication by our group (PVSF) were obtained using this system.

Keywords: CHSV 9710 KayPENTAX, analysis, problems, advantages

Mette Pedersen, Martin Eeg, Anders Jønsson & Sanila Mamood

Abstract: We report on our experience of using the Wolf Ltd. HRES Endocam 5562 analytic system for high-speed recordings of the vocal folds (VF) in our clinical setting. The system uses high-speed videophotography and is able to capture 4000 or more images per second. It is therefore superior to clinical laryngo-stroboscopy in many areas of voice diagnostics. The high temporal resolution of digital high-speed videophotography makes it possible to observe variability in individual VF kinetics and to observe the workings of the VF in pathological phonation. Post-recording analyses are performed using built-in software systems combined with kymography and electroglottography. Future development includes calculations of the lateral features and cycle-based features as well as phonovibrograms of the right and left vocal cords, from cycle to cycle, reproduced online for further software calculations.

Keywords: High Speed Digital Imaging (HSDI), High speed films, vocal folds, quantitative voice measures, EGG, kymograms, phonovibrograms

Hans Larsson & Stellan Hertegård

Abstract: We describe here a less costly HSDI/HSDP system that can be constructed using commercially available components.

Keywords: HSC, HSDC, HSDI, HSDP, stroboscopy, videostroboscopy, kymography, low cost system

Krzysztof Izdebski

Abstract: The advantages of HSDP over more conventional technology (stroboscopy) were already shown by Moore and von Leden in the mid-1950s. Since this publication elaborates on HSDI advantages in great detail, we only highlight here what HSDP adds to phonatory function studies (PhFS), specifically in the context of other visual exams of the glottis, such as laryngovideostroboscopy (LVS) or the newly introduced high-definition LVS (HDLVS). Additional chapters in this publication address the advantages of HSDP in more detail.

Keywords: HSDI, HSDP, LVS, HDLVS, VKG, analysis

Harm K. Schutte & Frits F.M. de Mul

Abstract: In this chapter we summarize current research on videokymography (VKG) in addition to our investigations conducted at the Groningen Voice Research Laboratory in the Netherlands between 2003 and 2008. This work led to the development of a new-generation videokymographic camera that allows observation of the glottis while simultaneously obtaining a videokymographic image. This new system is now fully suitable for use in voice clinics. In addition to the technical aspects, we also conducted research focused on the analysis of the VKG images. The goal was to provide statistical data pertaining to the vibration pattern of the vocal folds (VF). For the analysis of the images two different approaches have been applied: 1) evaluation on two sinusoids and 2) evaluation on four sinusoids. A patented use of the videokymograph was developed to choose from which VF (left or right) a stroboscopic trigger is taken. The line imaging principle we applied also led to the development of depth-kymography (DK), in which the vertical movements of the complex vibration pattern of the VF are measured and displayed. In a simulation software program that we developed, the data obtained from both types of kymography are brought together, leading to a new type of in situ registration of the vibratory pattern of the VF.

Keywords: kymography, vocal folds, voice, technology, depth-kymography

Daryush D. Mehta, Dimitar D. Deliyski, Steven M. Zeitels, Matías Zañartu & Robert E. Hillman

Abstract: This chapter reports on the development of a system that integrates the capture of laryngeal high-speed videoendoscopy (HSV) through a transnasal fiberoptic endoscope with the synchronous acquisition of multi-sensor recordings of vocal function. Laryngeal HSV is achieved by the transnasal placement of a flexible fiberoptic endoscope with its eyepiece coupled to a monochromatic high-speed video camera and distal end passed through a specially-modified pneumotachograph mask. The setup includes the simultaneous acquisition of signals from a microphone, electroglottograph, accelerometer, and transducers for intraoral air pressure and airflow. Example data illustrate the ability of the transnasal HSV system to synchronously record measures of multiple vocal function parameters from speakers with and without voice disorders. Key features of the digital high-speed camera include an output signal that provides accurate time synchronization and enhanced light sensitivity to capture monochromatic video at rates over 4000 images per second. Transnasal HSV imaging can be combined with other measures of vocal function to significantly expand the potential for comprehensive investigations into phonatory mechanisms during more natural speech production tasks, particularly with respect to the role and impact of aerodynamic forces.

Keywords: HSV, transnasal approach, fiberoptics, pneumotachograph mask, aerodynamics, vocal folds

Beata Miaśkiewicz

Abstract: Advantages and shortcomings of a laryngovideo-stroboscopy system by XION GmbH, Germany that incorporates kymographic and electroglottographic analysis and simultaneous acoustic analysis are presented. Our evaluation of this system is based on our continuous use of the EndoSTROB DX system by XION GmbH for over four years.

Keywords: LVS, kymography, EGG, voice exam, XION EndoSTROB DX system


Krzysztof Izdebski & Yuling Yan

Abstract: Here we present a synopsis of systems used to analyze and/or to process massive HSDP recordings. This synopsis explains why HSDP and HSDP-related analytical programs make HSDP such a powerful tool in vocalization research and in the clinic. The reader is encouraged to read the specific chapters in this publication explaining these methods in depth.

Keywords: HSDP analysis, kymography, spatio-temporal resolution, glottal area waveform, phonovibrography, wavegrams, adaptive thresholding approach, region of interest, laryngotopography, FFT and point FFT analysis, snake fitting approach

Yuling Yan & Krzysztof Izdebski

Abstract: We describe below the development and application of new approaches for quantitative spatio-temporal analyses of vocal fold (VF) vibrations derived from high-speed digital images/phonoscopy (HSDP) of the glottis. Specifically, we describe the analysis of HSDI–derived glottal area waveform (GAW) using the analytic signal method and the Nyquist plot [1-5]. We also discuss the time-frequency analysis of the waveforms including both the GAW and glottal width function (GWF) or bilateral VF displacements. Together these define new quantitative parameters to provide a detailed characterization of the frequency and the symmetry/homogeneity of the vibration of the VF. Lastly, we introduce a new approach to characterize VF dynamics in normal voice production and in specific voice disorders involving global analysis of HSDI and simultaneously acquired acoustic data. This approach exploits unique clinical information that strengthens diagnostics. Hence, we term this analysis High-Speed Digital Phonoscopy (HSDP). These findings are crucial for a detailed spatio-temporal analysis of glottal source dynamics that are based on visual information and also of the acoustic signals that are generated through the interactions of the sound wave within the vocal tract. The information derived from this combined analysis can be used to compare and correlate quantitative measures obtained from acoustic and image-based analysis for assessment of voice conditions across the lifespan or in analysis of various voice pathologies.

Keywords: HSDP, GAW, Nyquist plots, LVS, HSV, HSLI, HGG, VK, DAS, DIH
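
The analytic signal method named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the mean removal, the naive DFT, and the function names `centre`, `analytic_signal`, and `instantaneous_frequency` are assumptions made here for illustration. The analytic signal is built by suppressing the negative-frequency half of the GAW spectrum; plotting its imaginary part against its real part gives a Nyquist-style plot, and the phase increment between frames gives an instantaneous frequency.

```python
import cmath
import math


def centre(gaw):
    """Remove the mean so the trajectory circles the origin (an assumption)."""
    mean = sum(gaw) / len(gaw)
    return [v - mean for v in gaw]


def analytic_signal(x):
    """Analytic signal via a naive O(n^2) DFT: keep DC (and Nyquist for even n),
    double the positive frequencies, and zero the negative ones."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1
        elif k < (n + 1) // 2:
            gain = 2
        else:
            gain = 0
        spectrum[k] *= gain
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_frequency(z, frame_rate):
    """Frequency in Hz from the phase step between consecutive samples."""
    return [
        cmath.phase(b * a.conjugate()) * frame_rate / (2 * math.pi)
        for a, b in zip(z, z[1:])
    ]
```

For a perfectly periodic GAW the points `(z.real, z.imag)` trace a closed circle of constant radius; cycle-to-cycle irregularity shows up as scatter around that circle.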

Akihito Yamauchi, Hiroshi Imagawa, Ken-Ichi Sakakibara, Hisayuki Yokonishi, Takaharu Nito, Tatsuya Yamasoba & Niro Tayama

Abstract: Laryngotopography is a two-dimensional analysis technique for HSDI/HSDP that makes the spatial characteristics of vocal fold (VF) vibrations intuitive to grasp by applying a pixel-wise discrete Fourier transform to the brightness curve. The analysis involves four steps: 1) selection of a rectangular area for analysis, 2) extraction of a time-varying raw brightness curve for each pixel and normalization by subtracting the average brightness level of the consecutive 512 frames, 3) application of the Hamming window, and 4) application of the discrete Fourier transform. This method provides valuable information on: 1) topographic fundamental frequency (F0top), 2) longitudinal phase difference (phase difference in the anterior-posterior direction), and 3) lateral phase differences (phase difference in the left-right direction).

Keywords: VF, HSDI/HSDP, Laryngotopography, rectangular area for analysis, extraction of a time-varying raw brightness curve, application of the Hamming window, application of discrete Fourier transform, topographic fundamental frequency (F0top), longitudinal phase difference, lateral phase difference
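
The four steps listed in the abstract above can be sketched as follows. This is an illustrative reading only, not the authors' code: the function names, the frame layout (`frames[t][row][col]`), and the choice of the strongest non-DC spectral bin as the per-pixel frequency are assumptions. The DFT is written naively for clarity; a practical implementation would use an FFT.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n with the usual 0.54 / 0.46 coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def pixel_peak(brightness, frame_rate):
    """Steps 2-4 for one pixel: subtract the mean, window, DFT, pick the peak.

    Returns (peak frequency in Hz, phase in radians at that frequency)."""
    n = len(brightness)
    mean = sum(brightness) / n
    x = [(b - mean) * w for b, w in zip(brightness, hamming(n))]
    best = (0, -1.0, 0.0)  # (bin index, magnitude, phase)
    for k in range(1, n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best[1]:
            best = (k, abs(coeff), cmath.phase(coeff))
    return best[0] * frame_rate / n, best[2]


def topography(frames, top, bottom, left, right, frame_rate):
    """Step 1 plus the per-pixel analysis over a rectangular area (inclusive)."""
    result = {}
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            curve = [frame[r][c] for frame in frames]
            result[(r, c)] = pixel_peak(curve, frame_rate)
    return result
```

Comparing the phase values of pixels along the anterior-posterior axis, or across the left-right axis, gives the longitudinal and lateral phase differences the abstract refers to.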

Matthew Blanco, Xin Chen & Yuling Yan

Abstract: An adaptive threshold approach is proposed for the segmentation of glottal images acquired from high-speed phonoscopy (HSP). In this approach, difference image sequences are used to identify a region of interest (ROI) that encloses the maximal vocal-fold motion. Then, sub-image sequences are defined by applying the identified ROI to each original image frame. Finally, threshold segmentation is performed on the sub-image sequences, with a variable threshold value adapted for each sub-image frame. The proposed approach has been shown to be effective for segmenting large numbers of image frames from clinical HSP recordings representing both normal and pathological voice conditions.

Keywords: Segmentation, Glottis, Vocal fold motion, Difference image, Adaptive threshold
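
A minimal sketch of the three stages described in the abstract above (difference images, ROI, per-frame threshold) is given below. The abstract does not state how the threshold is adapted, so the midpoint rule in `segment_frame` is a hypothetical stand-in, and the names `motion_roi`, `segment_frame`, `glottal_areas`, `motion_fraction`, and `margin` are likewise assumptions. Frames are plain nested lists of grey levels.

```python
def motion_roi(frames, motion_fraction=0.5, margin=1):
    """Bounding box (top, bottom, left, right) around pixels whose accumulated
    frame-to-frame absolute difference reaches motion_fraction of the maximum."""
    rows, cols = len(frames[0]), len(frames[0][0])
    motion = [[0] * cols for _ in range(rows)]
    for prev, cur in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                motion[r][c] += abs(cur[r][c] - prev[r][c])
    peak = max(max(row) for row in motion)
    if peak == 0:  # no motion at all: fall back to the full frame
        return 0, rows - 1, 0, cols - 1
    active = [(r, c) for r in range(rows) for c in range(cols)
              if motion[r][c] >= motion_fraction * peak]
    top = max(min(r for r, _ in active) - margin, 0)
    bottom = min(max(r for r, _ in active) + margin, rows - 1)
    left = max(min(c for _, c in active) - margin, 0)
    right = min(max(c for _, c in active) + margin, cols - 1)
    return top, bottom, left, right


def segment_frame(frame, roi):
    """Binary mask of the sub-image: 1 where a pixel is darker than a threshold
    chosen per frame (hypothetical rule: midpoint of the sub-image's range)."""
    top, bottom, left, right = roi
    sub = [row[left:right + 1] for row in frame[top:bottom + 1]]
    pixels = [p for row in sub for p in row]
    threshold = (min(pixels) + max(pixels)) / 2
    return [[1 if p < threshold else 0 for p in row] for row in sub]


def glottal_areas(frames):
    """Segmented glottal area, in pixels, for every frame."""
    roi = motion_roi(frames)
    return [sum(sum(row) for row in segment_frame(f, roi)) for f in frames]
```

The margin keeps some surrounding tissue inside the ROI so that a fully open glottis still has a brighter reference for the per-frame threshold.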

Yuling Yan, Tao Jiang & Shouhua Luo

Abstract: High-speed digital videoendoscopy provides a direct means to capture the actual vocal fold vibrations and is emerging as a new clinical tool for voice assessment. The system can acquire images of the vibrating vocal folds with simultaneous recording of voice data from the patient. The laryngeal image-based analysis has been proven valuable for objective and quantitative assessment of voice kinematics in health and disease. Meanwhile, acoustic analysis of voice data could assist in the study of phonatory characteristics and reveal useful information related to laryngeal pathophysiology. In contrast to the hardware acquisition systems, the development of effective software for handling such massive visual/sound data has lagged behind. In this chapter, a software system is designed to process the laryngeal image sequences and perform image-based analyses as well as acoustic analyses. Our software (Vocalizer®) contains the following modules: (1) Import and View Module – to read AVI video data and sound data (wave file), edit/compile and save selected data, make image montages using DirectShow technology, and display the acoustic waveform using DirectSound technology; (2) Image Process Module – to perform frame-by-frame image segmentation to delineate the glottis and extract the GAW and bilateral vocal fold displacements; (3) Image Analysis Module – to adopt Nyquist plot displays that involve the Hilbert transform based analysis of GAW and provide instantaneous frequency and amplitude distributions; (4) Acoustic Analysis Module – to perform Fast Fourier Transform (FFT) and Spectrogram analyses of the imported sound data, display the plot of the sound data, and provide instantaneous frequency and amplitude distributions and Nyquist plots; and (5) Dual GAW and Sound Wave Display Module.
Upon rigorous testing of this software using clinical data samples, we demonstrate the applications of the software to the study of dynamic characteristics of the glottis, which may correlate with voice quality and health condition.

Keywords: High-speed video-endoscopy, high-speed digital imaging, vocal fold vibration, glottal area waveform, acoustic analysis, Nyquist plot, FFT, spectrogram, software, DirectShow, DirectSound

Yuling Yan & Gan Du

Abstract: High-speed digital imaging (HSDI) of the glottis provides important information on key dynamic events associated with the production of human voice. However, the large amount of data accumulated presents significant challenges for image processing that allows for subsequent image-based quantitative analysis of vocal fold (VF) motion. In this chapter, we present an active contour (snake) based approach for automatic delineation of the glottis. This is an essential step toward extraction of the VF displacements and glottal area waveform (GAW). The approach involves three sequential processing steps: 1) a coarse segmentation by global thresholding, followed by detection of an ellipse that approximates the glottis geometry; 2) an estimation of the parameters of the ellipse using principal component analysis (PCA); and 3) application of the snake-based method, where the detected ellipse is used as the initial snake contour, while the iteration time is determined in an adaptive manner based upon the ellipse fitting error. The algorithm is highly efficient and facilitates frame-by-frame processing of the massive HSDI data. The value of this approach is demonstrated by using several representative clinical samples of the HSDI recordings obtained from subjects having both normal and pathological voice conditions. Finally, comparative analyses are performed using the proposed method and three existing methods with regard to computational efficiency and segmentation accuracy.

Keywords: Active contour, snake, vocal fold motion, glottis, principal component analysis
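
The first two steps of the approach in the abstract above (global thresholding, then PCA to estimate an ellipse) can be sketched as below; the snake refinement itself is not shown. This is an illustrative reconstruction, not the authors' algorithm: the function names are hypothetical, and the factor 2 in the axis lengths assumes the dark region is a uniformly filled ellipse, for which the variance along a semi-axis of length a is a²/4.

```python
import math


def dark_pixels(image, threshold):
    """Step 1 (coarse segmentation): (x, y) coordinates of pixels below threshold."""
    return [(x, y) for y, row in enumerate(image)
            for x, p in enumerate(row) if p < threshold]


def ellipse_from_points(points):
    """Step 2: centre, semi-major axis, semi-minor axis, and orientation (radians)
    from the eigen-decomposition of the 2x2 coordinate covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    half_trace = (cxx + cyy) / 2
    spread = math.hypot((cxx - cyy) / 2, cxy)
    lam_major = half_trace + spread           # larger eigenvalue
    lam_minor = max(half_trace - spread, 0.0)  # smaller eigenvalue
    angle = 0.5 * math.atan2(2 * cxy, cxx - cyy)  # direction of the major axis
    return (mx, my), 2 * math.sqrt(lam_major), 2 * math.sqrt(lam_minor), angle
```

The resulting ellipse would then serve as the initial contour for an active-contour iteration, as the abstract describes.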

Jan G. Švec

Abstract: The chapter reviews, from a historical perspective, the ideas and approaches for displaying and studying vocal fold vibrations in vivo. Starting with the invention of stroboscopy and laryngoscopy in the 19th century, the search for understanding of vocal fold vibration led to the application of high-speed cinematography in the 1930s and to manual frame-by-frame tracing of the edges of the vibrating vocal folds in the 1950s. Electroglottography and photoglottography were developed in the 1950s to visualize the vocal fold vibrations more simply. Photokymography was introduced in the 1970s, and high-speed digital imaging replaced high-speed cinematography in the 1980s. Videokymography, introduced in the 1990s, became the first high-speed imaging technique routinely applicable in clinical practice. Inspired by videokymography, kymographic software was then created for extracting kymograms from high-speed digital image recordings and videostroboscopic recordings. The boom in digital image processing methods currently allows vocal fold vibratory patterns to be analyzed in a highly sophisticated way.

Keywords: laryngostroboscopy, high-speed cinematography, electroglottography (EGG), photoglottography (PGG), photokymography, videokymography (VKG), high-speed digital imaging (HSDI), kymography

Marcin Just, Michał H. Tyc & Ewa Niebudek-Bogusz

Abstract: We present here our approach to processing vocal fold (VF) images obtained with high-speed digital phonoscopy (HSDP) and laryngovideostroboscopy (LVS). Our system automatically generates kymographic sections of VF activity from selected video sequences and then constructs phonovibrograms.

Keywords: LVS, HSDP, phonovibrograms, cropping, corrections, software for processing images
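
As a rough illustration of what a kymographic section is, the sketch below stacks the same scan line from every frame of a sequence, giving an image whose vertical axis is time. It is not the system described in the abstract above; the function names, the frame layout (`frames[t][row][col]`), and the fixed darkness threshold are assumptions.

```python
def kymogram(frames, row):
    """Kymographic section: kymo[t][x] is pixel x of scan line `row` in frame t."""
    return [list(frame[row]) for frame in frames]


def glottal_width(kymo, threshold):
    """Per-frame count of pixels on the scan line darker than `threshold`,
    a crude stand-in for the glottal width along that line."""
    return [sum(1 for p in line if p < threshold) for line in kymo]
```

Repeating the extraction for scan lines along the whole glottal axis yields the per-position, per-time data from which a phonovibrogram-style display could be assembled.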


Krzysztof Izdebski & Brian J.F. Wong

Abstract: Optical Coherence Tomography (OCT) is an optical technology that uses light to generate cross-sectional images of turbid media such as living tissue. In medicine and biology, OCT can provide images to a depth of about 1 mm, and does not expose patients to the risks of ionizing radiation. There are several key organs where detailed microanatomic information of surface structure is extremely important; these include the retina, the coronary vasculature, and the delicate mucosa of the upper aerodigestive tract, in particular the vocal folds.

Keywords: OCT, mucosa, vocal folds, malignant changes


Krzysztof Izdebski & Brian J.F. Wong

Abstract: Narrow band imaging (NBI®) refers to an optical imaging technique used in endoscopy, in which a special filter is electronically activated to split the white light into blue (415 nm) and green (540 nm) wavelengths to enhance the details of mucosa that contains hemoglobin. Because the peak light absorption of hemoglobin occurs at these wavelengths, superficial blood vessels appear very dark, while deeper vessels are seen as cyan in color. This illumination leads to an increase in the recognition of mucosal lesions that are fed by blood vessels, such as carcinoma and papillomatosis. However, other conditions that cause hyper-vascularization, such as chronic laryngitis or vocal phono-trauma, are also thought to respond differentially to NBI®, and hence NBI® illumination may lead to improved diagnoses. Alternative methods to improve visualization of the mucosa in endoscopy include chromoendoscopy, confocal microscopy, and optical coherence tomography, which are subjects of many other chapters in this publication.

Keywords: white light, narrow band imaging (NBI®), light wave-lengths, diagnosis, hemoglobin, vocal folds, cancer, papillomatosis, phonotrauma, angiogenesis, new technology, Olympus, HD TV

Krzysztof Izdebski & Raul M. Cruz

Abstract: This chapter describes practical working aspects of the Olympus Tower (CV-190/CLV-190) system that includes NBI® light filtering technology.

Keywords: NBI®, larynx, operations, problems, advantages

Giorgio Peretti, Renzo Mora, Cesare Piazza & Francesco Mora

Abstract: Several technological improvements have been introduced to obtain an optical biopsy of suspicious laryngeal lesions. Among these, Narrow Band Imaging (NBI®, Olympus Medical System Corporation, Tokyo, Japan) is a new imaging technique for visualization of tumor-specific neoangiogenesis. The use of NBI® is currently considered of substantial benefit in detecting superficial mucosal laryngeal lesions, as it provides better detection of irregular microvascular patterns of pre-malignant and malignant lesions compared to conventional white light (WL). For these reasons, different classifications of intraepithelial papillary capillary loop (IPCL) features have been proposed to facilitate the prediction of laryngeal cancer and/or precancerous lesions, and the most comprehensive classification to date has been formulated by Ni et al. The main advantage of NBI® is that it can easily detect and distinguish malignant and premalignant tumors from benign lesions that are not detectable with more traditional endoscopic procedures. Many authors have shown how NBI® plays different roles in the management of laryngeal cancer during preoperative diagnostic work-up, in the intraoperative setting, and during post-treatment (surgery, radiotherapy, or chemoradiotherapy) follow-up.

Keywords: SCC, NBI®, WL, biologic endoscopy, neoangiogenesis, accuracy, Dx, intraepithelial papillary capillary loop (IPCL)