Using Synchronized Audio Mapping to Track and Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences
DOI:
https://doi.org/10.12970/2311-1917.2016.04.01.1
Keywords:
Hidden Markov Model, dynamic MRI, velopharyngeal position, computational modeling, Mel-Frequency Cepstral Coefficients model.
Abstract
Purpose: This study demonstrates a novel computational modeling technique to 1) track velar and pharyngeal wall movement from dynamic MRI data and 2) examine the utility of recorded participant audio signals for estimating velar and pharyngeal wall movement during a speech task. A series of dynamic MRI images and acoustic features extracted from the audio were used to develop and inform a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) model.
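As an illustration of how an HMM can turn a sequence of per-frame acoustic scores into a sequence of articulator states, the sketch below implements standard Viterbi decoding in the log domain. It is a minimal example, not the model reported in this study: the two states (0 = velum lowered, 1 = velum raised), the transition probabilities, and the per-frame emission probabilities are all hypothetical values chosen for illustration. In practice the emission scores would be derived from MFCC features.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path through an HMM.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of the observation at frame t in state s
    """
    n_states = len(log_init)
    # Best log score of any path ending in each state at frame 0.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def logs(values):
    """Convert a list of probabilities to log probabilities."""
    return [math.log(v) for v in values]


# Hypothetical two-state example: 0 = velum lowered, 1 = velum raised.
log_init = logs([0.5, 0.5])
log_trans = [logs([0.9, 0.1]), logs([0.1, 0.9])]
# Hypothetical per-frame emission probabilities (one row per frame).
log_emit = [logs(row) for row in
            [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.1, 0.9]]]

path = viterbi(log_init, log_trans, log_emit)
```

Working in log probabilities avoids numerical underflow when the frame sequence is long.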
Methods: One adult male subject was imaged using a fast-gradient echo Fast Low Angle Shot (FLASH) multi-shot spiral technique to acquire midsagittal images at 15.8 frames per second (fps) during the production of “ansa.” The nasal surface of the velum and the posterior pharyngeal wall were identified and marked using a novel pixel selection method. The error rate was assessed by calculating the accumulation error and by visual inspection.
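To relate an audio recording to images acquired at 15.8 fps, each MRI frame has to be paired with the audio samples recorded during that frame. The sketch below shows the arithmetic under two assumptions that are not stated in the abstract: a 16 kHz audio sampling rate, and audio and imaging that start at the same instant. The function name is hypothetical.

```python
MRI_FPS = 15.8              # frame rate reported in the abstract
AUDIO_SAMPLE_RATE = 16000   # assumed sampling rate in Hz (not from the source)


def frame_to_samples(frame_index, fps=MRI_FPS, sample_rate=AUDIO_SAMPLE_RATE):
    """Return the half-open audio sample range [start, end) for one MRI frame.

    Assumes the audio and the MRI sequence begin at the same time.
    """
    samples_per_frame = sample_rate / fps
    start = round(frame_index * samples_per_frame)
    end = round((frame_index + 1) * samples_per_frame)
    return start, end


frame_duration_ms = 1000.0 / MRI_FPS  # roughly 63.3 ms per MRI frame
first_frame = frame_to_samples(0)
second_frame = frame_to_samples(1)
```

Rounding the start and end from the same running product keeps consecutive windows contiguous, so no audio sample is dropped or counted twice even though the number of samples per frame is not an integer.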
Results: The proposed model traced and animated dynamic articulators during speech in real time with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) segmented the structures of interest in the velopharyngeal area, and the model successfully predicted the velar and pharyngeal configurations when provided with the audio signal.
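An accuracy figure at a one-pixel threshold can be read as the fraction of predicted markers that fall within one pixel of their reference positions. The sketch below computes that fraction under an assumption the abstract does not confirm: distance is measured as the Chebyshev distance (the larger of the row and column offsets). The function name and the coordinates are hypothetical and are not data from the study.

```python
def fraction_within_threshold(predicted, reference, threshold=1):
    """Fraction of predicted (row, col) markers within `threshold` pixels
    of the matching reference marker, using Chebyshev distance (assumed)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one marker is required")
    hits = 0
    for (pr, pc), (rr, rc) in zip(predicted, reference):
        distance = max(abs(pr - rr), abs(pc - rc))
        if distance <= threshold:
            hits += 1
    return hits / len(predicted)


# Hypothetical marker coordinates for illustration only.
predicted = [(10, 10), (11, 12), (15, 15), (20, 22)]
reference = [(10, 11), (11, 12), (18, 15), (20, 21)]
accuracy = fraction_within_threshold(predicted, reference)
```

With these illustrative markers, three of the four predictions lie within one pixel of the reference, so the computed fraction is 0.75.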
Conclusion: This study demonstrates a novel approach to tracking dynamic velopharyngeal movements. The potential application of a predictive model that relies on audio signals to detect the presence of a velopharyngeal gap is discussed.