Some observations on computer lip-reading: moving from the dream to the reality

Article


Bear, Y., Owen, Gari, Harvey, Richard and Theobald, Barry-John 2014. Some observations on computer lip-reading: moving from the dream to the reality. Proceedings of SPIE. 9253. https://doi.org/10.1117/12.2067464
Authors: Bear, Y., Owen, Gari, Harvey, Richard and Theobald, Barry-John
Abstract

In the quest for greater computer lip-reading performance there are a number of tacit assumptions which are either present in the datasets (high resolution for example) or in the methods (recognition of spoken visual units called "visemes" for example). Here we review these and other assumptions and show the surprising result that computer lip-reading is not heavily constrained by video resolution, pose, lighting and other practical factors. However, the working assumption that visemes, which are the visual equivalent of phonemes, are the best unit for recognition does need further examination. We conclude that visemes, which were defined over a century ago, are unlikely to be optimal for a modern computer lip-reading system.

Keywords: Lip-reading; speech recognition; pattern recognition
Journal: Proceedings of SPIE
Journal citation: 9253
ISSN: 0277-786X; 1996-756X
Year: 2014
Publisher: SPIE (Society of Photo-optical Instrumentation Engineers)
Accepted author manuscript license: CC BY-NC-ND
Digital Object Identifier (DOI): https://doi.org/10.1117/12.2067464
Web address (URL): http://proceedings.spiedigitallibrary.org/volume.aspx?conferenceid=3393&volumeid=16739
Publication date (print): 07 Oct 2014
Deposited: 28 Feb 2017
Copyright information: © SPIE. Proc. SPIE 9253, Optics and Photonics for Counterterrorism, Crime Fighting, and Defence X; and Optical Materials and Biomaterials in Security and Defence Systems Technology XI, 92530G (October 7, 2014)
Place of publication: United States of America
ISBN: 9781628413168
Book title: Volume 9253 Optics and Photonics for Counterterrorism, Crime Fighting, and Defence X; and Optical Materials and Biomaterials in Security and Defence Systems Technology XI
Editors: Burges, Douglas, Owen, Gari, Rana, Harbinder, Zamboni, Roberto, Kajzar, François and Szep, Attila A.
Permalink: https://repository.uel.ac.uk/item/858q8


Related outputs

Resolution limits on visual speech recognition
Bear, Y., Harvey, Richard, Theobald, Barry-John and Lan, Yuxuan 2014. Resolution limits on visual speech recognition. in: IEEE International Conference on Image Processing (ICIP) IEEE.
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
Bear, Y., Harvey, Richard W., Theobald, Barry-John and Lan, Yuxuan 2014. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? in: Bebis, George, Boyle, Richard, Parvin, Bahram, Koracin, Darko, McMahan, Ryan, Jerald, Jason, Zhang, Hui, Drucker, Steven M., Kambhamettu, Chandra, Choubassi, Maha El, Deng, Zhigang and Carlson, Mark (ed.) Advances in Visual Computing: 10th International Symposium, ISVC 2014, Las Vegas, NV, USA, December 8-10, 2014, Proceedings, Part II Springer International Publishing.
Decoding visemes: Improving machine lip-reading
Bear, Y. and Harvey, Richard 2016. Decoding visemes: Improving machine lip-reading. in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE.
Finding phonemes: improving machine lip-reading
Bear, Y., Harvey, Richard W. and Lan, Yuxuan 2015. Finding phonemes: improving machine lip-reading. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 115-120
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
Bear, Y., Cox, Stephen J. and Harvey, Richard W. 2015. Speaker-independent machine lip-reading with speaker-dependent viseme classifiers. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 190-195
Phoneme-to-viseme mappings: the good, the bad, and the ugly
Bear, Y. and Harvey, Richard 2017. Phoneme-to-viseme mappings: the good, the bad, and the ugly. Speech Communication. 95, pp. 40-67. https://doi.org/10.1016/j.specom.2017.07.001
Comparing phonemes and visemes with DNN-based lipreading
Thangthai, Kwanchiva, Bear, Y. and Harvey, Richard 2017. Comparing phonemes and visemes with DNN-based lipreading. 28th British Machine Vision Conference. London, UK 04 - 07 Sep 2017 BMVA Press.
Visual speech recognition: aligning terminologies for better understanding
Bear, Y. and Taylor, Sarah L. 2017. Visual speech recognition: aligning terminologies for better understanding. 28th British Machine Vision Conference. London, UK 04 - 07 Sep 2017 BMVA Press.
Visual gesture variability between talkers in continuous speech
Bear, Y. 2017. Visual gesture variability between talkers in continuous speech. 28th British Machine Vision Conference. London, UK 04 - 07 Sep 2017 BMVA Press.