Notice of Academic Talk by Long Qin (M*Modal Research Scientist)

  Title: Learning Out-of-Vocabulary Words in Automatic Speech Recognition

  Speaker: Long Qin (M*Modal research scientist)

  Time: March 28, 2014, 10:00-11:30

  Venue: Conference Room 117, 1st Floor, West Science and Technology Experiment Building, USTC West Campus

  Host: National Engineering Laboratory for Speech and Language Information Processing

  Abstract:

  Out-of-vocabulary (OOV) words are unknown words that appear in the testing speech but not in the recognition vocabulary. They are usually important content words, such as names and locations, which carry information crucial to the success of many speech recognition tasks. However, most speech recognition systems are closed-vocabulary recognizers that only recognize words in a fixed, finite vocabulary. When the testing speech contains OOV words, such systems cannot identify them and instead misrecognize them as in-vocabulary (IV) words. Furthermore, the errors made on OOV words also degrade the recognition accuracy of the surrounding IV words. Therefore, speech recognition systems that can detect and recover OOV words are of great interest.
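  To make the definition concrete, the following is a minimal sketch of how OOV words and the OOV rate could be counted for a reference transcript against a fixed vocabulary. It is only an illustration and not the speaker's method: the function names, the toy vocabulary, and the example sentence are all hypothetical, and a real recognizer has no access to the reference text at decoding time.

```python
def find_oov_words(transcript, vocabulary):
    """Return the transcript tokens that are missing from the vocabulary.

    Hypothetical helper for illustration: tokens are split on whitespace
    and compared case-insensitively.
    """
    vocab = {word.lower() for word in vocabulary}
    return [tok for tok in transcript.lower().split() if tok not in vocab]


def oov_rate(transcript, vocabulary):
    """Fraction of transcript tokens that are out of vocabulary."""
    tokens = transcript.split()
    if not tokens:
        return 0.0
    return len(find_oov_words(transcript, vocabulary)) / len(tokens)


# Toy closed vocabulary (assumed for this example only).
vocabulary = ["the", "meeting", "is", "in", "with"]
reference = "the meeting is in Hefei with Qin"

print(find_oov_words(reference, vocabulary))
print(oov_rate(reference, vocabulary))
```

  In this toy case the location and the name are the OOV tokens, which matches the observation above that OOV words tend to be content words such as names and locations.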

  Current OOV research focuses on detecting the presence of OOV words in the testing speech. There is only limited work on how to convert OOV words into IV words of a recognizer. In this talk, I will present our work on learning OOV words in speech recognition. We will show that it is feasible for a recognizer to automatically learn new words and operate on an open vocabulary. Specifically, we built an OOV word learning framework that consists of three major components. The first component is OOV word detection, where we built hybrid systems using different sub-lexical units to detect OOV words during decoding. We also studied how to improve the hybrid system performance using system combination and OOV word classification techniques. Since OOV words can appear more than once in a conversation or over a period of time, in the second component, OOV word clustering, we worked on finding multiple instances of the same OOV word. Finally, in the third component, OOV word recovery, we explored how to integrate identified OOV words into the recognizer’s lexicon and language model. Our work was tested on tasks with different speaking styles and recording conditions, including the Wall Street Journal (WSJ), Broadcast News (BN), and Switchboard (SWB) datasets. Our experimental results show that we are able to detect and recover up to 40% of OOV words using the proposed OOV word learning framework.

  Speaker Bio:

  Long Qin is a research scientist at M*Modal. His research interests include out-of-vocabulary word learning, acoustic modeling, and many other topics in speech recognition. He also has extensive experience in voice conversion and HMM-based speech synthesis. He received his Ph.D. in 2013 from Carnegie Mellon University, and his M.Sc. in 2007 and B.Sc. in 2004 from the University of Science and Technology of China. Dr. Qin is a member of IEEE and ISCA. He has also served as a reviewer for IEEE and ISCA conferences and magazines.