
Academic Seminars

Notice of Academic Seminar on the Morning of September 12, 2013

September 6, 2013

 

Title: Speaker recognition by combining MFCC and phase information in noisy conditions

Speaker: Longbiao Wang (王龙彪)

         Associate Professor, Nagaoka University of Technology, Japan

Host: Prof. Lei Xie (谢磊)

Venue: Academic Lecture Hall, Room 105

Time: 10:20-11:20 AM, Thursday, September 12, 2013

Speaker biography:

Longbiao Wang received his Dr. Eng. degree from Toyohashi University of Technology, Japan, in 2008. He was an assistant professor in the Faculty of Engineering at Shizuoka University, Japan, from April 2008 to September 2012. Since October 2012 he has been an associate professor at Nagaoka University of Technology, Japan. His research interests include robust speech recognition and speaker recognition. He received the "Chinese Government Award for Outstanding Self-financed Students Abroad" in 2008. He is a member of IEEE, ISCA, APSIPA, the Institute of Electronics, Information and Communication Engineers (IEICE), and the Acoustical Society of Japan (ASJ).

Abstract:

In this talk, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine it with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods rely on MFCCs, even in noisy conditions. MFCCs mainly capture vocal tract information: they use only the magnitude of the Fourier transform of time-domain speech frames, and the phase information is discarded. Because the phase contains rich voice source information, it is expected to be highly complementary to MFCCs. Furthermore, several studies have reported that phase-based features are robust to noise. We propose a phase information extraction method that normalizes the variation in the phase caused by the clipping position of the input speech, and we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. MFCCs outperformed the phase information on clean speech; for noisy speech, however, the degradation of the phase information was significantly smaller than that of MFCCs. With models trained on clean speech, the phase information alone even outperformed MFCCs in many cases. By combining the phase information with MFCCs, the speaker identification error was reduced by about 30%-60% relative to the standard MFCC-based method.
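To make the idea of combining the two feature streams concrete, the following Python sketch shows one way (not necessarily the speaker's exact method) to pair MFCCs with frame-position-normalized phase features. It assumes the librosa library for MFCC and STFT extraction, fixes the phase of a chosen base frequency bin to zero in every frame while shifting the other bins in proportion to their frequency, and stacks the cos/sin of the normalized phase with the MFCCs. The function name and all parameter values (frame size, base bin, number of phase bins) are illustrative assumptions.

    import numpy as np
    import librosa

    def mfcc_and_phase_features(wav_path, sr=16000, n_fft=512, hop=160,
                                n_mfcc=13, base_bin=8, n_phase_bins=12):
        # Load and resample the waveform.
        y, sr = librosa.load(wav_path, sr=sr)

        # Standard MFCCs: magnitude-spectrum (vocal tract) information.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=n_fft, hop_length=hop)     # (n_mfcc, T)

        # Short-time Fourier transform and its phase spectrum.
        stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)          # (n_fft//2+1, T)
        phase = np.angle(stft)

        # Normalize the phase: force the phase of a base frequency bin to 0 in
        # every frame and shift the other bins in proportion to their frequency,
        # so the feature no longer depends on where each analysis frame starts.
        bins = np.arange(phase.shape[0])[:, None]                    # (n_fft//2+1, 1)
        shift = -(bins / base_bin) * phase[base_bin:base_bin + 1, :]
        norm_phase = phase + shift

        # Keep a few low-frequency bins, encoded as cos/sin to avoid 2*pi wraps.
        sel = norm_phase[1:1 + n_phase_bins, :]
        phase_feat = np.vstack([np.cos(sel), np.sin(sel)])

        # Frame-level combination of the two feature streams.
        return np.vstack([mfcc, phase_feat])                         # (n_mfcc + 2*n_phase_bins, T)

In this sketch both feature streams are simply concatenated per frame; in practice the two streams can also be scored by separate models and their likelihoods combined, which is another common way to exploit their complementarity.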
