Talk title: Speaker recognition by combining MFCC and phase information in noisy conditions
Speaker: Longbiao Wang (王龙彪)
Associate Professor, Nagaoka University of Technology, Japan
Host: Prof. Lei Xie (谢磊)
Longbiao Wang received his Dr. Eng. degree from Toyohashi University of Technology, Japan, in 2008. He was an assistant professor in the Faculty of Engineering at Shizuoka University, Japan, from April 2008 to September 2012. Since October 2013 he has been an associate professor at Nagaoka University of Technology, Japan. His research interests include robust speech recognition and speaker recognition. He received the “Chinese Government Award for Outstanding Self-financed Students Abroad” in 2008. He is a member of IEEE, ISCA, APSIPA, the Institute of Electronics, Information and Communication Engineers (IEICE), and the Acoustical Society of Japan (ASJ).
In this talk, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine it with mel-frequency cepstral coefficients (MFCCs). To date, most speaker recognition methods have been based on MFCCs, even in noisy conditions. MFCCs, which dominantly capture vocal tract information, use only the magnitude of the Fourier transform of time-domain speech frames, so phase information is discarded. Because the phase contains rich voice source information, it is expected to be highly complementary to MFCCs. Furthermore, some studies have reported that phase-based features are robust to noise. We propose a phase information extraction method that normalizes the phase variation caused by the clipping position of the input speech, and we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. MFCCs outperformed the phase information on clean speech. On the other hand, the phase information degraded significantly less than MFCCs on noisy speech; with models trained on clean speech, the phase information alone even outperformed MFCCs in many cases. By integrating the phase information with MFCCs, the speaker identification error rate was reduced by about 30%-60% relative to the standard MFCC-based method.
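The normalization idea above can be sketched as follows. A time shift of the analysis frame (a change of clipping position) adds a phase ramp that is linear in frequency, so subtracting from each bin's phase a frequency-proportional multiple of a chosen base bin's phase cancels that ramp; mapping the result to (cos, sin) avoids 2π wrap-around. This is only a minimal illustration of the principle: the function name, bin choices, and defaults are assumptions, not the exact procedure from the talk.

```python
import numpy as np

def phase_features(frame, base_bin=1, n_bins=12):
    """Phase features for one speech frame, normalized so the phase at
    a base frequency bin is zero (illustrative sketch, not the talk's
    exact method). Each bin's phase is shifted in proportion to its
    frequency, cancelling the linear phase ramp caused by the frame's
    clipping (start) position; (cos, sin) mapping handles wrapping."""
    spec = np.fft.rfft(frame)           # a window could be applied first
    theta = np.angle(spec)
    k = np.arange(1, n_bins + 1)        # low-frequency bins to keep
    # Remove the clipping-position phase ramp relative to base_bin.
    norm = theta[k] - (k / base_bin) * theta[base_bin]
    return np.concatenate([np.cos(norm), np.sin(norm)])
```

For a periodic signal, frames cut at different start positions yield identical features under this normalization, which is the robustness property the extraction method targets; in a full system these features would be modeled alongside MFCCs and the two scores combined.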