Title: Selected Topics in Multimedia Analytics
Speaker: Dr. Xavier Anguera (Research Scientist, Multimedia Research Group, Telefonica Research, Barcelona, Spain)
Host: Prof. Lei Xie (谢磊)
Xavier Anguera: Ing. [MS] 2001, UPC University (Barcelona, Spain); [MS] 2001, European Masters in Language and Speech; Ph.D. 2006, UPC University, with a thesis on speaker diarization for multi-microphone meeting recordings. From 2001 to 2003 he worked at the Panasonic Speech Technology Lab in Santa Barbara, CA, on text-to-speech for several languages. From 2004 to 2006 he was a visiting researcher at the International Computer Science Institute (ICSI) in Berkeley, CA. Since 2007 he has been a research scientist at Telefonica Research in Barcelona. His research interests cover speech processing (both speaker-based and content-based) and multimodal multimedia processing. He has published over 60 peer-reviewed papers and has several accepted or pending patents. He is an active member of the IEEE and the ACM, and has served on the organizing and program committees of several multimedia and speech conferences.
In this talk I will cover three topics that I have been working on during the last three years. First, I will talk about multimodal video-copy detection, which focuses on determining whether a given video contains modified excerpts taken from an original video. For this we implemented a novel binary audio fingerprint that we call MASK, which I will describe. Next, I will talk about speaker recognition, in which we want to determine whether an audio recording of a speaker matches that speaker's claimed identity. For this task we have developed a novel binary speaker representation and modeling technique. Last, I will speak about spoken web search, in which a given audio query is searched for inside a large audio database. For this we have proposed a novel algorithm based on Dynamic Time Warping (DTW) that allows the audio database to be pre-indexed for faster retrieval of matches, and that uses very little memory in comparison to standard DTW techniques.
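
For readers unfamiliar with DTW, the following is a minimal sketch of the standard, textbook form of the algorithm that the abstract uses as its point of comparison. It is not the speaker's proposed algorithm. The function name dtw_distance, the use of one-dimensional numeric sequences, and the absolute-difference local cost are assumptions made for illustration only; real systems compare sequences of acoustic feature vectors.

```python
from math import inf


def dtw_distance(query, reference):
    """Textbook DTW cost between two 1-D numeric sequences.

    Hypothetical helper for illustration; the local cost (absolute
    difference) is an assumption, not taken from the talk.
    """
    n, m = len(query), len(reference)
    # cost[i][j] = minimal accumulated cost of aligning
    # query[:i] with reference[:j]. The full table needs O(n * m) memory.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            # Extend the cheapest of the three allowed predecessor cells:
            # advance in query only, in reference only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is a time-stretched copy of the first.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The accumulated-cost table in this sketch grows with the product of the two sequence lengths, which is the kind of memory cost that becomes a concern when the reference is a large audio database.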