ITG Conference on Speech Communication | 29.09.2021 - 01.10.2021 | Kiel





Statistical Signal Processing and Machine Learning for Speech Enhancement

Prof. Dr. Timo Gerkmann
Universität Hamburg
Department of Informatics
Signal Processing Research Group


Contents of the Keynote:

Speech signal processing is an exciting research field with many applications, such as hearing devices, telephony, and smart speakers. While the performance of these devices may be limited in noisy environments, leveraging modern machine learning techniques has recently yielded impressive improvements in the estimation of clean speech signals from noisy microphone signals. Yet, in order to build real-time, robust, and interpretable algorithms, these machine learning techniques need to be combined with domain knowledge in signal processing, statistics, and acoustics. In this talk, we will present recent research results from our group that follow this perspective by exploiting end-to-end learning, multichannel configurations, and deep generative models.


Short CV of Timo Gerkmann:

Timo Gerkmann studied Electrical Engineering and Information Sciences at the universities of Bremen and Bochum, Germany. He received his Dipl.-Ing. degree in 2004 and his Dr.-Ing. degree in 2010, both in Electrical Engineering and Information Sciences, from the Ruhr-Universität Bochum, Bochum, Germany. In 2005, he spent six months with Siemens Corporate Research in Princeton, NJ, USA. From 2010 to 2011, Dr. Gerkmann was a postdoctoral researcher at the Sound and Image Processing Lab at the Royal Institute of Technology (KTH), Stockholm, Sweden. From 2011 to 2015, he was a professor for Speech Signal Processing at the Universität Oldenburg, Oldenburg, Germany. From 2015 to 2016, he was a Principal Scientist for Audio & Acoustics at Technicolor Research & Innovation in Hanover, Germany. Since 2016, he has been a professor for Signal Processing at the University of Hamburg, Germany. His research interests include statistical signal processing and machine learning algorithms for speech and audio, applied to communication devices, hearing instruments, audio-visual media, and human-machine interfaces. Timo Gerkmann serves as an elected member of the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing and as an Associate Editor of the IEEE/ACM Transactions on Audio, Speech and Language Processing.


Pathological Speech Analyses: From Classical Machine Learning to Deep Learning

Prof. Dr. Juan Rafael Orozco-Arroyave
University of Antioquia


Contents of the Keynote:

Many different diseases affect different aspects or dimensions of speech. Automatic evaluation of speech has evolved over the last decades to the point where it could be considered suitable to support the diagnosis and follow-up of patients (including their response to a given therapy). Work on this topic mainly began with classical signal processing and machine learning methods, and in recent years modern deep learning methods have been incorporated. However, there is still a lot to do before these models can be incorporated into routine clinical practice. The most important challenges are the amount of data available to address specific topics and the interpretability of the resulting models. The aim of this talk is to show some applications of classical and modern methods to model the speech signals of patients with different speech disorders, including hypokinetic dysarthria (which results from Parkinson’s disease and other neurological conditions), hoarseness (which results from laryngeal cancer), and hypernasality (which appears mainly in children with cleft lip and palate).


Short CV of Juan Rafael Orozco-Arroyave:

Juan Rafael Orozco-Arroyave was born in Medellín, Colombia, in 1981. He received his degree in Electronics Engineering from the University of Antioquia in 2004. From 2004 to 2009, he worked for a telecommunications company in Medellín, Colombia. In 2011, he completed his MSc degree in Telecommunications at the University of Antioquia. In 2015, he completed his PhD in Computer Science in a double-degree program between the University of Erlangen (Germany) and the University of Antioquia (Colombia). Juan Rafael Orozco-Arroyave is currently an Associate Professor at the University of Antioquia and an adjunct researcher at the Pattern Recognition Lab at the University of Erlangen.


Conversational AI in Production Cars

Dr. Christophe Couvreur
Cerence, Merelbeke, Belgium

Vincent Pollet
Cerence, Merelbeke, Belgium


Contents of the Keynote:

Artificial Intelligence is everywhere. Carmakers rely on conversational AI to offer a personalized and innovative experience to their users throughout the customer journey. We discuss the market trends and the technical solutions from Cerence and other providers that make it easy for consumers to interact with the vehicle intuitively and safely. We review the state of the art for the various components of conversational AI systems in cars today (audio signal processing, speech recognition, natural language understanding, dialogue management, contextual AI and content integration, natural language generation, speech synthesis, and multimodal integration), and we further zoom in on how recent advances in speech synthesis technology bring more natural and personalized voice assistants to the car.


Short CV of Christophe Couvreur:

Christophe Couvreur is Vice-President Product, Core BU at Cerence, Inc., which was spun off from Nuance Communications as its Automotive division in 2019. Prior to the spin-off, Christophe had been with Nuance Communications since 2001, where he held a variety of positions in research, engineering, project/program management, and general management, all focused on bringing innovative speech and AI products to market in the Automotive, Mobile, and Gaming areas.

Prior to Nuance, Christophe worked as a researcher at Lernout & Hauspie Speech Products, the University of Illinois at Urbana-Champaign, and the Belgian National Fund for Scientific Research. He has also served as a Lieutenant in the Belgian Air Force.

Christophe holds a PhD in Applied Science from the Faculté Polytechnique de Mons (Belgium), an MSc in Electrical Engineering from the University of Illinois at Urbana-Champaign, a Master's degree in Mathematics from the Université Catholique de Louvain (UCL), and an Engineering degree from the Faculté Polytechnique de Mons, as well as an MBA from Vlerick Business School.


Short CV of Vincent Pollet:

Vincent Pollet is a renowned expert in the field of machine learning applied to speech processing and has been working for over 22 years at four different speech technology companies — Lernout & Hauspie Speech Products, ScanSoft, Nuance, and Cerence — without ever having changed jobs. As a prolific researcher, he has introduced new technologies and successfully transformed research into widely adopted speech products. Currently, as Director of Applied AI, he coaches a team of experts to innovate and advance speech technology, delivering products that improve the lives of millions.