Prof. Anja Leue from the psychology department of our university organized a lecture series on the topic "language and society". The DSS group also took part in this event and presented some of our results on speech communication in disturbed environments. The lecture took place on Thursday, 5th of May, in one of the lecture rooms in the Audimax building. After the talk, a pleasant, interesting and (for our field) long discussion followed. Here is the abstract of the DSS talk:
Today, technical systems enable voice communication even in heavily disturbed environments. Examples are communication masks for firefighters, swimming goggles for underwater speech communication, and speech communication within cars. In the latter example, the speech of the dialog partners is impaired by several factors. Depending on the driving speed, a moderate or even high level of background noise is superimposed on the speech signals, whether they are produced by the passengers themselves or by the loudspeakers that reproduce the voices of remote partners connected via mobile phone. Because of the seating arrangement (position and orientation of the seats), the front passengers do not speak in the direction of the rear passengers, and face-to-face communication among the passengers is not as easy as in a “normal” conversation.
In so-called ICC systems (ICC stands for in-car communication), the passengers' speech is picked up by microphones. After appropriate signal processing (mainly noise, echo, and feedback reduction), the enhanced signals of the talkers are played back via loudspeakers close to the ears of the listening passengers. At first glance, such systems face the same problems as hands-free or speech dialog systems. However, because they operate within a closed electro-acoustic loop, special problems arise: for example, the local speech signals are correlated with the loudspeaker signals, which causes problems when the acoustic paths are identified with adaptive filters. The enhancement usually improves the signal-to-noise ratio at the ears of the listeners. However, the more the signal-to-noise ratio is improved for the listening passengers, the more the speaking passengers become aware of, or are even disturbed by, their own voices due to echo perception.
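To make "system identification with adaptive filters" a bit more concrete, here is a minimal sketch of a normalized least-mean-squares (NLMS) filter that estimates an unknown loudspeaker-to-microphone impulse response from the loudspeaker signal and the microphone signal. The impulse response `h_true`, the filter length, and the step size are hypothetical values chosen only for illustration. The excitation here is white noise and no local talker is present, so the estimate converges cleanly. In an actual ICC loop the loudspeaker signal is a delayed, processed version of the local speech, and that correlation is exactly what disturbs an estimator of this kind.

```python
import numpy as np

def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Estimate an FIR path w such that d[n] is approximately sum_k w[k] * x[n-k]."""
    w = np.zeros(num_taps)    # current estimate of the acoustic path
    buf = np.zeros(num_taps)  # most recent input samples, newest first
    e = np.zeros(len(x))      # a-priori error signal
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y = w @ buf           # estimated echo/feedback component at the microphone
        e[n] = d[n] - y
        # normalized update: step size scaled by the input power in the buffer
        w += mu * e[n] * buf / (buf @ buf + eps)
    return w, e

rng = np.random.default_rng(0)
h_true = np.array([0.0, 0.5, -0.3, 0.2, 0.1])  # hypothetical loudspeaker-to-mic path
x = rng.standard_normal(5000)                   # loudspeaker signal (white noise)
d = np.convolve(x, h_true)[:len(x)]             # microphone signal, no local talker
w, e = nlms_identify(x, d, num_taps=len(h_true))
print(np.round(w, 3))
```

After a few hundred samples the estimate `w` is practically identical to `h_true`. If `x` were instead derived from `d` itself, as in a closed ICC loop, the same update would be biased, which is why decorrelation schemes are needed there.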
In this talk, I will try to address most of the challenges one faces when building enhancement systems for speech in disturbed environments. The solution is usually a “cocktail” of individual processing units whose ingredients are low-delay filterbanks, adaptive structures for system identification, spectral suppression rules, decorrelation schemes, and adaptive mixing approaches. In most cases, a compromise between the needs of the talking and the listening passengers has to be found, which makes this application a very interesting challenge.
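As an illustration of one of these ingredients, a spectral suppression rule computes a real-valued gain for every frequency bin (or subband) from an estimate of the noise power. The sketch below assumes a Wiener-type rule with a crude a-priori SNR estimate and a gain floor `g_min`; the function name, the floor value, and the example powers are hypothetical and not taken from the talk. In a real system these gains would be applied in the subbands of a low-delay filterbank, which is omitted here.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, g_min=0.1):
    """Per-bin Wiener-type suppression gain with a gain floor (hypothetical helper)."""
    # a-posteriori SNR per bin; the small constant avoids division by zero
    snr_post = noisy_power / np.maximum(noise_power, 1e-12)
    # crude a-priori SNR estimate, clipped at zero
    snr_prio = np.maximum(snr_post - 1.0, 0.0)
    gain = snr_prio / (1.0 + snr_prio)
    # limit the maximum attenuation, a common way to keep residual noise less annoying
    return np.maximum(gain, g_min)

noisy_power = np.array([1.0, 2.0, 10.0, 100.0])  # hypothetical |Y|^2 per bin
noise_power = np.ones(4)                          # hypothetical noise power estimate
gains = wiener_gain(noisy_power, noise_power)
print(gains)
```

Bins dominated by noise are attenuated down to the floor, while bins with a high SNR pass almost unchanged. The floor is one place where the compromise between talkers and listeners shows up: a lower floor removes more noise but also makes processing artifacts more audible.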
If one combines pure ICC systems with other speech and audio systems in a car, such as hands-free, anti-noise, or music playback systems, the complexity of the resulting system increases. However, the system components mentioned above can be combined such that they help to overcome some of these problems, which is again an interesting challenge.
Finally, let me mention that even after decades of great and continuous progress in speech and audio signal processing, communication between people in highly disturbed environments could still be improved. Thus, speech signal enhancement remains “a rocky road”, to put it in the words of one of the early German speech processing researchers.