The microphone(s) pick up the voice of the near end, and the loudspeaker reproduces the voice of the far end. The trouble starts when the far-end voice, played back through the loudspeaker in the room, is captured by the microphone and transmitted back to the far end. At that moment, the far end hears their own voice as an echo.
If you have ever experienced this, you know how difficult it is to maintain a coherent train of thought. Our brains struggle to function normally when we hear our own voice delayed, and the longer the delay between the near end and the far end (often called “round-trip delay”), the harder it becomes to have a productive conversation.
Half-duplex communication was one of the first solutions to the problem, and it is still used in some scenarios. This simple technique mutes the near-end microphone whenever the far end speaks, eliminating the path through which echo occurs. But with half-duplex, the conference is no longer a conversation but a series of monologues: the far end won’t hear the near end until they have finished talking, so interjections are lost.
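As a rough illustration, a half-duplex switch can be sketched as a level-triggered gate. This is a minimal Python sketch that assumes audio arrives as lists of float samples. The frame length and RMS threshold are hypothetical values, and commercial systems add hold times and smoother switching. Whenever a far-end frame is louder than the threshold, the matching near-end microphone frame is replaced with silence:

```python
import math


def rms(frame):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def half_duplex(far_end, mic, frame_len=4, threshold=0.05):
    """Hypothetical half-duplex gate: mute each mic frame while the far end talks."""
    out = []
    for start in range(0, len(mic), frame_len):
        far_frame = far_end[start:start + frame_len]
        mic_frame = mic[start:start + frame_len]
        if rms(far_frame) > threshold:
            # Far end is talking: mute the mic so no echo path exists.
            out.extend([0.0] * len(mic_frame))
        else:
            # Far end is quiet: pass the near-end microphone through.
            out.extend(mic_frame)
    return out


# The near-end talker speaks throughout; the far end starts talking in frame 2.
print(half_duplex([0.0] * 4 + [0.5] * 4, [0.2] * 8))
# [0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0]
```

The second frame shows the drawback of half-duplex: the near-end talker's interjection is discarded along with the echo.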
Acoustic echo cancellation (AEC) offers a solution to this, so let’s take a peek under the hood at the anatomy of AEC. Designing high-performance AEC processing requires a huge investment in engineering, and in many cases the details of the technology are proprietary to the company developing it. But it is generally agreed that most designs on the market rely on the following components:
Reference Signal
The AEC processor needs to know what audio should be removed from the microphone signal. This is accomplished by providing a “reference” signal, which is most often the far-end voice. That way, the AEC circuit knows which signal is unwanted and, if the process works correctly, the far end hears only the near-end conversation and not their own echo.
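The role of the reference can be written down with the linear signal model commonly used to describe AEC. This is an assumption, since real loudspeakers and rooms also add some non-linear distortion. In this model the microphone signal d(n) is the near-end speech s(n) plus the reference x(n) filtered by the unknown loudspeaker-room-microphone echo path h of length L. The AEC subtracts an estimate of that echo, built from its own estimate \hat{h}, and sends the result e(n) to the far end:

```latex
d(n) = s(n) + \sum_{k=0}^{L-1} h_k \, x(n-k)
\qquad
e(n) = d(n) - \sum_{k=0}^{L-1} \hat{h}_k \, x(n-k)
```

If \hat{h}_k = h_k for every k, then e(n) = s(n), and the far end hears only the near-end conversation.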
Adaptive Filter
The adaptive filter is where most of the AEC complexity lies. In simple terms, the AEC builds a model of the echo path from the loudspeaker to the microphone and uses it to generate an out-of-polarity estimate of the far-end signal picked up by the microphone, which cancels that echo. Adaptive filters can be implemented in several ways, differing in whether adaptation takes place in the time domain, the frequency domain or both, in the update algorithm used, and in the type of filter applied.
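To make the idea concrete, here is a minimal time-domain sketch using the normalized least-mean-squares (NLMS) algorithm. NLMS is one common textbook choice for echo cancellation; this is not a description of any particular product. The sketch assumes a purely linear echo path, no near-end speech, and illustrative parameter values (`taps`, `mu`, `eps`). The filter weights `w` converge toward the echo path, and the echo estimate `y` is subtracted from the microphone signal:

```python
import random


def nlms_echo_canceller(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Minimal NLMS echo canceller with an FIR adaptive filter.

    reference: far-end samples x(n) (the AEC reference input)
    mic:       microphone samples d(n)
    Returns (e, w): the echo-cancelled output e(n) and the final weights.
    """
    w = [0.0] * taps    # adaptive estimate of the echo path
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate
        e = d - y                                   # remove the estimate
        norm = eps + sum(b * b for b in buf)        # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w


# Simulated room: a hypothetical 4-tap echo path applied to a white-noise reference.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.0, 0.5, -0.3, 0.1]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]

e, w = nlms_echo_canceller(x, d)
print(max(abs(v) for v in e[-100:]))  # residual echo after convergence
```

In this sketch, `mu` trades convergence speed against sensitivity to noise and near-end speech. Production cancellers often work in the frequency or subband domain to handle long echo paths efficiently.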
Non-Linear Processing (NLP)
NLP is roughly equivalent to a sophisticated ducking technology used to suppress any residual echo that the adaptive filter did not cancel. Although NLP is often required, some AEC technologies rely on it too heavily. This results in over-suppression and/or distortion of near-end speech for both primary (close to the mic) and secondary (far from the mic) talkers. Excessive use of NLP can also destroy the “ambience” of the near end transmitted to the far end.
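A very simplified NLP can be sketched as a frame-based gate on the adaptive filter's output. This Python sketch uses a heuristic chosen for illustration, not a standard algorithm. If the far end is active and the adaptive filter has already removed most of the microphone energy, whatever remains is treated as residual echo and attenuated. All thresholds and the attenuation value are hypothetical:

```python
import math


def rms(frame):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def simple_nlp(reference, mic, aec_out, frame_len=160,
               far_threshold=0.01, reduction_threshold_db=10.0, atten_db=-30.0):
    """Hypothetical residual-echo ducker applied frame by frame."""
    gain = 10 ** (atten_db / 20)
    out = []
    for start in range(0, len(aec_out), frame_len):
        ref_f = reference[start:start + frame_len]
        mic_f = mic[start:start + frame_len]
        err_f = aec_out[start:start + frame_len]
        far_active = rms(ref_f) > far_threshold
        # How much the adaptive filter reduced this frame, in dB.
        reduction_db = 20 * math.log10((rms(mic_f) + 1e-12) / (rms(err_f) + 1e-12))
        if far_active and reduction_db > reduction_threshold_db:
            out.extend(s * gain for s in err_f)  # duck the residual echo
        else:
            out.extend(err_f)                    # near-end speech likely present
    return out


# Frame 1: only a small residual remains. Frame 2: a near-end talker is present.
ref = [0.5] * 320
mic = [0.4] * 320
aec_out = [0.004] * 160 + [0.3] * 160
result = simple_nlp(ref, mic, aec_out)
```

The same heuristic also shows how over-suppression happens. Quiet near-end speech under loud echo also produces a large reduction, so it gets ducked along with the residual echo.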
Some AEC circuits may include additional processing, such as filters (typically a high-pass filter), noise reduction or even automatic gain control (AGC), all designed to improve audio quality. When choosing a solution, it is worth comparing what each one offers.
Several common terms are used when discussing AEC. The following core terms help ensure a level playing field when comparing solutions:
Convergence is usually discussed in terms of convergence rate and loss of convergence. Convergence rate refers to how quickly the AEC processor models the room and adapts to echo path changes. The AEC ‘converges’ once it has adapted its filter parameters so that the echo is removed. Loss of convergence occurs when the AEC circuit cannot track acoustic echo path changes, resulting in echo being heard by the far end.
Residual echo is the acoustic echo signal that was not removed by the AEC technology.
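Double talk occurs when speech is generated at both ends of the line at the same time. It is the most demanding state for the AEC circuit to correct for and adapt to, because near-end speech in the microphone signal can disturb the adaptive filter’s estimate of the echo path.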
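One classic way to detect double talk is the Geigel detector. It compares each microphone sample with the recent peak of the reference signal. The sketch below assumes an echo return loss of at least 6 dB (hence `threshold=0.5`) and uses an illustrative window length. Its output is typically used to freeze adaptation of the filter while near-end speech is present:

```python
def geigel_double_talk(reference, mic, window=256, threshold=0.5):
    """Geigel detector: flag near-end speech when a mic sample exceeds
    `threshold` times the peak reference magnitude over the last `window` samples."""
    flags = []
    for n, d in enumerate(mic):
        start = max(0, n - window + 1)
        peak_ref = max((abs(v) for v in reference[start:n + 1]), default=0.0)
        flags.append(abs(d) > threshold * peak_ref)
    return flags


# The far end talks throughout and its echo arrives about 14 dB down; then the near end joins.
flags = geigel_double_talk([1.0] * 10, [0.2] * 5 + [0.9] * 5)
print(flags)  # [False, False, False, False, False, True, True, True, True, True]
```

Because the detector relies on the echo being quieter than the reference, rooms with a poor ERL can trigger false detections.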
Tail length refers to the maximum echo delay that the AEC can remove; echo arriving later than this is not cancelled by the adaptive filter.
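In a time-domain FIR adaptive filter, tail length translates directly into filter size because each tap covers one sample period. Frequency-domain designs partition the work differently but must span the same time. Here is a small sketch of that arithmetic, with example tail lengths and sample rates chosen purely for illustration:

```python
import math


def taps_for_tail(tail_ms, sample_rate_hz):
    """FIR taps needed so the adaptive filter spans `tail_ms` of echo."""
    return math.ceil(tail_ms * sample_rate_hz / 1000)


for tail_ms, fs in [(64, 8000), (100, 16000), (200, 48000)]:
    print(f"{tail_ms} ms at {fs} Hz -> {taps_for_tail(tail_ms, fs)} taps")
# 64 ms at 8000 Hz -> 512 taps
# 100 ms at 16000 Hz -> 1600 taps
# 200 ms at 48000 Hz -> 9600 taps
```

Because processing cost grows with the number of taps, the longer tail needed for larger or more reverberant rooms demands more processing.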
To consistently gauge the performance of an AEC solution, the following metrics (usually expressed in dB) are generally used:
• Echo Return Loss (ERL): The ratio between the level of the far-end audio arriving at the AEC reference input and the level of its echo picked up by the microphone.
• Echo Return Loss Enhancement (ERLE): The amount of echo attenuation or reduction introduced by the AEC process.
• Combined Loss (ACOM) / Total Echo Reduction (TER): ACOM is the sum of ERL and ERLE. It indicates the total echo reduction introduced by the room acoustics (ERL) together with the AEC and Non-Linear Processing (ERLE).
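The three metrics are simple level differences in dB. This minimal sketch assumes the microphone signal is measured while only the far end is talking, so that it contains echo alone. It also assumes `aec_out` is taken after the NLP stage, consistent with the ERLE definition above. The signal levels are illustrative:

```python
import math


def level_db(samples):
    """Mean signal power in dB relative to a full-scale value of 1.0."""
    power = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(power)


def echo_metrics(reference, mic_echo, aec_out):
    """Return (ERL, ERLE, ACOM) in dB from three time-aligned recordings."""
    erl = level_db(reference) - level_db(mic_echo)  # loss through the room
    erle = level_db(mic_echo) - level_db(aec_out)   # reduction by AEC + NLP
    acom = erl + erle                               # combined loss
    return erl, erle, acom


# Illustrative levels: echo about 6 dB below the reference, AEC output 40 dB below the echo.
erl, erle, acom = echo_metrics([1.0] * 100, [0.5] * 100, [0.005] * 100)
print(f"ERL {erl:.1f} dB, ERLE {erle:.1f} dB, ACOM {acom:.1f} dB")
# ERL 6.0 dB, ERLE 40.0 dB, ACOM 46.0 dB
```

The microphone-echo term cancels out of the sum, so ACOM is equivalently the reference level minus the output level.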
Difficulties of Acoustic Echo Cancellation
While acoustic echo cancellation brings amazing clarity to conversations and can reduce the stress experienced when using sub-par conferencing technology, cancelling one voice while keeping the other voices intact is extremely difficult.
Consider that in most cases, the acoustic echo signal picked up by the microphone is different from the AEC reference signal. It has been amplified and played back in an unknown acoustic environment, where acoustic reflections have further modified it. Creating an effective filter is therefore a significant challenge.
Additionally, the acoustics may be dynamic, changing constantly as a talker with a wireless microphone moves around the room or as people enter and exit the room. This requires the AEC to adapt automatically to these changes by updating its filter parameters accordingly.
Because AEC is such a complex process, it takes processing time, which adds latency to the audio path. Some latency is unavoidable, but AEC circuits vary in how much they add, from less than 20 ms to well over 40 ms in some solutions. This is important to keep in mind, especially when designing rooms that require local sound reinforcement.
There is a lot that goes into engineering a robust AEC solution. The above provides only an initial overview of AEC, which is reason enough to continue this TechTalk in the next issue. There I’ll discuss an equally important topic: how to successfully implement a distance conferencing solution with AEC.
Until then, keep in mind that even the most sophisticated technology can be rendered useless without proper implementation. Our ears are amazing tools and are very hard to trick. If challenging room acoustics or a poor setup prevent the AEC from working properly, the far end will notice.
*Editor’s Note: This article first appeared in the Jun-Jul 2016 issue of Systems Integration Asia.