

A new way to focus on individual videoconference speakers - even in the same room

Dec 16, 2019

Focus on individual speakers

In everyday life it's something we don't even think about. When someone is talking to us, we naturally turn to look at them. It focuses our attention on them and shows we're interested in what they have to say.

When we're not in the same room as the speaker, it can be more of a challenge. Videoconference systems often provide an effective substitute by highlighting the image of the participant who is talking at any given time. But that only works if each participant is joining on a separate connection.

If you're talking over a videoconferencing link to a group of people round a meeting table, you just see the group as a whole – and it can be difficult to work out who is saying what. That's where our technology can come to the rescue.

Our data-driven blind audio signal separation technology can separate multiple voices into individual streams, even when people are sitting side by side. These individual streams then drive the face-detection bounding-box technology already built into every smartphone, focusing the video on the person actively speaking. For other implementations – such as videoconferencing systems and smart TVs – we can also supply the face-detection bounding-box technology.
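To illustrate how separated streams might drive speaker highlighting, here is a minimal sketch – not the actual algorithm described above, whose details are proprietary. It assumes the separation stage has already produced one audio stream per speaker; the loudest stream in each short frame then selects which detected face to highlight. The function name, the 80-sample frame, and the 16 kHz sample rate are illustrative assumptions.

```python
import numpy as np

def active_speaker(streams: np.ndarray, frame: int = 80) -> np.ndarray:
    """For each frame, pick the separated stream with the highest
    short-term energy. `streams` has shape (n_speakers, n_samples);
    returns one speaker index per frame (a hypothetical helper)."""
    n, total = streams.shape
    n_frames = total // frame
    # Frame-wise RMS energy per stream
    framed = streams[:, :n_frames * frame].reshape(n, n_frames, frame)
    energy = np.sqrt((framed ** 2).mean(axis=2))
    return energy.argmax(axis=0)

# Two synthetic "separated" streams over one second at an assumed
# 16 kHz rate: speaker 0 talks in the first half, speaker 1 in the second.
rate = 16000
t = np.arange(rate) / rate
s0 = np.where(t < 0.5, np.sin(2 * np.pi * 220 * t), 0.0)
s1 = np.where(t >= 0.5, np.sin(2 * np.pi * 330 * t), 0.0)
labels = active_speaker(np.stack([s0, s1]))
# Early frames point at speaker 0, later frames at speaker 1.
```

In a real system the selected index would be mapped to a face-detection bounding box rather than printed, and a plain energy comparison would be replaced by a more robust voice-activity decision.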

The benefits are clear. In the business world, videoconferences feel more natural and you don't have to keep interrupting to ask who has made a particular point. At home, virtual family get-togethers with relatives on the other side of the world come alive as you can see who is talking, almost as if they were in the same room.

The secret of our success is that our technology can 'listen' to a real-world acoustic scene and identify the prominent sources of sound. The solution needs no calibration or training, and the sophisticated algorithm behind it has a latency of just five milliseconds – making it ideal for real-time applications such as videoconferencing.
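To put the five-millisecond figure in perspective, here is a small sketch of what that latency means in buffered audio samples. The sample rates are assumptions for illustration; the article states only the latency.

```python
def latency_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Samples of audio buffered for a given algorithmic latency
    (illustrative arithmetic, not part of the described product)."""
    return round(latency_ms * sample_rate_hz / 1000)

# 5 ms of audio at common sample rates:
print(latency_samples(5, 16000))  # 80 samples at 16 kHz
print(latency_samples(5, 48000))  # 240 samples at 48 kHz
```

A buffer that small is well under typical network and video delays, which is why such an algorithm can run inside a live videoconference without adding perceptible lag.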

If you're attending the CES tech show in Las Vegas next month, you can hear the technology in action. We'll be doing demonstrations throughout the event (7-10 January) on stand 51902/E at Sands Expo, Hall G.