AI Sound Separation | Sony's TechHub

AI Sound Separation – analyze audio sources to extract component tracks

Sony's AI Sound Separation is used by audio creatives and engineers to separate mixed audio sources into their component tracks using state-of-the-art algorithms based on deep neural networks. It has already been successfully tested with leading players in the entertainment industry, such as movie and music production companies.

Try the demo now and contact us for potential collaboration.

The AI Sound Separation concept

The diagram shows, at a conceptual level, AI Sound Separation taking a mixed audio source and generating separate tracks for four instruments. The same concept can be extended to other material: a music file may call for different or additional instruments, and the audio source could just as well be a movie soundtrack or a video conference.

It's a matter of training the model to separate the sources you need.

Diagram: on the left, a soundwave represents a mixed track of different instruments; on the right, the same soundwave is split into its component parts for guitar, trumpet, piano, and voice.

In 2021, Sony R&D Centers in Japan and Germany organized a competition, the Music Demixing (MDX) Challenge. It invited research teams and machine learning enthusiasts to create systems that use AI to perform audio separation on a specially prepared, hidden dataset of songs. The challenge attracted entries from over 400 participants and received more than 1,500 submissions. You can read more about the challenge and its results.

In 2023, a new edition was organized: the Sound Demixing (SDX) Challenge 2023. Participants were invited to separate movie audio into dialogue (DX), music (MX), and sound effects (SFX), a task referred to as Cinematic Source Separation. Read more about the challenge.

Why use Sony's AI Sound Separation

State-of-the-art algorithms

AI Sound Separation uses algorithms based on deep neural networks, developed by Sony.

Unmix audio into separate tracks

Separate mixed audio sources into component tracks, such as vocals, drums, bass, and other instruments.

Demo and web APIs

Integrate AI Sound Separation easily into your application using our web API.

The AI Sound Separation API

Access the AI Sound Separation API from the TechHub API Library.

Potential application areas

Healthcare

Hearing aids with voice isolation
Technology for those who are hard of hearing

Noise suppression

Denoising video conference audio
Removing background sounds

Music and video

Remixing songs
Upmixing audio tracks into surround sound
Reviving mono audio tracks

Entertainment

Karaoke content generation
Automatic lyric transcription

Traveling back 60 years with AI Sound Separation

AI Sound Separation was applied to the audio of "Lawrence of Arabia" (1962). Its mono/stereo source soundtrack was separated into different tracks, such as dialogue, foley (sound effects), crowd noise, and galloping horses. The resulting tracks were then remixed in Dolby Atmos surround sound to produce a fully immersive 4K Ultra HD experience.

Timeless collaboration between Glenn Gould and Kanji Ishimaru (1961)

AI Sound Separation was used to create a unique remix of the classic 1961 recording of Richard Strauss's Enoch Arden, with Glenn Gould on piano and Claude Rains narrating.

In 2020, the original 1961 master tape was combined with the voice of Kanji Ishimaru, a famous Japanese musical actor and singer, who replaced the original narrator. The result is a high-quality Japanese version of the original masterpiece recording.

Separate and enhance mixed sound sources using AI