AI Sound Separation – analyze audio sources to extract component tracks
Sony's AI Sound Separation is used by audio creatives and engineers to separate mixed audio sources into their component tracks using state-of-the-art algorithms based on deep neural networks. It has already been successfully tested with leading players in the entertainment industry, such as movie and music production companies.
Try the demo now and contact us for potential collaboration.
The AI Sound Separation concept
The diagram shows, at a conceptual level, AI Sound Separation taking a mixed audio source and generating separate tracks for four instruments. The concept extends to other cases: a music file with different or additional instruments, a movie soundtrack, or the audio of a video conference.
It's about training your model to do what you want.
In 2021, Sony R&D Centers in Japan and Germany organized a competition, the Music Demixing (MDX) Challenge, in which research teams and machine learning enthusiasts were invited to create systems that use AI to perform audio separation on a specially prepared, hidden dataset of songs. It attracted more than 400 participants and received more than 1500 submissions. You can read about the challenge and learn about the results of the competition here.
In 2023, a new edition was organized, the Sound Demixing (SDX) Challenge 2023, in which participants were invited to separate movie audio into dialogue (DX), music (MX), and sound effects (SFX), a process referred to as Cinematic Source Separation. Read more about the challenge.
Why use Sony's AI Sound Separation
AI Sound Separation algorithms developed by Sony.
Separate mixed audio sources into component tracks such as vocals, drums, bass, and other instruments.
Integrate AI Sound Separation easily into your application using our web API.
The AI Sound Separation API
Access the AI Sound Separation API from the TechHub API Library.
Potential application areas
- Hearing aids with voice isolation
- Technology for those who are hard of hearing
- Denoising video conference audio
- Removing background sounds
- Remixing songs
- Upmixing audio tracks into surround sound
- Reviving mono audio tracks
- Karaoke content generation
- Automatic lyric transcription
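Several of these applications, such as remixing songs and generating karaoke content, reduce to the same step once separated tracks are available: scale each track by a gain and sum the results. The following is a minimal sketch of that step only, not of the separation itself, and it does not call the AI Sound Separation API. It assumes each track is a list of mono samples in the range -1.0 to 1.0, and the track names and the functions remix and karaoke are hypothetical names chosen for illustration.

```python
from typing import Dict, List

Stem = List[float]  # assumed format: mono samples in the range [-1.0, 1.0]


def remix(stems: Dict[str, Stem], gains: Dict[str, float]) -> Stem:
    """Sum the stems sample by sample, scaling each one by its gain.

    Stems without an entry in `gains` keep a gain of 1.0.
    The result is hard-clipped to [-1.0, 1.0].
    """
    if not stems:
        raise ValueError("at least one stem is required")
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of samples")

    mixed = [0.0] * lengths.pop()
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample

    # Hard clipping keeps the output in range; a real mixer would more
    # likely use a limiter or normalize the gains instead.
    return [max(-1.0, min(1.0, value)) for value in mixed]


def karaoke(stems: Dict[str, Stem]) -> Stem:
    """Build a backing track by muting the stem named 'vocals' (hypothetical name)."""
    return remix(stems, {"vocals": 0.0})


if __name__ == "__main__":
    # Two-sample toy stems, purely illustrative.
    example = {
        "vocals": [0.5, -0.25],
        "drums": [0.25, 0.25],
        "bass": [0.0, 0.5],
        "other": [0.125, 0.0],
    }
    print(karaoke(example))
```

The same function covers a simple remix: passing a mapping such as {"drums": 0.5} lowers the drums and leaves the other tracks unchanged.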
Traveling back 60 years with AI Sound Separation
Watch a demo of the result of AI Sound Separation on "Lawrence of Arabia" (1962)
AI Sound Separation has been applied to the audio of "Lawrence of Arabia" (1962). Its mono/stereo source soundtrack was separated into individual tracks such as dialogue, foley (sound effects), crowd noise, and galloping horses. These tracks were then mixed into Dolby Atmos surround sound to produce a complete 4K Ultra HD immersive experience.
Timeless collaboration between Glenn Gould and Kanji Ishimaru (1961)
AI Sound Separation was used to create a unique remix of the classic 1961 recording of Richard Strauss's Enoch Arden, with Glenn Gould on piano and Claude Rains narrating.
In 2020, the original 1961 master tape was combined with the voice of Kanji Ishimaru, a famous Japanese musical actor and singer, who replaces the original narrator. The result is a high-quality Japanese version of the original masterpiece recording.