TRAX Pro Tutorial – Vocal Extraction Walkthrough

May 19, 2015 | ADX Technology, Blog

In this tutorial, we’ll be walking through a project in TRAX Pro – from loading in the file all the way through the final cleanup on the Spectral editing screen.

We’re using an excerpt from Sam Smith’s “I’m Not The Only One” (the second verse and first chorus to be exact).

This section features a vocal, guitar, piano, bass and drums. The chorus is noticeably louder and more intense, and adds a handclap effect to the percussion. Since drum hits are often extracted along with the vocal, we’ll work to minimize their intrusion by using the different processing options and creating a composite track out of the results.

Step One: Guiding on the Separate Screen

To begin, we’ll start a new project in TRAX and load in the audio file. TRAX will automatically generate a pitch guide and an initial separation: a music file and a vocal file.

Listen to the results to identify areas in which you might need to edit the pitch guide to get a better result. The first thing you’ll want to do is erase the pitch guide where there is no vocal present. We don’t want to make extra work for ourselves later by unnecessarily leaving in pieces of music.


My typical workflow to get an accurate pitch guide on the Separate screen is to switch between listening to the vocal and music tracks, making adjustments to the pitch guide as I go. For example, when Sam sings “I wish”, TRAX took the pitch guide down too low, so the start of “wish” gets cut off. Using the pencil tool, we can easily correct this.
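If it helps to think of these edits in code terms, here is a minimal sketch. It assumes the pitch guide can be modeled as one frequency per analysis frame, with None wherever there is no vocal. That representation, the function names `erase_guide` and `redraw_guide`, and the numbers are all made up for illustration; this is not how TRAX stores its guide.

```python
from typing import List, Optional

# Hypothetical model: one guide frequency in Hz per frame, None = no vocal.
PitchGuide = List[Optional[float]]


def erase_guide(guide: PitchGuide, start: int, end: int) -> PitchGuide:
    """Return a copy with frames start..end-1 cleared, like the eraser."""
    return [None if start <= i < end else f for i, f in enumerate(guide)]


def redraw_guide(guide: PitchGuide, start: int, new_values: List[float]) -> PitchGuide:
    """Return a copy with frames overwritten from `start`, like a pencil stroke."""
    out = list(guide)
    out[start:start + len(new_values)] = new_values
    return out


# Made-up example: frames 0-1 tracked an instrument (no vocal there),
# and frames 3-4 dropped too low, cutting off the start of a word.
guide: PitchGuide = [196.0, 196.0, 330.0, 165.0, 165.0, 330.0]
guide = erase_guide(guide, 0, 2)
guide = redraw_guide(guide, 3, [330.0, 330.0])
print(guide)  # [None, None, 330.0, 330.0, 330.0, 330.0]
```

Both helpers return copies rather than editing in place, which mirrors the idea that you can always go back to the automatic guide.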


Remember to press the Separate button to hear your changes reflected on the music and vocal tracks. To get an approximation of the pitch guide without hitting Separate you can solo the Guide Tone. Now when you hit play, you’ll hear a filtered approximation of the separation that your pitch guide will create.


Sometimes a separation will not come out the way you think it will, or you’ll have a hard time making the pitch guide match the melody. For these difficult areas, try using the marquee selector tool to draw a rectangle around the approximate location of the vocal. This also works well to grab the noisy consonant sounds of S’s or F’s at the beginnings or ends of words. I used it for the “F” in “For months on end” since it didn’t come out with the rest of the voice there. These sounds are higher in the frequency range, so draw the rectangle higher up than the rest of the pitch guide.



When your pitch guide accurately traces the vocal melody, press Separate again to process your changes. Unless you’re very lucky, you’ll still hear bits of vocal in the music track, and instruments still playing in the vocal track. This is totally normal and we’ll address these issues with our work on the Process and Spectral tabs. In this example, we’ve lost a few bits of the voice to the music file. The big challenge will be the pesky snare and hi-hat hits that have been separated along with the vocal.

Step Two: Advancing on the Process Screen

On the Process screen you’ll see the Automatic and Refined separations from the Separate screen. You can switch between the vocal and music tracks using the tabs at the top (or ⌘- keys).


The Process screen offers access to advanced processing algorithms via the bottom toolbar. This song has a lot of reverb on the vocal, so I decided to use the “Long Reverb” function. The reverb option also tends to reduce the breathy elements and sibilance of the vocal, so I combined Long Verb with Consonants boost to try to retain these elements.


It’s always worth trying different combinations of processing options. For this example I tried:

Long Verb with Consonants boost

High Quality (HQ)

HQ with Long Verb

I found that the HQ with Long Verb setting gave me the best results: a pretty clear extraction of the vocal with the reverb, but greatly reduced drum hits. However, it wasn’t perfect, and there were certain words and portions of words that came out better in other processes. For example, the HQ Long Verb track lost the “S” in “You say I’m crazy”, but it was still present in the HQ track.

This is where the comping feature of TRAX comes in. Instead of being stuck using one track or the other, we can select the best parts of each. In this case, I knew I wanted to use mostly HQ Long Verb, so I double-clicked this track, sending it in full down to the comp track. To select parts of other processed tracks, hold Shift as you make selections.
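Conceptually, comping is just "start from one take, then patch in regions from others." The sketch below shows that idea on plain lists of samples. The function name `comp`, the tuple layout, and the sample values are assumptions for illustration only; TRAX does all of this through the interface.

```python
from typing import List, Tuple

# A patch is (start, end, take): use take[start:end] in place of the base.
Patch = Tuple[int, int, List[float]]


def comp(base: List[float], patches: List[Patch]) -> List[float]:
    """Copy the full base take, then overwrite each [start, end) region."""
    out = list(base)
    for start, end, take in patches:
        out[start:end] = take[start:end]
    return out


# Placeholder samples: the base take lost something at index 2
# (think of the missing "S"), which the other take still has.
hq_long_verb = [0.1, 0.2, 0.0, 0.4]
hq = [0.1, 0.2, 0.3, 0.5]

comp_track = comp(hq_long_verb, [(2, 3, hq)])
print(comp_track)  # [0.1, 0.2, 0.3, 0.4]
```

The sketch assumes every take has the same length and is time-aligned, which holds here because all the processed tracks come from the same source audio.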

Step Three: Perfecting Your Result on the Spectral Screen

The next step to create our vocal isolation is to tackle the remaining drum hits on our vocal track. To isolate drum hits in the vocal, try selecting the duration of the hit with the time selector on the music tab. Move to the vocal tab, which will keep the selection. Then use the Tonal/Noise process with Tonal at 0, Noise at -99, and the balance slider between -20 and -50. I find it works best to repeat the process twice. If the hit occurs in a long sustained vocal note, you can also try the Smart Attenuate tool.
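To get a feel for what those numbers mean, here is a small sketch that assumes the Tonal and Noise values are gains in decibels (the tutorial doesn't state the units, so treat that as an assumption). Under that reading, 0 leaves the tonal part untouched and -99 makes the noise part effectively silent.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


tonal_gain = db_to_gain(0.0)    # 1.0: tonal content passes unchanged
noise_gain = db_to_gain(-99.0)  # roughly 0.0000112: noise is all but muted

# Hypothetical sample already split into a tonal part and a noisy part
# (for example, a sustained vowel plus a snare hit).
tonal_part, noise_part = 0.5, 0.3
result = tonal_part * tonal_gain + noise_part * noise_gain
print(tonal_gain, noise_gain, result)
```

This only illustrates the gain arithmetic; how the Tonal/Noise process decides what counts as tonal or noisy is not covered here.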


Don’t be afraid to change your spectral settings, especially when zoomed in. This may bring elements into focus that weren’t previously visible. Try a larger window size, and then turn on adaptive representation.
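Why does window size matter? In a standard short-time Fourier analysis, a larger window gives finer frequency detail but blurs timing. The sketch below works through that arithmetic, assuming a 44.1 kHz sample rate and two example window sizes; neither value is taken from TRAX’s own settings.

```python
from typing import Tuple


def stft_resolution(sample_rate: int, window_size: int) -> Tuple[float, float]:
    """Return (Hz per frequency bin, milliseconds covered by one window)."""
    freq_res_hz = sample_rate / window_size
    time_span_ms = 1000.0 * window_size / sample_rate
    return freq_res_hz, time_span_ms


for n in (1024, 4096):
    hz, ms = stft_resolution(44100, n)
    print(f"window {n}: {hz:.1f} Hz per bin, {ms:.1f} ms per window")
# window 1024: 43.1 Hz per bin, 23.2 ms per window
# window 4096: 10.8 Hz per bin, 92.9 ms per window
```

So a larger window separates closely spaced harmonics more clearly, while a smaller one pins down short events like drum hits more precisely.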


Look under unvoiced consonants to remove tonal elements. An unvoiced consonant is a sound we make without vibrating our vocal cords. For example, the ‘f’ in ‘frequency’ is an unvoiced consonant because all we do is make an airy ‘fffff’ sound. The ‘v’ in ‘volume’ has the same mouth shape as the ‘f’ but we vibrate our vocal cords when we pronounce it, so ‘v’ is called a voiced consonant. Because the unvoiced consonants (which also include s-, sh-, ch-, p-, th-, t-, and k-) do not produce tonal sounds, we can look under them in the spectrogram to clean unwanted musical elements. For example, here is the ‘s’ in “still”, before and after cleaning.
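One simple way to see the tonal versus noisy difference in numbers is the zero-crossing rate: how often the waveform changes sign. The sketch below compares a pure tone with random noise as stand-ins for a voiced and an unvoiced sound. This is a generic textbook measure, not something TRAX is known to use, and real consonants are messier than white noise.

```python
import math
import random
from typing import List


def zero_crossing_rate(samples: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


sample_rate = 44100
n = 2048

# Stand-in for a voiced sound: a steady 220 Hz tone.
tone = [math.sin(2 * math.pi * 220 * i / sample_rate) for i in range(n)]

# Stand-in for an unvoiced 's' or 'f': seeded random noise.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

print(f"tone:  {zero_crossing_rate(tone):.3f}")
print(f"noise: {zero_crossing_rate(noise):.3f}")
```

The tone crosses zero only about twice per cycle, so its rate is very low, while the noise flips sign on roughly half of all samples. That is the same contrast you see in the spectrogram: clean horizontal lines for tonal sounds, a broad haze for unvoiced consonants.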


Turning our attention to the music tab, we want to remove any pieces of the vocal that have been left in the music track. These are usually easy to identify as wavy bits against the usually straight background of the instruments.


You can remove these pieces of vocal quickly using the Smart Attenuate tool. Select the area around the vocal and adjust the learning size and centering to tell the computer what should NOT be removed.



You can also remove these vocal bits using the harmonic selection tool or the lasso tool.


There are also some pieces of consonants remaining in the music track. For example, around 37 seconds, when Sam sings “call me baby”, part of the “C” was left behind. You’ll see this with “S” sounds a lot as well.


If you haven’t worked with a spectral editor before, this may seem complicated at first. But soon you’ll be able to quickly identify sounds based on how they look in the spectrogram, and you’ll have better results in no time!

Here is the result after completing the spectral editing process.

The techniques used for this example won’t necessarily work for every song, so experiment and let us know what works for you! As always, if you have questions or comments, we’d love to hear from you at