I/O: PROCESS

SAMPLE RATE CONVERSION

Sample rate conversion (also known as resampling) is the process of changing the sample rate of digital audio to a new sample rate by calculating the values of the new samples directly from the current samples. Upsampling is a sample rate conversion to a higher sample rate, whereas downsampling is to a lower sample rate.

PCM SAMPLE RATES

The sample rate in PCM audio formats defines the number of samples carried per second (measured in Hz or kHz). The Nyquist-Shannon sampling theorem dictates that the bandwidth (or range of reproducible frequencies) is equal to one half of the sample rate. For example, a recording with a sample rate of 44.1 kHz has a theoretical upper frequency limit of 22.05 kHz.

Single sample rates include 44.1 and 48 kHz. Double sample rates include 88.2 and 96 kHz. Quad sample rates include 176.4 and 192 kHz. Double and quad sample rates are referred to as high sample rates.
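As a quick illustration of the relationship described above, the following sketch computes the theoretical upper frequency limit for each listed sample rate and labels its rate family. The function and table names are illustrative only.

```python
# Sample-rate families as listed above (rates in Hz).
RATE_FAMILIES = {
    44_100: "single", 48_000: "single",
    88_200: "double", 96_000: "double",
    176_400: "quad", 192_000: "quad",
}

def nyquist_hz(sample_rate_hz):
    """Theoretical upper frequency limit: one half of the sample rate."""
    return sample_rate_hz / 2

def is_high_sample_rate(sample_rate_hz):
    """Double and quad rates are referred to as high sample rates."""
    return RATE_FAMILIES[sample_rate_hz] in ("double", "quad")

for rate in sorted(RATE_FAMILIES):
    family = RATE_FAMILIES[rate]
    print(f"{rate / 1000:g} kHz ({family}): limit {nyquist_hz(rate) / 1000:g} kHz")
```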

Upsampling

While the merits of high sample rate delivery formats remain a topic of debate, the benefits of performing some types of digital processing at high sample rates are well documented. For that reason, PCM mixes submitted at a sample rate of 44.1 or 48 kHz are upsampled to 96 kHz prior to editing or processing of the mix. The resampling extends the bandwidth to around 40 kHz, well beyond the 20 kHz upper frequency limit of human hearing, which allows the artifacts from non-linear processing (e.g., compression and limiting) to be pushed into the inaudible range.

As a result of the sample rate conversion process, the bit depth of 24-bit (or 16-bit) mixes is converted to 32-bit floating point. The audio will remain in the 32-bit (float) format until the creation of the 24-bit and 16-bit WAV delivery masters.
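The arithmetic behind this step can be sketched as follows. The example reduces the conversion to an integer up/down ratio (the form a rational resampler would typically work with) and shows the scaling of a 24-bit integer sample to floating point. It is a minimal sketch, not the resampler used in the actual workflow; the function names are hypothetical, and Python's own floats are 64-bit, so only the scaling is illustrated.

```python
from fractions import Fraction

TARGET_RATE_HZ = 96_000  # processing rate stated above

def resample_ratio(source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Return (up, down): the reduced integer ratio target/source."""
    ratio = Fraction(target_rate_hz, source_rate_hz)
    return ratio.numerator, ratio.denominator

def pcm24_to_float(sample):
    """Map a signed 24-bit integer sample to a float in [-1.0, 1.0)."""
    return sample / 2**23

print(resample_ratio(48_000))   # (2, 1)
print(resample_ratio(44_100))   # (320, 147)
print(pcm24_to_float(-2**23))   # -1.0
```

A 32-bit float carries a 24-bit significand, so every 24-bit integer sample value can be represented exactly after the conversion.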

DSD SAMPLE RATES

The sample rate in DSD audio formats does not define the number of samples in the conventional PCM sense. With DSD, the analog signal is encoded as a sequence of 1-bit values using high frequency sigma-delta modulation (SDM). In delta modulation, the change in the signal, rather than the absolute value, is encoded. The result of this process is a bitstream, as opposed to a stream of coded samples as with PCM.

DSD64 utilizes a sample rate of 2.8224 MHz and DSD128 a sample rate of 5.6448 MHz (also known as Double-rate DSD). The ultra high sample rate does not directly determine the bandwidth, since heavy noise shaping is required to reduce the noise and distortion caused by quantizing the audio signal to a single bit. In effect, the bandwidth of DSD64 is comparable to that of PCM audio with a sample rate of 88.2 kHz and DSD128 to that of PCM audio with a sample rate of 176.4 kHz.
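The DSD rate names follow from multiples of the 44.1 kHz CD sample rate, which a short sketch makes explicit (the function name is illustrative):

```python
BASE_RATE_HZ = 44_100  # CD sample rate

def dsd_rate_hz(multiple):
    """Sample rate of the 1-bit stream, e.g. multiple=64 for DSD64."""
    return multiple * BASE_RATE_HZ

for multiple in (64, 128):
    print(f"DSD{multiple}: {dsd_rate_hz(multiple) / 1e6:.4f} MHz")
```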

Downsampling

Since DSD audio formats cannot be mixed or edited directly (proper dithering is not possible in a 1-bit digital audio format), DSD mixes are downsampled in a two-stage process to a high sample rate PCM format prior to editing or processing of the mix. In the first stage, referred to as decimation, both DSD64 and DSD128 mixes are downsampled to the DXD format (32-bit (float)/352.8 kHz PCM audio). In the second stage, DSD64 mixes are downsampled to 96 kHz and DSD128 mixes to 192 kHz.

The DSD64 noise spectrum begins to rise slightly above 20 kHz and rises rapidly at 23 kHz, with the majority of noise concentrated between 30 and 40 kHz. To suppress this significant amount of quantization noise, a steep FIR low-pass filter at 30 kHz is implemented during the decimation stage. The DSD128 noise spectrum begins a more gentle rise just above 40 kHz, so a low-pass filter at 50 kHz is utilized.
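The rate arithmetic of the two stages can be sketched as below. It only derives the integer decimation factor to DXD and the rational ratio of the second stage; the filter design itself is not shown, and the names are hypothetical.

```python
from fractions import Fraction

DXD_RATE_HZ = 352_800

# (DSD rate, final PCM rate) per the workflow described above.
PATHS = {
    "DSD64": (2_822_400, 96_000),
    "DSD128": (5_644_800, 192_000),
}

def stage_ratios(dsd_rate_hz, final_rate_hz):
    """Return (decimation factor to DXD, Fraction final/DXD)."""
    decimation = dsd_rate_hz // DXD_RATE_HZ
    return decimation, Fraction(final_rate_hz, DXD_RATE_HZ)

for name, (dsd_rate, final_rate) in PATHS.items():
    decimation, second = stage_ratios(dsd_rate, final_rate)
    print(f"{name}: decimate by {decimation}, then resample by {second}")
```

Both DSD rates are whole multiples of 352.8 kHz (8 and 16 respectively), whereas the second stage is a non-integer ratio (40/147 and 80/147).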

REPAIR

With the utilization of advanced audio repair tools, some types of flaws (e.g., clicks, pops, and noise) can be removed from the mix if necessary. Three of the most common processes are outlined below:

De-click removes individual clicks, pops, and other short impulse noises by analyzing amplitude irregularities and smoothing them out.

De-crackle removes multiple clicks which are close together and at low amplitude using a process similar to De-click.

De-noise removes stationary noise and broadband hiss by creating a noise fingerprint from the offending noise and subtracting it from the signal.

Sometimes flaws cannot be repaired due to negative artifacts generated in the repair process, so it is always best to address them during mixing if possible. Other times a flaw can only be partially attenuated and not completely removed.
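To give a rough feel for the De-click idea of detecting an amplitude irregularity and smoothing it out, here is a deliberately simple sketch. It is an assumption-laden toy (median-based detection with a fixed threshold), not how commercial repair tools work; `radius` and `threshold` are hypothetical parameters.

```python
from statistics import median

def declick(samples, radius=2, threshold=0.5):
    """Replace single-sample spikes with the median of their neighbours.

    A sample counts as a click when it differs from the median of the
    surrounding `radius` samples on each side by more than `threshold`.
    """
    repaired = list(samples)
    for i in range(len(samples)):
        lo, hi = max(0, i - radius), min(len(samples), i + radius + 1)
        neighbours = samples[lo:i] + samples[i + 1:hi]
        if not neighbours:
            continue
        local = median(neighbours)
        if abs(samples[i] - local) > threshold:
            repaired[i] = local
    return repaired

signal = [0.0, 0.1, 0.2, 1.0, 0.2, 0.1, 0.0]  # the 1.0 is the click
print(declick(signal))
```

Even in this toy version the trade-off mentioned above is visible: a threshold set too low would start altering legitimate transients.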

START/END/FADES

The head and tail of the track are trimmed and any fade-in and/or fade-out is applied.
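A minimal sketch of this step is shown below. The linear ramp is an assumption made for simplicity (the fade curve is a creative choice and is not specified here), and the function names are illustrative.

```python
def trim(samples, start, end):
    """Keep only the samples between the chosen head and tail points."""
    return list(samples[start:end])

def apply_fades(samples, fade_in_len=0, fade_out_len=0):
    """Apply linear fade-in and fade-out ramps (lengths in samples)."""
    out = list(samples)
    n = len(out)
    for i in range(min(fade_in_len, n)):
        out[i] *= i / fade_in_len           # gain rises from 0 towards 1
    for i in range(min(fade_out_len, n)):
        out[n - 1 - i] *= i / fade_out_len  # gain falls to 0 at the last sample
    return out

print(apply_fades([1.0] * 8, fade_in_len=4, fade_out_len=4))
# [0.0, 0.25, 0.5, 0.75, 0.75, 0.5, 0.25, 0.0]
```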

SWEETENING

The sweetening phase is arguably the most well known step in audio mastering. During sweetening, the sonic attributes of the mix may be enhanced through the use of audio processing. The difference in sound can be subtle or dramatic depending on the requirements of the mix. I commonly use two forms of processing during sweetening: equalization and dynamics processing.

EQUALIZATION

Equalization (EQ) allows for the adjustment of the frequency response of the mix. The goal is to achieve a proper tonal balance, one in which the master translates on the widest range of consumer playback systems. The relationship between low, mid, and high frequencies may be altered in pursuit of this goal, as well as for purely aesthetic reasons. I normally use two instances of equalization (though there are always exceptions) - one pre-compression and one post-compression. The purpose of the primary EQ is essentially tonal shaping, where I prefer to utilize broad boosts and shallower cuts to attain a balance of robust lows, detailed mids, and natural highs. On the second EQ, I typically use a Baxandall style high shelf filter to restore any "air" which may have been lost through the application of compression. High-pass filtering and other surgical tasks may require an additional EQ inserted before the primary instance.

DYNAMICS PROCESSING

Dynamics processing is used to increase or reduce the dynamic range of the mix. Downward compression (referred to here simply as compression) and limiting reduce dynamic range.

Compression

Compression in mastering is typically used to provide sonic "glue" to the mix. As peaks are reduced, the mix becomes more uniform and an increased sense of weight and impact can be achieved. In addition, depending on attack and release timings, the softening of transients associated with the compression process can produce a slightly "warmer" tone. I tend to compress mixes rather lightly through the use of low ratios (2:1 or lower) and higher thresholds. The resulting gain reduction is usually in the range of 1-3 dB or less. The overall loudness of the mix can be raised through the application of makeup gain, but level is not my primary concern at this stage. Occasionally, I will employ split band compression (also known as frequency dependent compression) to compress one frequency range of the mix, such as the low end. De-essing is a form of frequency dependent compression.
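The relationship between ratio, threshold, and gain reduction can be sketched with the static curve of a hard-knee downward compressor. This is a textbook simplification that ignores attack, release, and knee shape; the threshold and input level below are hypothetical values chosen for the example.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Static curve of a hard-knee downward compressor (levels in dB)."""
    if input_db <= threshold_db:
        return input_db  # below the threshold the signal is untouched
    return threshold_db + (input_db - threshold_db) / ratio

def gain_reduction_db(input_db, threshold_db, ratio):
    """How many dB the compressor turns the signal down at this level."""
    return input_db - compressed_level_db(input_db, threshold_db, ratio)

# A peak 4 dB over the threshold at 2:1 comes out 2 dB over: 2 dB of reduction.
print(gain_reduction_db(-6.0, -10.0, 2.0))  # 2.0
```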

Limiting

The common perception is that limiting is used only to increase average loudness. This is certainly one function of limiting in mastering, and I prefer limiting over compression for this purpose, since heavier compression can quickly rob the mix of life. In addition to increasing loudness, though, limiting may impart the mix with a more "finished" sound by adding smoothness to the high end and punch to the low end. In a sense, limiting contributes to a record sounding, well, like a record.

LOUDNESS

The race to be the loudest in audio reproduction did not begin with the advent of the digital brickwall limiter with lookahead capabilities in the early to mid-1990s. In fact, the concept of the loudness race first surfaced in the 1940s as record labels competed to have the loudest 7" singles in jukebox machines, which were often fixed to a predetermined level. This trend continued as record producers and labels demanded increasing loudness so that their singles would stand out to radio program directors who were responsible for creating their respective station's playlists. (Berry Gordy at Motown was infamous for continually pushing the loudness envelope.) However, in the case of vinyl, the physical format has its limits and at a certain point will become unplayable if the level is too high. This, of course, is not the case with digital audio where the brickwall limiter is capable of pushing the average loudness to extreme levels without exceeding 0 dBFS.

DEFINING LOUDNESS

It is difficult to quantify average loudness, especially since the frequency response of human hearing is not linear at all sound pressure levels. Nonetheless, various attempts have been made over the years to measure loudness. Borrowed from the broadcast industry, much of the pro audio world today has settled on the EBU R 128 standard to define average loudness. The two principal units of measurement under EBU R 128 are the LU (Loudness Unit) and LUFS (Loudness Units relative to full scale). The LU is a relative unit of loudness measurement equivalent to the decibel (dB): 1 LU = 1 dB. LUFS is an absolute unit of loudness measurement equivalent to decibels relative to full scale (dBFS): -1 LUFS = -1 dBFS.

MASTER LOUDNESS

Unless instructed otherwise, I generally aim for competitive loudness when producing a master. The level is normally dictated by genre and style of mix. It is worth noting that depending upon the arrangement and frequency content of the mix, the loudness potential of one mix may differ from that of another mix. Although master loudness is ultimately set by ear, I have established the following loudness ranges for reference purposes:

Uber Loud: -5 LUFS -> -8 LUFS

Loud: -9 LUFS -> -12 LUFS

Dynamic: -13 LUFS -> -16 LUFS

Ultra Dynamic: -17 LUFS -> -20 LUFS

It should be stated that though the loudness measurement is weighted to conform relatively closely to the response of the human ear, masters with more bass content may measure higher than masters with comparable perceived loudness. In addition, masters with more upper midrange content may be perceived as louder than masters with the same loudness measurement.

LOUDNESS NORMALIZATION

Loudness normalization (LN) is a system for adjusting the gain of audio files with the goal of each file being reproduced at the same perceived loudness. In music production, this is known as track normalization. LN has gained favor with some listeners for use when listening to music in shuffle mode or to other dynamic playlists. And, of course, LN has existed in radio broadcast for decades. A number of media players include an LN option. iTunes incorporates Apple's proprietary LN system called Sound Check. Several other media players utilize the open source LN system ReplayGain. An LN system measures the peak level and perceived loudness of an audio file and attempts to adjust the average loudness to a specific target level. The target level for Sound Check (and therefore iTunes and additionally Apple Music) is -16 LUFS, while the ReplayGain 2.0 target level is -18 LUFS.

Although their respective target levels differ, Sound Check and ReplayGain function in much the same manner. Audio files are scanned to calculate the difference between the measured perceived loudness and the target loudness, after which a gain offset value (measured in dB) is assigned to each file. This value is normally stored as metadata in an ID3 tag or in the iTunes Music Library database. It should be noted that MP3 files created with the iTunes Export function will be tagged with Sound Check metadata. When the LN system is enabled, the level of the audio file is adjusted based on the value. Applying a negative gain offset decreases the level. For example, if an audio file is measured at -12 LUFS and played back by an LN system with a target level of -16 LUFS, a gain offset of -4 dB is applied and the file is turned down by 4 dB.

Applying a positive gain offset increases the level. However, the outcome of a positive gain offset is not always straightforward. Depending on the crest factor of the audio, the level increase may cause the file to clip if its peak level exceeds 0 dBFS and/or 0 dBTP. By default, Sound Check and ReplayGain handle this situation differently. Sound Check will limit the gain increase to just below the level at which the file would become clipped. ReplayGain, on the other hand, lets the file clip. The solution in this case depends on the media player's implementation of ReplayGain. Some media players include an optional limiter which can prevent the file from clipping (but may color the sound to a certain degree), whereas others have a setting to lower the positive gain offset value according to peak information to prevent clipping (a process similar to the method used by Sound Check).
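The gain offset arithmetic can be sketched as follows. This is a simplified model for illustration, not the actual Sound Check or ReplayGain code; the function names and the `ceiling_db` parameter are hypothetical.

```python
def gain_offset_db(measured_lufs, target_lufs):
    """Offset that moves the measured loudness onto the target."""
    return target_lufs - measured_lufs

def peak_limited_offset_db(measured_lufs, target_lufs, peak_db, ceiling_db=0.0):
    """Cap a positive offset so the peak stays at or below the ceiling."""
    offset = gain_offset_db(measured_lufs, target_lufs)
    headroom = ceiling_db - peak_db
    return min(offset, headroom) if offset > 0 else offset

# The example from the text: -12 LUFS file, -16 LUFS target.
print(gain_offset_db(-12.0, -16.0))                # -4.0
# A quiet file with a -1.5 dB peak only has 1.5 dB of headroom.
print(peak_limited_offset_db(-20.0, -16.0, -1.5))  # 1.5
```

The second function corresponds to the peak-aware behavior described above; leaving the offset uncapped would correspond to letting the file clip.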

In addition to Sound Check and ReplayGain enabling track normalization (Track Mode), these LN systems also provide support for album normalization (Album Mode). Album Mode normalizes the average loudness of an entire album (or other collection of tracks) to the target level, while maintaining the level differences between the individual tracks.

Streaming

The same basic LN principles apply to the implementation of loudness normalization on streaming platforms. In fact, Sound Check functions exactly the same on Apple Music as it does in iTunes. Spotify formerly employed a gain adjusted implementation of ReplayGain 2.0, but its current LN system is based on LUFS measurements. When LN is enabled on Spotify, positive gain is applied to tracks below the default target level, but the gain increase is limited so that the peak level does not exceed -1 dBTP. Both Track Mode and Album Mode are available on Apple Music and Spotify. TIDAL supports album normalization only (there is no track normalization) and uses a different method: the loudest track of each album is normalized to the target level and the other tracks are adjusted to preserve their relative levels on the album. Amazon Music, Deezer, Pandora, SoundCloud, and YouTube support track normalization only. The target level for Amazon Music, SoundCloud, Spotify, TIDAL, and YouTube is -14 LUFS. Deezer has a target level of -15 LUFS. Apple Music has a target level of -16 LUFS, while Pandora has a target level of approximately -16 LUFS. Other than Apple Music and Spotify, the only other major platforms to provide positive gain while normalizing are Pandora (in part by allowing clipping) and SoundCloud (through the addition of limiting).
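The loudest-track method described for TIDAL can be sketched as below. It is an illustration built only from the description above, not TIDAL's implementation; capping the offset at 0 dB reflects the statement that TIDAL is not among the platforms applying positive gain, and the track loudness values are made up.

```python
def normalize_album_by_loudest(track_lufs, target_lufs=-14.0):
    """Apply one shared offset that puts the loudest track on the target.

    Returns (offset in dB, resulting playback loudness per track).
    The offset is capped at 0 dB: no positive gain (assumption).
    """
    offset = min(0.0, target_lufs - max(track_lufs))
    return offset, [level + offset for level in track_lufs]

offset, played = normalize_album_by_loudest([-9.0, -11.0, -12.5])
print(offset, played)  # -5.0 [-14.0, -16.0, -17.5]
```

Because every track receives the same offset, the level differences between tracks on the album are preserved.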

Streaming Platforms

Streaming platforms use a variety of formats and loudness normalization (LN) systems. Listed below are the streaming formats and LN specifications (if applicable) of the major streaming platforms:

Amazon Music | HD | Ultra HD

Streaming Format - Standard: 320 Opus

Streaming Format - HD: 1644 FLAC

Streaming Format - Ultra HD: 2444/2448/2488/2496/24176/24192 FLAC

LN Target Level: -14 LUFS

LN enabled by default: Yes

LN Mode: Track

Positive Gain: N/A

Apple Music

Streaming Format - Standard: 256 AAC (iTunes Plus)

Streaming Format - Lossless: 1644/2444/2448 ALAC

Streaming Format - Hi-Res Lossless: 2488/2496/24176/24192 ALAC

LN Target Level: -16 LUFS

LN enabled by default: No

LN Mode: Track/Album

Positive Gain: Gain

Bandcamp

Streaming Format: 128 MP3

LN Target Level: N/A*

LN enabled by default: N/A

LN Mode: N/A

Positive Gain: N/A

*Bandcamp does not support loudness normalization.

Deezer

Streaming Format - Low: 64 HE-AAC v1

Streaming Format - Standard: 128 MP3

Streaming Format - High: 320 MP3

Streaming Format - Lossless: 1644 FLAC

LN Target Level: -15 LUFS

LN enabled by default: Yes (mandatory)

LN Mode: Track

Positive Gain: N/A

Pandora

Mobile Apps

Streaming Format - Low: 32 HE-AAC v1

Streaming Format - Standard: 64 HE-AAC v1

Streaming Format - High: 192 MP3

LN Target Level: -16 LUFS*

LN enabled by default: Yes (mandatory)

LN Mode: Track

Positive Gain: Gain + clipping

Web Browser

Streaming Format - Standard: 64 HE-AAC v1

Streaming Format - High: 192 MP3

LN Target Level: -16 LUFS*

LN enabled by default: Yes (mandatory)

LN Mode: Track

Positive Gain: Gain + clipping

*Pandora's loudness measurements do not correspond precisely with LUFS measurements. The target level is an approximation.

SoundCloud

Streaming Format - Standard: 64 Opus

Streaming Format - High: 256 AAC

LN Target Level: -14 LUFS

LN enabled by default: Yes (mandatory)

LN Mode: Track

Positive Gain: Gain + limiting

Spotify

Desktop/Mobile/Tablet Apps

Streaming Format - Low: 24 HE-AAC v2

Streaming Format - Normal: 96 Vorbis

Streaming Format - High: 160 Vorbis

Streaming Format - Very High: 320 Vorbis

Streaming Format - HiFi: 1644 FLAC

LN Target Level - Loud: -11 LUFS

LN Target Level - Normal: -14 LUFS (Default)*

LN Target Level - Quiet: -23 LUFS

LN enabled by default: Yes

LN Mode: Track/Album

Positive Gain: Gain/Gain + limiting (Loud LN target level only)

*The target level is adjustable on Spotify Premium only.

Web Player

Streaming Format - Normal: 128 AAC

Streaming Format - High: 256 AAC

LN Target Level: N/A*

LN enabled by default: N/A

LN Mode: N/A

Positive Gain: N/A

*The Spotify web player does not support loudness normalization.

TIDAL

Streaming Format - Standard: 320 AAC

Streaming Format - HiFi: 1644 FLAC

Streaming Format - Master (MQA): 2444/2448/2488/2496/24176/24192/24352/24384 FLAC

LN Target Level: -14 LUFS

LN enabled by default: Yes

LN Mode: Album

Positive Gain: N/A

YouTube

Streaming Format - MP4: 128 AAC

Streaming Format - WebM: 50/70/160 Opus

LN Target Level: -14 LUFS

LN enabled by default: Yes (mandatory)

LN Mode: Track

Positive Gain: N/A

YouTube Music

Streaming Format - Low: 48 HE-AAC v1

Streaming Format - Normal: 128 AAC

Streaming Format - High: 256 AAC

LN Target Level: N/A*

LN enabled by default: N/A

LN Mode: N/A

Positive Gain: N/A

*YouTube Music does not support loudness normalization.

DXD

Digital eXtreme Definition. A 32-bit (float)/352.8 kHz PCM format adopted by Merging Technologies to perform editing and processing of DSD audio.

32192

32-bit (float)/192 kHz PCM audio.

32176

32-bit (float)/176.4 kHz PCM audio.

3296

32-bit (float)/96 kHz PCM audio.

3288

32-bit (float)/88.2 kHz PCM audio.

3248

32-bit (float)/48 kHz PCM audio.

3244

32-bit (float)/44.1 kHz PCM audio.

24192

24-bit/192 kHz PCM audio.

24176

24-bit/176.4 kHz PCM audio.

2496

24-bit/96 kHz PCM audio.

2488

24-bit/88.2 kHz PCM audio.

2448

24-bit/48 kHz PCM audio.

2444

24-bit/44.1 kHz PCM audio.

1644

16-bit/44.1 kHz PCM audio.
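The shorthand codes above follow a simple pattern (two digits of bit depth followed by the nominal sample rate), which can be decoded mechanically. The sketch below is illustrative only; the lookup table and function name are hypothetical, and the 352 and 384 entries assume 352.8 kHz and 384 kHz respectively.

```python
# Nominal rate digits -> actual sample rate in kHz.
NOMINAL_RATES_KHZ = {
    "44": 44.1, "48": 48.0, "88": 88.2, "96": 96.0,
    "176": 176.4, "192": 192.0, "352": 352.8, "384": 384.0,
}

def parse_format_code(code):
    """Split a code such as '2496' into (bit depth, sample rate in kHz)."""
    bit_depth, rate_key = int(code[:2]), code[2:]
    return bit_depth, NOMINAL_RATES_KHZ[rate_key]

print(parse_format_code("2496"))   # (24, 96.0)
print(parse_format_code("24176"))  # (24, 176.4)
print(parse_format_code("1644"))   # (16, 44.1)
```

The code alone does not say whether a 32-bit file is floating point; in this document 32-bit always means 32-bit (float).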

EXPORT

STUDIO MASTER FORMATS

Once all level, editing, and processing decisions have been made, the DAW session is exported as a 32-bit (float)/96 kHz WAV file. This 3296 WAV master is the primary parent master for all of the WAV digital delivery masters and the direct parent master for the 2496 WAV master. After sample rate conversion from 96 kHz to 44.1 kHz, a 32-bit (float)/44.1 kHz WAV file is exported. This 3244 WAV master is the direct parent master for the 2444 and 1644 WAV masters.

PREPARING FOR DIGITAL DELIVERY

Although there may come a time when one universal digital audio master is accepted, the music industry today still very much has one foot in the past. The most common formats for digital delivery remain lossy formats (MP3, AAC, etc.). Lossless formats (FLAC, ALAC, etc.) are available, but they are mostly limited to 16-bit/44.1 kHz files (CD quality).

Currently, there are only a handful of digital distributors (or aggregators) which accept high resolution/high sample rate files as delivery masters. And most mastering engineers are reluctant to rely on a third party to perform bit depth reduction and/or sample rate conversion on masters which are submitted for distribution. There are, though, some mastering engineers, myself included, who are willing to accept the application of dither (ideally flat TPDF) by the digital distributor or digital delivery platform when converting 24-bit/44.1 kHz delivery masters to 16-bit/44.1 kHz files for those platforms limited to 16-bit. The rationale is that the audible result of a one-time application of improper dither (i.e., noise shaped) or of the absence of dither (i.e., truncated) is relatively benign.
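For reference, flat TPDF dither during a 24-bit to 16-bit reduction can be sketched as below. This is a minimal textbook illustration (triangular noise spanning plus or minus one 16-bit LSB added before rounding), not the processing used by any particular distributor or platform; the function name is hypothetical.

```python
import random

def dither_24_to_16(sample24, rng=random):
    """Reduce one signed 24-bit sample to 16 bits with flat TPDF dither."""
    scaled = sample24 / 256                    # one 16-bit LSB = 256 24-bit steps
    tpdf = rng.random() - rng.random()         # triangular noise in (-1, 1) LSB
    quantized = round(scaled + tpdf)
    return max(-32768, min(32767, quantized))  # clamp to the 16-bit range

rng = random.Random(0)  # seeded only to make the example repeatable
print([dither_24_to_16(s, rng) for s in (0, 1000, -1000, 8_388_607)])
```

Truncation, by contrast, would simply drop the lowest 8 bits with no noise added.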

Although the list is not all-inclusive, the following major platforms accept high resolution/high sample rate (24-bit/44.1 kHz - 96 kHz or 24-bit/44.1 kHz - 192 kHz) files as delivery masters:

Amazon Music HD - Ultra HD

Bandcamp

iTunes/Apple Music*

Qobuz

Spotify

TIDAL Masters

YouTube/YouTube Music

*Apple Digital Masters, formerly Mastered for iTunes (MFiT), only accepts 24-bit/44.1 kHz - 192 kHz files as delivery masters from select labels/digital distributors utilizing approved mastering studios.

The vast number of formats available today is why Mindtree provides you with all of the necessary delivery masters to accommodate the present state of digital delivery, as well as that of the foreseeable future.