Binaural Audio - SOUND SCAPE project

This text is an excerpt of a research document that can be downloaded from the following link:

An example of the synthesized audio can be heard (with headphones) here:

[soundcloud url="http://api.soundcloud.com/tracks/53061340" params="" width="100%" height="166" iframe="true" /]

Binaural audio techniques rely on three major factors to synthesize spatialized sounds, which are then listened to through headphones.

INTERAURAL TIME DIFFERENCE (ITD)

Sound pressure waves travel through the air and take some time to arrive at each ear. This time is determined by the speed of sound and the distance the sound has to travel to each ear. For instance, if the source is on the listener's right-hand side, the sound pressure arrives first at the right ear and reaches the left ear a few moments later, because the sound has to travel around the head. This time or phase difference is a significant cue that the brain uses to determine where the sound is coming from.
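As a rough illustration of how the path difference around the head turns into a time difference, the sketch below uses Woodworth's spherical-head approximation, ITD = (a / c) (θ + sin θ). The formula, the head radius of 8.75 cm, the speed of sound of 343 m/s and the function name are assumptions introduced here for illustration; they are not taken from the research document.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_woodworth(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg: 0 means straight ahead, 90 means fully to one side.
    The approximation is only meant for azimuths between 0 and 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    # Extra path to the far ear: an arc around the head (radius * theta)
    # plus a straight segment (radius * sin(theta)).
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(f"{angle:3d} deg -> {itd_woodworth(angle) * 1e6:6.1f} microseconds")
```

Under these assumptions a source straight ahead produces no time difference, and a source fully to one side produces roughly 0.66 ms.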

It is important to note that ITDs are useful for synthesizing sound only to a certain degree, because at some frequencies the phase difference becomes ambiguous (an aliasing problem): for frequencies whose wavelength is equal to or smaller than the head's diameter, the phase difference between the ears no longer maps to a single time difference, so the cue becomes confusing.

Figure 4 – Modelled head with parameters to measure time difference (12).

This weakness of the ITD as a localization cue is no longer a problem when a broadband and/or non-periodic sound is played to the listener, because such a sound carries more information that can be used to determine where the source is in space.

INTERAURAL LEVEL DIFFERENCE (ILD)

Together with ITDs, ILDs are vital for locating a given sound source in space, because they are a cue that tells how far away an object is or which ear is closer to the source. In contrast to the cue discussed above, the ILD is more effective at higher frequencies, whose loudness is altered by the shadowing effect of the head. In other words, when a sound has a low-frequency component, the air pressure is evenly distributed around the head, because the head is smaller than the sound's wavelength, whereas the head obstructs high-frequency sound waves on their way to the listener's far ear, as depicted in the next figure.

Figure 5 – Waveforms of low and high frequencies, and shadowing effect.

So for pure tones at low frequencies it is hard to tell where the sound is coming from, because the level difference is quite small. Above the threshold at which the wavelength becomes comparable to the size of the head, the shadowing effect becomes more evident and causes more noticeable level differences between the ears.
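To get a feeling for where that threshold lies, the sketch below compares the wavelength of a tone with the size of the head. The head diameter of 17.5 cm, the speed of sound of 343 m/s and the function names are assumptions chosen for illustration, and the simple "wavelength shorter than the head" test is only a rule of thumb, not a model of the actual shadowing.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_DIAMETER = 0.175   # m, assumed average head diameter


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a tone at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / frequency_hz


def crossover_frequency(diameter=HEAD_DIAMETER, c=SPEED_OF_SOUND):
    """Frequency in Hz at which the wavelength equals the head diameter."""
    return c / diameter


def head_shadow_expected(frequency_hz, diameter=HEAD_DIAMETER, c=SPEED_OF_SOUND):
    """Rule of thumb: shadowing matters once the wavelength is shorter than the head."""
    return wavelength(frequency_hz, c) < diameter


if __name__ == "__main__":
    print(f"crossover at about {crossover_frequency():.0f} Hz")
    for f in (200, 1000, 4000, 8000):
        print(f"{f:5d} Hz: wavelength {wavelength(f):.3f} m, "
              f"shadowing expected: {head_shadow_expected(f)}")
```

With these assumed values the crossover falls at roughly 2 kHz: a 200 Hz tone has a wavelength far larger than the head, while a 4 kHz tone has a wavelength of under 9 cm and is clearly shadowed.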

Working together, these two interaural differences give us very useful information about the distance and origin of the sound source we are listening to. Still, when synthesizing 3D audio they are not sufficient to create a convincing virtual auditory illusion; further modifications to the sound are needed, as described in the next section.

HEAD RELATED TRANSFER FUNCTION (HRTF)

If only ITD and ILD are used to synthesize a sound, the result is lateralization rather than virtual spatialization: the sound perceived through headphones appears to move just from one ear to the other inside the listener's head (12). This means that more is needed to create a credible auditory illusion. It is accomplished by introducing the filtering and reverberant effect that the listener's physiognomy has on the sound arriving at each ear.
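The ITD-and-ILD-only case can be sketched as a delay plus an attenuation applied to the channel of the far ear. The function name `lateralize`, its parameters and the whole-sample delay are hypothetical simplifications made for this example; it is not the synthesis method used in the project.

```python
def lateralize(mono, sample_rate, itd_s, ild_db, side="right"):
    """Return (left, right) sample lists built from a mono signal.

    mono:        list of float samples
    sample_rate: samples per second
    itd_s:       interaural time difference in seconds (non-negative)
    ild_db:      interaural level difference in decibels (non-negative)
    side:        "right" or "left", the side the source is on
    """
    if itd_s < 0 or ild_db < 0:
        raise ValueError("itd_s and ild_db must be non-negative")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")

    delay = round(itd_s * sample_rate)   # ITD rounded to whole samples
    gain = 10.0 ** (-ild_db / 20.0)      # ILD as a linear amplitude factor

    # Near ear: unchanged signal, zero-padded at the end to keep lengths equal.
    near = list(mono) + [0.0] * delay
    # Far ear: the same signal, later and quieter.
    far = [0.0] * delay + [gain * s for s in mono]

    return (far, near) if side == "right" else (near, far)


if __name__ == "__main__":
    left, right = lateralize([1.0, 0.5], sample_rate=1000, itd_s=0.002, ild_db=6.0)
    print("left :", left)
    print("right:", right)
```

Played over headphones, a signal processed this way is heard towards one side but still inside the head, which is the lateralization effect described above.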

At this point, the body can be thought of as imposing a frequency response on the sound, one that depends on the direction the sound is coming from. A frequency-domain filter is therefore a natural way to simulate the effect of this physiognomy. Filters of this type are called Head Related Transfer Functions (HRTFs), and they describe how the frequencies arriving at each ear in this kind of system are attenuated or boosted. HRTFs are defined as

HRTF = output (sound recorded at the binaural microphones) / input (original test sound)

that is, the ratio of the system's output to its input.
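A minimal sketch of this ratio, assuming it is evaluated per frequency bin on the discrete Fourier transforms of the two signals, is shown below. The helper names, the plain (slow) DFT and the toy signals are assumptions made for illustration; real HRTF measurements use much longer test signals and recordings for each ear and each direction.

```python
import cmath


def dft(samples):
    """Plain discrete Fourier transform, adequate for short example signals."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def estimate_transfer_function(input_signal, output_signal, eps=1e-12):
    """Per-bin ratio output/input in the frequency domain.

    Bins where the input has (almost) no energy carry no information
    and are returned as 0 rather than dividing by zero.
    """
    if len(input_signal) != len(output_signal):
        raise ValueError("signals must have the same length")
    spectrum_in = dft(input_signal)
    spectrum_out = dft(output_signal)
    return [
        y / x if abs(x) > eps else 0j
        for x, y in zip(spectrum_in, spectrum_out)
    ]


if __name__ == "__main__":
    test_sound = [1.0, 0.0, 0.0, 0.0]   # toy input: a single impulse
    recorded = [0.0, 0.5, 0.0, 0.0]     # toy output: delayed and halved
    for k, h in enumerate(estimate_transfer_function(test_sound, recorded)):
        print(f"bin {k}: magnitude {abs(h):.3f}, phase {cmath.phase(h):+.3f} rad")
```

In this toy case every bin has the same magnitude, because the "system" only delays and attenuates the sound; a measured HRTF instead shows direction-dependent peaks and notches across frequency.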

12. STERN, R. M., WANG, DeL. and BROWN, G. Binaural Sound Localization. In: WANG, DeL. and BROWN, G. (eds.), Computational Auditory Scene Analysis. New York: Wiley/IEEE Press, 2006.