A PRACTICAL SYSTEM FOR THREE-DIMENSIONAL SOUND PROJECTION (Vennonen, Cont'd)

3. AMBISONICS

Over some months, more than three hundred papers and articles were found dealing with stereo, quadraphony and surround sound. (A database listing and annotating these works is currently being compiled by the author.) Upon digesting this volume of information and research, it became clear that others had preceded us in our general intentions, but had developed quite different systems. Several had even worked with venues like ours [1] [2] [3].

There is now a growing body of theoretical understanding of how we hear three-dimensionally in nature, and of how techniques like stereo and quadraphonics work (and don't work!). References [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] are a representative and introductory sample. However, this understanding is not without controversy - the quad debates of the 1970s were enough to frighten most audiophiles into being content with stereo. This has left many of us considerably ignorant about more-than-one-dimensional sound, not to mention optimal stereo recording and production techniques. It can be said with hindsight that quad was doomed because it was an immature approach resting on faulty assumptions, driven by the marketing department. Once this realization occurred to a small number of researchers in the early 1970s, there arose a desire to come up with a better format, more in tune with emerging research. The increase in understanding of our localization abilities led to two forks of development that we still have today. One deals with binaural stereo and head-related transfer functions, which is not adaptable to loudspeakers feeding large numbers of people in a concert space, but is very good for personal VR environments [14]. The other approach arose out of arguments about the optimum way to encode 360 degree surround information into two or three channels, and was the work of the Ambisonics team in the UK and the UMX researchers in the US. (As a sideline, it is interesting to note that these developers are now working on techniques for hiding surround information in a stereo-compatible digital bitstream, on digital audio broadcasting, and on increasing the perceived resolution of 16 bit audio to 20 bits.)

On first reading, the Ambisonics proposals were extremely interesting. A 3D sound field could be encoded into four audio channels and then decoded with the appropriate circuits to suit any playback space. A microphone is available that registers and encodes 3D sound in the same format, and this could then be manipulated in post-production to yield the desired spatial quality for stereo work. It became obvious that there was a considerable theoretical basis to this system, and that it could be adapted to the dome environment, as well as other speaker setups. It is a tremendous advantage to be able to encode a piece just once in a standard format, and know it will spatially work anywhere else given the right decoding parameters. References [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] are useful reading.

The most general description possible would be to say that the system offers a psychoacoustically optimized way of encoding an infinite number of sound directions into a limited number of channels, and then decoding them to a given loudspeaker layout on playback. A theory of the human sound localization process has been refined and Ambisonics takes this into account, dealing with low and high frequencies in different ways using phase and amplitude.

Current Ambisonics deals with "zero and first-order directional components" of a sound field in the transduction process. Higher order sampling of the sound field would use the equivalent of more highly directional microphone patterns and consequently require nine or more channels.

First-order directivity patterns, such as cardioid or figure-of-eight, are characteristic of nearly all directional microphones and correspond to the particle velocity of the sound wave in the medium. An omnidirectional microphone responds only to the zero-order component, the pressure of the sound wave. Ambisonics could be described as a system that simulates one omnidirectional microphone and three figure-of-eight directional microphones, aligned along the X, Y and Z axes.

On playback this has the consequence that an encoded sound source is decoded not with one or two speakers, but many. In very approximate terms, the volume of each speaker corresponds to its distance from the desired phantom sound image. One interesting result of the encoding outlined above, combined with the different ways low and high frequency material is dealt with in the decoding, is that some speakers will reproduce low frequencies out of phase relative to other speakers. One can see that this system is much more sophisticated than stereo or standard quad, which use the one-dimensional paradigm of pair-wise intensity panning to place sounds.

However, reproduction of point sources is not ideal, because of the condensation of all spatial information into a very few channels. For instance, a source recorded at 45 degrees azimuth will on decoding appear at a speaker at the 45 degree point, and to a lesser extent at nearby speakers. In a reverberant playback setting the amplitudes and phases arriving at the ears are not what is intended, so the image will broaden. Although this may be interpreted as a loss of separation by some, it can be shown that in a non-reverberant space the acoustic signals at the ears produced by Ambisonics are quite similar to those produced by a real source [19]. The Ambisonic approach, even in its horizontal form, easily surpasses discrete quad in image definition, especially at the back and sides.

There are various sub-formats within Ambisonics, corresponding to various places in the signal path from microphone to loudspeakers. They are called A, B, C and D formats. The Soundfield microphone [20] [21] [22] is a tetrahedral array of four near-coincident sub-cardioid condenser capsules, which after initial processing in the microphone produce the A-format signal. This consists of left-front-up, right-back-up, right-front-down and left-back-down outputs. These signals may then be processed by addition and subtraction in the accompanying box into B-format. This is four signals consisting of an omnidirectional pressure component W, and three figure-of-eight velocity patterns X, Y and Z corresponding to the three axes. For stereo work, any coincident stereo microphone pattern from omni to hypercardioid, at any angle, may be synthesized from the B-format signals by more sum-and-difference matrixing.
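The sum-and-difference synthesis of a coincident stereo pair can be sketched as follows. This is a minimal illustration, not the circuitry of the Soundfield control unit: the function names, the `pattern` parameterisation (1.0 for omni, 0.5 for cardioid, 0.0 for figure-of-eight) and the 45 degree half-angle are assumptions made for the example. W is assumed to carry the 0.707 weighting used in the B-format encoding equations, so it is scaled by the square root of two before mixing.

```python
import math

def virtual_mic(w, x, y, azimuth_deg, pattern):
    """One horizontal virtual microphone sample from B-format W, X, Y.

    azimuth_deg: pointing direction, anticlockwise from front.
    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight
             (an assumed parameterisation for this sketch).
    """
    a = math.radians(azimuth_deg)
    pressure = math.sqrt(2.0) * w                    # undo the 0.707 weighting on W
    velocity = x * math.cos(a) + y * math.sin(a)     # figure-of-eight aimed at azimuth
    return pattern * pressure + (1.0 - pattern) * velocity

def stereo_pair(w, x, y, half_angle_deg=45.0, pattern=0.5):
    """Left/right samples of a coincident pair aimed at +/- half_angle_deg."""
    left = virtual_mic(w, x, y, +half_angle_deg, pattern)
    right = virtual_mic(w, x, y, -half_angle_deg, pattern)
    return left, right
```

With these assumptions, a source on the left of the listener produces a larger left than right output, and a frontal source produces equal outputs in both channels.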

The B-format is a very robust professional production format, containing all first-order directional information. Pressure and velocity components are kept separate, allowing considerable production flexibility and maximising the spatial resolution possible with first-order coding. Besides direct recording, it can also be generated by electronic or computational means by multiplying the incoming audio by the four coefficients W, X, Y and Z given below [32]:

To simulate a source on the circumference of a circle centred on the listener:

for α being the anticlockwise angle from front

and β the elevation,

W = 0.707

X = cos α cos β

Y = sin α cos β

Z = sin β

to simulate movement within the circle:

W = 1 - 0.293(X**2 + Y**2 + Z**2)
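The coefficients above translate directly into a small encoder. The sketch below is illustrative only: the function names are hypothetical, and the second function reflects one reading of the interior formula, namely that X, Y and Z are then the source's position coordinates inside a unit-radius circle (or sphere) and W follows from them.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample placed on the unit circle/sphere.

    azimuth_deg is the anticlockwise angle from front, elevation_deg the
    elevation, as defined in the equations above.
    """
    a = math.radians(azimuth_deg)
    b = math.radians(elevation_deg)
    w = sample * 0.707
    x = sample * math.cos(a) * math.cos(b)
    y = sample * math.sin(a) * math.cos(b)
    z = sample * math.sin(b)
    return w, x, y, z

def encode_interior(sample, px, py, pz=0.0):
    """Encode a source at position (px, py, pz) inside the unit radius.

    Assumed interpretation: the position coordinates are used directly as
    the X, Y, Z coefficients, and W = 1 - 0.293 * (X**2 + Y**2 + Z**2).
    """
    r2 = px * px + py * py + pz * pz
    if r2 > 1.0:
        raise ValueError("position must lie within the unit radius")
    w = 1.0 - 0.293 * r2
    return sample * w, sample * px, sample * py, sample * pz
```

Under this reading the two cases agree at the boundary: a source on the circumference gets W = 0.707, while a source at the centre gets W = 1 with no directional component.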

Further manipulation of the B-format signals to simulate distance, etc. and global effects like soundfield rotation, tumbling etc. can be achieved by multiplying them by appropriate coefficients.
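As an example of such a coefficient-based manipulation, rotating the whole soundfield about the vertical axis is a standard two-dimensional rotation applied to X and Y, with W and Z left untouched. The sketch below uses a hypothetical function name and the same angle convention as the encoding equations (anticlockwise from front).

```python
import math

def rotate_about_vertical(w, x, y, z, angle_deg):
    """Rotate a B-format sample anticlockwise by angle_deg about the Z axis.

    A source encoded at azimuth a ends up at azimuth a + angle_deg.
    W (pressure) and Z (height) are unaffected by this rotation.
    """
    t = math.radians(angle_deg)
    x_rot = x * math.cos(t) - y * math.sin(t)
    y_rot = x * math.sin(t) + y * math.cos(t)
    return w, x_rot, y_rot, z
```

For instance, a frontal source (X = 1, Y = 0) rotated by 90 degrees ends up with X = 0 and Y = 1, that is, directly to the listener's left.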

For encoding Ambisonic signals into a form more suitable to broadcasting or mass reproduction, there is a definition called C-format. It is possible with optimised spatial information loss to encode into just two channels 360 degrees of horizontal surround sound, using manipulation of phase. This is often termed Ambisonic UHJ [17], and is used currently by Nimbus UK and others for CD releases. Alternatively, for FM broadcasting [15], [23] it is possible to add a third channel in addition to the usual L+R and L-R components, which allows for greater directional fidelity.

Finally, the decoder at the end of the Ambisonic chain turns B or C format signals into D-format, which is adapted for the speaker layout used. For instance, in the 1970s this was often just the standard (albeit imperfect) quad signals LF, LB, RF, and RB. In our case, it is the sixteen signals sent to the speakers at various angles and elevations.
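To make the B-format to D-format step concrete, here is a deliberately simplified decode in which each speaker feed is a cardioid-like virtual microphone aimed at that speaker. This is an assumption chosen for clarity, not the decoder used for the sixteen-speaker system: it omits the shelf filters and distance compensation, and the square layout and function name are hypothetical.

```python
import math

def decode_basic(w, x, y, z, speakers):
    """Return one feed per speaker for a single B-format sample.

    speakers: list of (azimuth_deg, elevation_deg) pairs, with angles
    measured as in the encoding equations. Each feed mixes the pressure
    signal with the velocity component pointing at the speaker
    (an assumed, unfiltered decode).
    """
    feeds = []
    for az_deg, el_deg in speakers:
        a = math.radians(az_deg)
        e = math.radians(el_deg)
        velocity = (x * math.cos(a) * math.cos(e)
                    + y * math.sin(a) * math.cos(e)
                    + z * math.sin(e))
        feeds.append(0.5 * (math.sqrt(2.0) * w + velocity))
    return feeds

# Hypothetical horizontal square layout (not the sixteen-speaker dome).
SQUARE = [(45.0, 0.0), (135.0, 0.0), (225.0, 0.0), (315.0, 0.0)]

# A unit source encoded at 45 degrees azimuth and zero elevation.
src = math.radians(45.0)
feeds = decode_basic(math.sqrt(0.5), math.cos(src), math.sin(src), 0.0, SQUARE)
```

With this layout the speaker at 45 degrees receives the full signal, its two neighbours receive half of it, and the opposite speaker receives nothing, which illustrates how a single source is shared among several speakers.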

It was stated earlier that Ambisonics is psychoacoustically optimised - this means that sound is reproduced in a way that corresponds closely to how we actually localise sounds in nature. Reference [8] is the classic introduction to this topic. The optimisation occurs at the decoder by the use of shelf filters consisting of 90 degree all-pass filters. The effect is to smoothly alter the gain of W relative to X, Y and Z at frequencies above 700 Hz.

The reason for this is that we have different hearing mechanisms at low and at high frequencies, interaural time difference (ITD) and interaural level difference (ILD) respectively. The frequencies in between are dealt with (not very well) by both mechanisms. In addition, a third mechanism comes into play above 4 kHz, called pinna cues, which relies on the spectral filtering action of the outer ear at different angles and elevations. Furthermore, slight head movements are used to resolve conflicts between the former mechanisms and aid low frequency localization. Michael Gerzon provides the classic reference on this topic [33].

A practical loudspeaker based system must generate the correct omnidirectional pressure and (vector) velocity cues at the ears. It turns out that this can be done by controlling the proportions of pressure W and velocity components X, Y and Z above and below 700Hz. The benefit is reduction of sensations of phasiness and minimisation of image shifts with head movements. The proportions need to be different for horizontal or three dimensional listening - see the table below [24].

                 HORIZONTAL          3D
                 W       X,Y         W       X,Y,Z
LF GAIN (dB)     0       0           0       1.76
HF GAIN (dB)     1.76    -1.25       3       0
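The table gives gains in decibels, whereas a software decoder multiplies samples by linear factors. The conversion is gain = 10^(dB/20). The sketch below applies it to the horizontal high frequency row only. It is a static illustration with hypothetical function names: a real shelf filter moves smoothly between the low and high frequency gains rather than applying a single fixed factor.

```python
def db_to_gain(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def apply_horizontal_hf_gains(w, x, y, w_db=1.76, xy_db=-1.25):
    """Scale W and X, Y by the horizontal HF gains quoted in the table.

    The defaults are the table's horizontal HF values; this shows only the
    high frequency limit of the shelf, not its frequency-dependent shape.
    """
    gw = db_to_gain(w_db)
    gxy = db_to_gain(xy_db)
    return w * gw, x * gxy, y * gxy
```

In linear terms, 1.76 dB is a factor of roughly 1.22 and -1.25 dB a factor of roughly 0.87, so at high frequencies the pressure signal is raised relative to the velocity signals.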

A secondary optimization is loudspeaker distance compensation, which is necessary to reduce low frequency phasiness. This is because the wavefront at speaker distances of several metres or less is spherical rather than planar and can cause localization errors of 15 to 30 degrees. The cure is to attenuate the velocity components X, Y and Z by 3 dB at 20 Hz for a 2.7 m speaker distance, using simple RC filters. For larger distances, the filters are switched out of the signal path.
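One way to read the pairing of 2.7 m with 20 Hz is as a first-order high-pass filter whose corner frequency is f = c / (2 π d), where d is the speaker distance and c the speed of sound. The sketch below works through that arithmetic. The value c = 343 m/s and the function names are assumptions for this example, and the filter response is the textbook first-order RC high-pass, not a measured circuit.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def corner_frequency_hz(distance_m):
    """Corner (-3 dB) frequency of the compensation filter for a speaker distance."""
    return SPEED_OF_SOUND / (2.0 * math.pi * distance_m)

def highpass_gain_db(freq_hz, corner_hz):
    """Magnitude response in dB of a first-order RC high-pass filter."""
    ratio = freq_hz / corner_hz
    magnitude = ratio / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(magnitude)

def rc_time_constant_s(corner_hz):
    """R times C (in seconds) giving the requested corner frequency."""
    return 1.0 / (2.0 * math.pi * corner_hz)
```

Under these assumptions, a distance of 2.7 m gives a corner frequency of about 20.2 Hz, in line with the figure quoted above, and the corner frequency falls as the distance grows, which fits with switching the filters out for larger distances.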

