Listening to loudspeakers.

The ear has to be deceived into creating invisible musicians in space, between and beyond the physical x-plane of the loudspeakers, so the dispersion of the loudspeaker's energy into the listening room has to be carefully controlled by design. The sound stage should not be tightly keyed to the physical position of the speakers themselves but should hang as a curtain between and beyond them without a sharp beginning or end. First, a brief look at recording that sound stage.

Fig. 1 The earliest concept of capturing and replaying a sound stage

Microphones are nothing more than sound sampling devices; they tell us about the sound waves as they pass by their tiny diaphragms, and from that signal, replayed over loudspeakers we deduce the larger recording environment in our mind. Sound waves are in fact nothing more than localised modulation of atmospheric pressure, and you can imagine how weak and diffuse their energy is (compared to atmospheric pressure).
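To put a rough number on how weak that pressure modulation is, here is a minimal sketch. None of the figures come from the text above: it assumes standard sea-level atmospheric pressure of 101,325 Pa and the conventional 20 µPa reference for sound pressure level, and the function name is purely illustrative.

```python
import math

ATMOSPHERIC_PA = 101_325.0   # assumed: standard sea-level atmospheric pressure
REFERENCE_PA = 20e-6         # assumed: conventional 0 dB SPL reference pressure

def pressure_from_spl(spl_db):
    """RMS sound pressure in pascals for a given level in dB SPL."""
    return REFERENCE_PA * 10 ** (spl_db / 20)

# Compare a few illustrative listening levels against atmospheric pressure
for spl in (60, 94, 120):
    p = pressure_from_spl(spl)
    ratio = p / ATMOSPHERIC_PA
    print(f"{spl} dB SPL: {p:.4f} Pa RMS, {ratio:.2e} of atmospheric pressure")
```

Under these assumptions, even a very loud 120 dB SPL corresponds to a pressure swing of roughly 0.02% of atmospheric pressure.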

The very earliest recording of a sound field (above) was made in 1881, when a series of between ten and eighty telephone microphones were arranged across the stage and connected by individual wires to telephone receivers in a nearby hotel. Visitors could select a receiver for each ear and listen to the live sound from the hall. The width of the image would have depended upon which receivers they selected: No.1 (left ear) used with No.10 (right ear) would have given a wide and diffuse field with a hole in the middle; No.5 and No.6 would have painted an intense binocular-like perspective of the centre of the orchestra.

By the 1930s both loudspeakers and the means of making recordings were established, thanks to the pioneering work of the USA's Bell Labs and the UK's Alan Blumlein, inventor of the so-called 'Blumlein pair' or 'crossed figure-of-eight' ribbon microphone recording technique.

Fig. 2 A typical minimalist modern recording set-up

Just two microphones of cardioid pick-up pattern have been arranged at 90 degrees to each other, some distance back from the performers so as to include all of them in the angle of acceptance.

Accurate stereo reproduction places a performer exclusively 'in' the left loudspeaker if he/she is actually on the extreme left edge of the performing stage picked up solely by the left facing microphone. Conversely, a performer hard right would emanate from the right loudspeaker alone.

Much more challenging is reproducing a performer dead central, picked up equally by both microphones, because performers anywhere other than at the extremities of the sound stage are placed on the loudspeaker sound stage entirely in the mind of the listener. The central image is psychoacoustically extremely critical and is called the 'phantom mono image'. It should appear ethereally hanging in the air and not at all 'projected' by either loudspeaker: a true curtain of sound. As we will see, the all-important phantom image keys the entire left-right sound stage's depth and width perspective. The correct perceived energy in and around the phantom image is a design characteristic of Harbeth's life-like, natural sounding stereo.
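The way a crossed pair encodes position as a level difference between the two channels can be sketched as follows. This is an illustration only: it assumes ideal textbook cardioids with sensitivity 0.5 x (1 + cos(angle off axis)), aimed 45 degrees either side of centre, and the function names are hypothetical. Real microphones and real set-ups will differ.

```python
import math

def cardioid_gain(off_axis_deg):
    """Sensitivity of an ideal cardioid: 1 on axis, 0 directly behind."""
    return 0.5 * (1.0 + math.cos(math.radians(off_axis_deg)))

def pair_gains(source_deg, half_angle_deg=45.0):
    """(left, right) gains for a source angle; negative angles are stage left."""
    left = cardioid_gain(source_deg + half_angle_deg)   # left mic aimed at -45 deg
    right = cardioid_gain(source_deg - half_angle_deg)  # right mic aimed at +45 deg
    return left, right

# Sweep a source across the stage and show the inter-channel level difference
for angle in (-45, -20, 0, 20, 45):
    left, right = pair_gains(angle)
    diff_db = 20 * math.log10(left / right)
    print(f"source {angle:+4d} deg: L={left:.2f} R={right:.2f} L-R={diff_db:+.1f} dB")
```

A central source gives identical gains in both channels, which is the condition for the phantom mono image. In this idealised model a source on one microphone's axis still leaks into the other channel at a reduced level, so the actual separation at the edges of the stage depends on the real pick-up patterns and angles used.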

Our interpretation of a virtual sound stage across the plane of the loudspeakers is due to the directionality of our outer ear and the mental map we individually build of the world we live in. Our auditory system primarily exists to give us accurate localisation of threatening sounds around us: if we hear a twig snapping behind us we instinctively turn and examine the threat. The stereo illusion persuades our auditory system, with its twenty million years of evolutionary fine tuning, into believing that two loudspeakers (or even five or seven) are a real, solid soundscape. Few loudspeakers can execute this illusion properly - as one would actually hear at the recording; most 'spice-up' the sound, which to our trained ears destroys the subtle illusion.

For Harbeth users, the thrill is in the clarity of sound; that makes for a better deception and rewarding listening experience. But there are many alternative design solutions as we will see...

The ear uses subtle cues in the characteristic sound of loudspeakers to create a mental image of the performers in space, and they can convince the listener that the musicians are spaced left to right and closer or further away.

The 'X-plane' represents the left to right spread of musicians across the sound stage. The 'Z-plane' describes the apparent depth of the musicians from the front of the sound stage.

There is nothing that a speaker designer can (or indeed should) do to change the positions of performers across the sound stage's x-plane; that is completely fixed and encoded into the recording. However, there are substantial differences in the way that loudspeakers resolve and present the depth perspective in the z-plane. Your apparent position in the Z-plane (depth) is equivalent to viewing the stage through binoculars: you may choose not to use them (the natural, 'Harbeth sound'), you may use them one way round to move your listening position forward (positive Z-plane movement) or the other way round to recede from the orchestra (negative Z-plane movement). In practice, this depth perspective is a function of crossover design and driver directionality.

When we talk of the 'phantom image' we are referring to those sounds which appear to be solid and placed between the loudspeakers. Of particular note is the character of the (dead central) phantom image - this is where vocals are often placed.

Fig. 3 Ideal loudspeakers, wide sound stage - musicians spread along the x-plane between the loudspeakers and with appropriate depth (the way the recording was intended to be heard)

Downloadable MP3 audio examples at the bottom of this page.


A Harbeth loudspeaker spreads the sound stage equally and evenly between and beyond the speakers, creating the illusion that the performers are playing as you would see and hear them at the recording, with 'air' around them. Depending on microphone technique, this approximates to a seat in the 5th to 15th row in the hall - a combination of direct and reverberant sound and a real sense of depth.

The sound stage illusion is usually enhanced by listening with dimmed lighting or in the dark, where the sense of hearing is heightened. With a first rate recording and credible loudspeakers, you should be able to pick out, or localise, performers from the front row to the back row and from the left to the right side of the stage. With acoustic music, the sound images will be solid and not drift across the stage or up and down. Some listeners even report a realistic sense of height. Of course, in pop-music and film sound, the producer can steer images around the sound stage to excite the listener.

When used in a small or medium sized room of average reverberation time, Harbeth speakers are balanced to provide this natural perspective no matter how far the listener sits from them. At the top of the design specification of each and every Harbeth speaker is that it must not be fatiguing to listen to; hence, by implication, it must not have an over-vivid phantom image.

Fig. 4 'Recessed' loudspeakers, weak central sound stage - musicians appear to have slipped back-stage

In this situation, the loudspeakers have a characteristic which creates a concave image, well behind the physical plane of the speakers themselves and with a recessed phantom image. Sounds at the extremities of the sound field (hard left, hard right) appear to be magnified in importance. The overall presentation can be described as 'distant', 'uninvolving', 'lush' or 'boring' but not especially unpleasant on acoustic performances such as large-scale orchestral recordings in a big auditorium. On more lively music it is impossible to accurately localise performers in space and vocalists are sometimes described as 'dark'. When listening extremely close to the speakers, almost wearing them as headphones, this perspective gives the impression of cavernous depth, but at a more normal listening distance the listener is clutching for the missing energy and 'air'.

Despite its limitations, this relaxed presentation will work in moderately reverberant acoustics with glass paneling (especially very close to the listener) as the lower midrange energy level reduces the splatter of reflected energy off those surfaces. The traditional 1940-1980 "BBC monitor sound" has been biased towards this relaxed perspective, which makes for a fatigue-free, if not especially 'involving', listening experience with a perceived warm bass end.

Fig. 5 'Pinched', 'pushed', 'shouting', 'colored' - musicians at centre of stage bulge towards you like a searchlight beam

In this design, the phantom image in the middle of the sound stage is pulled out of the plane of the speakers towards the listener. This may be the result of a deliberate design decision and/or from latent colouration in the drivers and/or from peaks in the frequency response. Consequently, as vocals are normally presented centre-stage, the listener's attention is now riveted to the overly-vivid phantom image to the detriment of a smooth left-right spread across the sound stage. One listener describes this effect as "the vocalist almost nibbling my ear", another as a 'cup-like megaphonic quality'. Technically, this speaker presents a "high-Q" sound stage. When used in a contemporary and hence acoustically hard environment (glass walls, TFT screens) the reflections combine to exacerbate the already ringy presence band.

Under blind listening tests and without a frame of reference (a very critical point), listening far from the speakers, this design can sound initially impressive because of the attention it demands from the inexperienced or untutored listener (see page 7-4). An instantaneous A-B switchover between loudspeakers under evaluation is mandatory to ensure that inexperienced listeners are on guard against selecting loudspeakers exhibiting this balance, which will all too soon prove irritating and unnatural. Evaluated in single-speaker-mono (not as phantom mono over a pair of speakers) in an acoustically dry space, this speaker may even be initially described as 'detailed' and 'articulate', 'dynamic' and 'impressive'. The A-B switchover concept is covered on the next page, Chapter 7-3. When this colouration is beyond a certain minimum level, the serious listener's fight-or-flight response is sensitised; in stereo, the over-vivid phantom image is unsettling, and results in fidgety and stressful listening.

Reports of this type of loudspeaker needing 'an extended burn-in period, maybe hundreds of hours' have nothing to do with any ageing process in the speaker; they reflect the listener subconsciously reprogramming his fight-or-flight response to adapt to this sound. A tell-tale confirmation of this over-intensity is that as the overall replay volume is reduced, the phantom image barely decreases in strength, remaining prominent and coloured even when the full sound stage has substantially collapsed.

This fatiguing design is usually the consequence of design oversights: 'design in mono and it'll be OK when duplicated for stereo'; neglecting speech as a test material through the design process and not having an adequately developed mental memory of the intensity and dispersion characteristics of real, live instruments and voice.

This peaky, although initially compelling presentation is completely unsuitable for use in even moderately reverberant acoustics with glass paneling because the higher perceived midrange/presence energy level splatters off those surfaces. There is nothing that can be adjusted in this design to ameliorate the hard sound - it is fundamentally coloured despite the initial listening promise.

Fig. 6 'Pinched' but also 'dark', 'airless', 'recessed', 'furry', 'confused' - musicians trapped in an acoustic fog

The worst of all sonic worlds - an odd combination of the extremes of contrasting designs 7-2B and 7-2C: the sound stage through dirty rippled glass. Listeners described instruments as 'oddly dark' with no sense of air around them, akin to a low-bitrate MP3, all low-level detail erased. It's impossible to size the acoustic behind and around the performance despite the vivid, pushy phantom image. Analysis suggests problems due to cone materials (the 'darkness'), misjudged crossover integration and uneven dispersion off axis. The designer may have been subconsciously aware of the lack of 'air' and often attempts to compensate by lifting the high frequency (tweeter) level. This does not disguise the underlying lifelessness. There are many examples of this coloured sound amongst hi-fi and 'studio monitors'.

The extreme left/right edges of the sound stage can take on a strange disembodiment, creating an initial illusion of acceptable - even wide - stereo which, to some listeners, counterbalances the other characteristic defects. The dark, overall lackluster cast gives the impression that a blanket has been thrown over the microphones, rather thicker in the mid-left and mid-right positions. This design leaves the listener with a creeping sense of dissatisfaction: the longer he listens, the worse it becomes until he is so presensitised to the colouration that serious listening ceases to be enjoyable or even possible.

This presentation cannot be made to work properly in reverberant acoustics with glass paneling as the higher presence energy stands out in relief against the thin midrange energy level.

Owners of these loudspeakers are often caught up in a vicious cycle of equipment upgrading to try and recover the missing 'air' and freshness of live sound, which finally ends when they become Harbeth RADIAL users.

As illustrated on the previous page, certain colorations that bedevil loudspeakers can be interpreted by the brain/ear as changes in sound stage perspective. If these aberrations are severe, the listener's perspective is compressed to a binocular view, not a contiguous left-right spread.

To forensically evaluate loudspeaker colourations, it is not good enough to rely on the human audio memory. A break in listening of even a few seconds is enough for the 'audio-DNA' to fade. Our ability to sense the environment is at its best when presented with comparatives: two shades of colour, two perfumes, two wines, two ties, two sounds. A remote control relay changeover box is an invaluable research tool for making these instantaneous comparisons between loudspeakers.

The reason for the plethora of available loudspeakers with such a wide range of performance is a reflection of the accuracy and complexity of our internal mental-models of the real sound world. Our mental models and those of our users are more sophisticated and this reflects upon the design.

Fig. 7 Remote control A-B switchover box

The white box is a home made relay changeover unit. Inside are many high quality relays that can be remotely controlled to flick the signal path between the banana sockets on the top panel virtually instantaneously. Out of picture to the left is the long remote control lead, which the listener holds in his hand so that he can operate the silent changeover switch at random.

Although this box can be used to compare two complete stereo pairs of loudspeakers side-by-side, in the configuration shown above, one pair of speakers (NRG2 on stands, not shown but at the end of the thicker white cables) is under evaluation at the crossover design stage. We have developed two 'breadboard' prototype crossover designs, X1 and X2, and although when simulated by computer these have very similar frequency responses, we anticipated that there would be subtle differences in perceived quality - especially sound stage. The switchover box allowed us to critically evaluate these alternative strategies without relying on our memory, and to select the most natural sounding crossover that gave the widest, most involving sound stage.

A downloadable MP3 audio clip attempts to demonstrate the various loudspeaker sound stage presentations discussed above. Please note that this is a very approximate demonstration using changes in frequency response alone and that it is not really possible to demonstrate the sonic signature of one loudspeaker over another. This sequence is designed to be listened to on headphones or PC-type speakers, not hi-fi speakers.

Fig. 8 Contents of demonstration MP3 file (waveform sequence)

Click here to download an MP3 file demonstrating a sequence of the four types of speaker performance described on page 7-2 following the picture sequence 7-2A, B, C, D and after a tone blip, 7-2A again.

By measurement, the peak level in each example is the same but note how the perceived loudness varies in the examples. In the case of B, the sucked-out midband, the overall sound is gutless and weak, although not particularly unpleasant, unlike C and D which are unpleasant. Carefully note that when D fades to A via the tone burst, correct-sound A usually sounds coloured in comparison with D. The ear is capable of retraining itself so that wrong-sound can become acceptable, but the biological process of adaptation consumes mental energy and comes at a price - listener fatigue.
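The gap between equal peak level and unequal loudness can be hinted at with a minimal sketch. The two signals below are hypothetical test signals, not the contents of the MP3, and RMS level is used only as a crude stand-in for perceived loudness, which in reality also depends on how the energy is distributed across frequency.

```python
import math

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root-mean-square level, a rough proxy for signal energy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalise_peak(samples, target=1.0):
    """Scale the signal so that its peak equals the target level."""
    scale = target / peak(samples)
    return [s * scale for s in samples]

n = 1000
# Hypothetical signals: a steady tone, and one loud burst followed by a quiet tail
steady = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
peaky = [math.sin(2 * math.pi * 5 * i / n) * (1.0 if i < 100 else 0.2)
         for i in range(n)]

for name, signal in (("steady", steady), ("peaky", peaky)):
    signal = normalise_peak(signal)
    print(f"{name}: peak={peak(signal):.2f} rms={rms(signal):.2f}")
```

Both signals share the same peak after normalisation, yet the second carries far less energy overall, which is one reason why matching peak levels does not guarantee matched loudness.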

Great care has to be applied when evaluating loudspeakers to guard against drawing the wrong conclusion - hence the use of known (but not necessarily perfect) reference speakers and an instantaneous changeover box.

If a loudspeaker evaluation through a blind curtain does not include a known reference loudspeaker (or better still, real live musicians or vocalists) then it is almost certain that under those quasi-scientific test conditions and with no visual cues and a short exposure period, the casual listener would usually select type C with a 'projected' balance or even type D. Types A (and B) with a more accurate but softer, less incisive sound would make less of an impact through the curtain. However, if a known reference is introduced, the situation completely reverses, and C and D would be exposed as coloured.

The moral of this story is to anticipate the psychoacoustics of how the ear works and to insist on scientific methods and procedures. To select loudspeakers for serious use without such checks and balances is very unwise and inevitably leads to listener fatigue as longer exposure to the loudspeakers reveals their true nature.

(Top) The new stereo soundbook Everest/Streicher ISBN 0-8306-3903-9
(middle) Original orchestral photo courtesy of Michael Chang modified by Harbeth
(Bottom) Music, Physics and Engineering Harry F. Olson ISBN 0-486-21769-8 combined with Broadcast Sound technology Michael Talbot-Smith ISBN 0-240-51355-X (all recommended reading)
