Announcement


HUG - here for all audio enthusiasts

Since its inception ten years ago, the Harbeth User Group's ambition has been to create a lasting knowledge archive. Knowledge is based on facts and observations. Knowledge is timeless. Knowledge is independent of the individual and replicable. However, we live in a new world where, thanks to social media, 'facts' have become flexible and personal. HUG operates in that real world.

HUG has two approaches to contributors' Posts. If you have, like us, a scientific mind and are curious about how the ear works, how it can lead us to make the right - and wrong - decisions, and about the technical ins and outs of audio equipment, how it's designed and what choices the designer makes, then the factual area of HUG is for you. The objective methods of comparing audio equipment under controlled conditions have been thoroughly examined here on HUG and elsewhere and can be easily understood and tried with negligible technical knowledge.

Alternatively, if you just like chatting about audio and subjectivity rules for you, then the Subjective Soundings sub-forum is for you. If upon examination we think that Posts are better suited to one sub-forum than the other, they will be redirected during Moderation, which is applied throughout the site.

Questions and Posts about, for example, 'does amplifier A sound better than amplifier B' or 'which speaker stands or cables are best' are suitable for the Subjective Soundings area.

The Moderators' decision is final in all matters regarding what appears here. That said, very few Posts are rejected. HUG Moderators individually check each Post's spelling and layout for clarity, but due to the workload, from Oct. 2016 Posts in the Subjective Soundings area will not be checked. We regret that. We are unable to accept Posts that present what we consider to be free advertising for products that Harbeth does not make.

That's it! Enjoy!

{Updated Nov. 2016A}

MP3 and audio data compression

This topic is closed.

  • #16
    Gobbledegook quotation - avoid!

    Originally posted by STHLS5
    ...In Wiki, it was stated "(T)he compression works by reducing accuracy of certain parts of sound that are deemed beyond the auditory resolution ability of most people...
    This quote, as I mentioned a few posts earlier, is gobbledegook. Completely meaningless to me. As I said, it cannot be an accurate quote from someone who works in the field as a scientist. So it has no place here being quoted again.

    Perhaps you can decode it for us? I can't. It must mean something to you chaps because it's been quoted twice in this serious thread.
    Alan A. Shaw
    Designer, owner
    Harbeth Audio UK



    • #17
      Just how can MP3 discard so much information in the source audio file?

      Originally posted by STHLS5
      ...When you do a mp3 conversion it's not only the top end frequency that goes (although a top cut-off of 16 or 17kHz is not significant for most adults), but frequencies well down in the audio range are lost or attenuated, the effect of which is some level of distortion. ...
      Forget about the contraction of the high frequency range. It's a bit of a red herring, i.e. it's not a fundamental part of the success of MP3. That in itself wouldn't be patentable. And MP3 technology is patented to the hilt. So, in simple language that we can all understand, what is the answer to my question about how the MP3 encoder can throw away so much of the music while the result remains a high quality audio file? How could we demonstrate that simply and easily with just household goods? (I've already planned a video demo but I want you to work for your supper).

      I'm always conscious that most of our members here are not technical people, so whilst quoting published work is valuable to those with a technical mind, to the majority, it just adds confusion. So let's try and work this through using everyday language please. Ok?
      Alan A. Shaw
      Designer, owner
      Harbeth Audio UK



      • #18
        HF response of MP3

        Forget about the contraction of the high frequency range. It's a bit of a red herring, i.e. it's not a fundamental part of the success of MP3.
        I just finished reading an article by K. Brandenburg of the Fraunhofer Institute, published in EBU Technical Review, June 2000, where he said "..it is reasonable technology to limit the frequency response of MP3 encoder to 16kHz.".

        MP3 did limit the upper range of frequencies, didn't it?

        ST

        {Moderator's comment: Alan said this is not a fundamental issue. He said: "Forget about the contraction of the high frequency range...". Just saying that it's reasonable to limit HF does not mean they actually did it.}



        • #19
          Originally posted by A.S.
          This quote, as I mentioned a few posts earlier, is gobbledegook. Completely meaningless to me. As I said it cannot be an accurate quote from someone who works in the field as a scientist. So it has no place here being quoted again.

          Perhaps you can decode it for us? I can't. It must mean something to you chaps because it's been quoted twice in this serious thread.
          As I am the one who posted it, please permit me to be the one to 'decode' it.

          Only the term "perceptual coding" is in the professional paper. The professional paper does not define perceptual coding as being what Wikipedia says it is.

          The Wiki quote misleads the reader because Citation 13 [the professional paper] serves only as a reference for the term "perceptual coding", not for any of the text that precedes it: "The compression works by reducing accuracy of certain parts of sound that are deemed beyond the auditory resolution ability of most people. This method is commonly referred to as perceptual coding.[13]"

          This link will download a PDF of the entire scholarly paper, "Signal compression based on models of human perception" ...
          A screen shot of the Abstract is attached.



          • #20
            And the answer is ...

            ... all right, the HF issue is peripheral.

            But if the right answer has already been presented, albeit non-technically (i.e. masking, psychoacoustic coding), then I think not addressing it just serves to confuse. Once you raise a question, I think you have to provide the answer within a reasonable time period, or people will just lose the thread.



            • #21
              But if the right answer has already been presented, albeit non-technically (i.e. masking, psychoacoustic coding)
              I'd say that was a deeply technical contribution; about as deep in the subject of sound as one can possibly go.

              "Masking, Psychoacoustic coding" etc. etc... With one or two exceptions, I wonder who reading this - members and non-members alike - has the remotest idea what that is all about. What sets this group apart is the demystification of ideas - at least, that's how I see it. It's all very well quoting from erudite sources - but that's not what we're here for. Leave that to the real boffins. We need earthy, basic ideas. Because only those stick in the mind.

              OK, I asked for some suggestions about how we could demonstrate this concept. Assuming that one knows what "masking, psychoacoustic coding" is all about, a simple demonstration would be worth a million words and pictures. Ideas? As I said, drawing not on lab equipment but on everyday life.



              • #22
                OK, I see what you're after now

                If you're looking for a non-technical analogy that would be reasonably comprehensible to most people, one idea that occurs to me is to compare various bitrates of audio compression to another kind of compression that many are familiar with, i.e. JPEG compression in digital photography, based on the number of megapixels available to encode an image.

                I think it's correct to say that, at least at a modest image size, the eye is unlikely to detect much difference between, say, an image at 5 megapixels and another at 10 (camera and lens quality being otherwise equal), though one image contains twice as much data.

                Would this kind of visual analogy be useful, perhaps? The benefit being that the point would be made clear by people's ability to compare the two images side by side, which can't be done with sound. (If so, I can't volunteer to put it together as I have a busy week ahead, but if you think the idea has merit maybe someone else could?)



                • #23
                  Yes, exactly that sort of thing. We tried that some time ago here.

                  But - this masking business is about sound. Wouldn't it be more useful to illustrate the point with a sonic example? If we're quite clear in our own minds what the previously quoted 'masking' is all about, then surely we can dream up an example or two? Eh?

                  The internet is a curse in that it's all too easy to reel off other people's quotes without really comprehending what is being quoted. You as a lawyer will be familiar with the low technical knowledge of any court jury (a cross-section of society). What I'm asking for is to take the subject and make it accessible to the ordinary juror with apt, succinct, relevant examples. I can see a way but I'm exhausted, having spent the whole afternoon making another TechTalk video. It shouldn't be for me to educate! I'm just a humble speaker designer, not a tutor!
                  Alan A. Shaw
                  Designer, owner
                  Harbeth Audio UK



                  • #24
                    A sonic example of 'psychoacoustic masking' ...

                    Okay, I'll take a stab.

                    Imagine you're sitting in your living room watching TV at a moderate volume. It's raining steadily outside, and you can hear the rain if you focus on it, but the TV is louder. Suddenly, there's an enormous thunderclap outside your window that lasts 3-4 seconds.

                    With ordinary digital recording (such as CD audio or an uncompressed WAV file), all three sonic events would be registered and encoded (i.e. turned into 1s and 0s by the digital recording process). However, the way the ear works is that while the very loud thunderclap is happening, you will not hear the soft steady sound of the rain, and if the thunder's loud enough, for the 3-4 seconds it lasts you probably won't hear your TV either. So all the resolution and file size a recording system devotes to capturing rain + TV + thunder is in a sense "wasted", because while the loud thunder is happening, that's all your ears are going to pick up.

                    So what MP3 does (and the AAC format Apple uses, likewise) is, as I understand it, to continually analyze the relative levels of various sounds that occur together in time, and if the lower-level sounds are below a certain threshold, it will "decide" that those low-level sounds are unnecessary because they will be "masked" by the louder sound, the way the thunder masks the rain, and the TV as well. So the MP3 system won't encode (it will completely ignore) sounds below a certain loudness, provided that a sufficiently loud masking sound exists at the same time. This reduces file size while creating very little, if any, audible degradation.
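To make the thunder-and-rain idea concrete, here is a toy Python sketch of the decision described above: a quiet sound is dropped when a louder sound near it in frequency raises a "masking threshold" above it. It rests on loudly stated assumptions and is not how any real MP3 encoder is written: each sound is reduced to a single tone (real thunder is broadband), distance in frequency is measured on the Bark scale using the Zwicker and Terhardt approximation, and the 10 dB offset and the 25 and 10 dB-per-Bark slopes are illustrative numbers chosen only for this example. Real encoders mostly use such a threshold to decide how coarsely each band may be quantized rather than deleting sounds outright.

```python
import math

def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation: frequency (Hz) -> critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

# Illustrative assumptions only, NOT values taken from the MP3 standard.
MASK_OFFSET_DB = 10.0   # at the same frequency, a masker hides sounds at least this far below it
SLOPE_BELOW_DB = 25.0   # masking falls off steeply (dB per Bark) below the masker...
SLOPE_ABOVE_DB = 10.0   # ...and more gently above it (low sounds mask higher ones more easily)

def masking_threshold(target_hz: float, maskers: list[tuple[float, float]]) -> float:
    """Level (dB) below which a sound at target_hz is hidden by the strongest masker."""
    threshold = -math.inf
    for masker_hz, masker_db in maskers:
        distance = bark(target_hz) - bark(masker_hz)
        slope = SLOPE_ABOVE_DB if distance >= 0 else SLOPE_BELOW_DB
        threshold = max(threshold, masker_db - MASK_OFFSET_DB - slope * abs(distance))
    return threshold

def keep_audible(components: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep only (frequency_hz, level_db) components that the others do not mask."""
    kept = []
    for i, (freq, level) in enumerate(components):
        others = components[:i] + components[i + 1:]
        if level > masking_threshold(freq, others):
            kept.append((freq, level))
    return kept

# Hypothetical scene: a loud low rumble, a quiet sound just above it, a TV-like tone further up.
scene = [(200.0, 90.0), (250.0, 40.0), (1000.0, 60.0)]
print(keep_audible(scene))  # the quiet 250 Hz sound is dropped
```

With these made-up numbers the rumble puts the threshold at 250 Hz at about 75 dB, so the 40 dB sound disappears, while the 1000 Hz tone sits far enough away in Bark to survive.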

                    Obviously, at some point the data reduction does become audible, and the threshold for audibility will depend on such things as the quality of the reproduction system, the level of ambient noise in the environment, and so on. On iTunes, which is what I'm most familiar with, I found the old standard of 128 kbps subjectively wasn't good enough because too much audio information had been ignored by the MP3 coder to generate a small file size. They've since moved to 256 kbps (AAC), which is a lot closer to CD standard, with a bigger file size and certainly good enough for background listening (though I encode my own CDs in Lossless). But for fun, you can encode as low as 16 kbps, which means you're discarding between 98 and 99% of the original data. That's certainly audible, and I would never willingly encode music at that very low bitrate, but even then, what's surprising to me is how identifiable the basic elements of the music still are.
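The percentages above can be checked with a little arithmetic. A minimal Python sketch, assuming the reference is CD audio (44.1 kHz, 16-bit, stereo) and comparing raw data rates only; it says nothing about how much of the discarded data was actually audible.

```python
# CD audio: 44,100 samples/s x 16 bits x 2 channels = 1,411,200 bits/s
CD_KBPS = 44_100 * 16 * 2 / 1000

def compression_ratio(bitrate_kbps: float) -> float:
    """How many times smaller the encoded stream is than CD audio."""
    return CD_KBPS / bitrate_kbps

def percent_discarded(bitrate_kbps: float) -> float:
    """Share of the CD data rate that the encoder does not keep."""
    return 100.0 * (1.0 - bitrate_kbps / CD_KBPS)

for kbps in (256, 128, 16):
    print(f"{kbps:>3} kbps: {compression_ratio(kbps):4.1f}:1, "
          f"{percent_discarded(kbps):.1f}% of the CD data rate discarded")
```

At 16 kbps this comes to roughly 98.9%, consistent with the 98 to 99% figure above; even at 256 kbps, about 82% of the raw data rate is not kept.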



                    • #25
                      Brilliant explanation of 'psychoacoustic masking'

                      Originally posted by EricW
                      ...Imagine you're sitting in your living room watching TV at a moderate volume. It's raining steadily outside, and you can hear the rain if you focus on it, but the TV is louder. Suddenly, there's an enormous thunderclap outside your window, that lasts 3-4 seconds...
                      Thanks for taking the trouble to reply with such a concise explanation of the phenomenon of 'psychoacoustic masking'. Good work. That's just as I understand it to be. There are a couple of points I'd like to make:
                      • Isn't it much better when we convert what we read and discover into our own simple words rather than quote from others directly? Ideas and concepts are better fixed in our minds when they are expressed in simple ordinary language, preferably with some visual association - the rain, the TV, the thunder all do that admirably and unforgettably.

                      • If we write here as you did, as if trying to win over a jury of ordinary non-technical people, a few simple words and images can have more impact than an encyclopaedia of erudite knowledge. Can we try and make this our 'style sheet' for technical discussion?

                      OK, assuming that your explanation is now absorbed, we can move on to a vivid demonstration. OK, I'll make it, but please give me a day or two.

                      ----------
                      P.S. One important point: the discovery that human ears are susceptible to 'masking' cannot be assumed to apply to all living creatures with ears. It may not apply to your cat or dog. It is a curious by-product of the evolution of our ears and brains. As with many patents that originate in the observation of some characteristic of human behaviour or life, the MP3 patent holders seized upon a business opportunity by finding a way to use the oddities of the human hearing system to their advantage to make money. Similarly, photographic pioneer George Eastman (Kodak) used his observation that the human eye was most sensitive to the colour yellow when choosing the trademark colour of the company's corporate logo. But remember, human yellow sensitivity may well not apply to other animals! In psychoacoustics we're dealing with quirks of the human hearing system, not necessarily with hearing in all species.
                      Alan A. Shaw
                      Designer, owner
                      Harbeth Audio UK



                      • #26
                        Opposite of 'masking'?

                        The opposite of "masking" must be "focussing".

                        I have read that a mother will, even in very difficult circumstances, hear the sound of her infant crying - even at modest volumes, it can wake her from a deep sleep. Is this principle used? I wonder if, when we hear a very compelling melody, we pay much less attention to the supporting background sound, the accompaniment?



                        • #27
                          Perception, the brain and beliefs

                          Having pondered this subject for a while (mainly in the middle of the night while waiting to see if our youngest really has gone back to sleep) some thoughts occur:

                          Even in the most straightforward mic-to-tape recording set-up, the sound information that ends up on tape is unlikely to be the sound information that we 'heard'. Leaving aside the physical aspects of the shape of head and ears and so on, the simple fact that we are processing what we hear and the mic isn't means that the information is different. I'd not listened to any auditory illusions before, but the ones based on the Shepard scale here are just as intriguing as the optical illusions that I am more familiar with.

                          Having come to terms with the fact that we hear what we think we hear in the same way that we read what we think we read, it follows that our brains are doing a fair bit of filling in; we don't need all the letters in a word to read the word and, apparently, we don't need all the harmonics, or even the fundamental, to hear a note - the brain will fill in tones that it perceives to be missing. What the brain does seem to look for, though, are patterns, order and continuity; we want things to look and sound 'right' - based on culture and learning and whatever else has gone into forming our own brains.
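The "missing fundamental" mentioned above can be shown numerically. The Python sketch below builds a tone from the 2nd, 3rd and 4th harmonics of 200 Hz (400, 600 and 800 Hz) with no 200 Hz component at all, confirms that 200 Hz is absent, and then estimates the pitch by autocorrelation, i.e. by finding the delay at which the waveform best matches a shifted copy of itself. Autocorrelation is one long-standing, simplified model of pitch perception, used here as an illustrative assumption rather than a claim about how the brain actually does it; the sample rate, duration and search range are arbitrary choices.

```python
import math

FS = 8000      # sample rate in Hz (arbitrary choice for this sketch)
F0 = 200.0     # the "missing" fundamental in Hz
N = 4000       # half a second of audio

# Sum of the 2nd, 3rd and 4th harmonics (400, 600, 800 Hz): nothing at 200 Hz itself.
signal = [sum(math.sin(2 * math.pi * k * F0 * n / FS) for k in (2, 3, 4)) for n in range(N)]

def component_amplitude(x: list[float], freq: float, fs: int) -> float:
    """Amplitude of one sinusoidal frequency in x, from a single DFT term."""
    re = sum(v * math.cos(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def autocorrelation_pitch(x: list[float], fs: int, fmin: float = 50.0, fmax: float = 1000.0) -> float:
    """Pitch estimate: fs divided by the lag at which x best matches a delayed copy of itself.

    The sum covers fewer samples at longer lags, which favours the shortest
    repeating period over its multiples (2x, 3x, ... the true period).
    """
    best_lag, best_r = 0, -math.inf
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

print(f"amplitude at 200 Hz: {component_amplitude(signal, F0, FS):.6f}")
print(f"amplitude at 400 Hz: {component_amplitude(signal, 2 * F0, FS):.6f}")
print(f"estimated pitch: {autocorrelation_pitch(signal, FS):.1f} Hz")  # 200.0 Hz
```

The waveform repeats every 40 samples (1/200 s) even though nothing in it oscillates at 200 Hz, which is the periodicity this model picks up.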

                          Listening to interrupted speech (bad mobile phone reception, for example), aside from being inconvenient, also seems (to me) to be 'fatiguing' to the senses. Similarly, talking and listening over background noise is ultimately rather tiring. I worked for many years in a shop department that had an escalator running through it; though we weren't consciously aware of raising our voices or straining to hear, the sense of relief when it was turned off at the end of the day and conversation got back to normal was always noticeable.

                          VINYL REPLAY:

                          An issue I have come across with vinyl replay is that some set-ups do a much better job of allowing me to hear the surface noise as something separate from the music and thereby more readily ignore the noise. When the noise and music are less differentiated replay is less satisfactory. I assume that this is down to the brain having enough information to identify that the noise is just that, and therefore de-emphasise it. That extra information is not something I would be conscious of hearing though.

                          MP3:

                          To get back to MP3 and leaving out information, then: I think I grasp the various forms of masking that go on, and in a music recording we are talking about information that has been picked up by a microphone but that we would not have consciously heard - it would have been masked for one reason or another. This doesn't actually mean that our ears didn't pick it up, does it? Just that our perception didn't make it known to us. I'm assuming that we have a notion of 'sound permanence' in a similar way to 'object permanence' - it's only the very young who believe that when we hide behind a book we have actually disappeared. So if I could hear the TV before the thunderclap and hear it after the thunderclap, then my perception would be that it carried on through it as well. I don't know how much use MP3 encoding makes of the brain's ability to fill in the gaps, to hear sound that isn't actually there, to hear what it thinks it should hear - but my guess is that it plays a part.

                          Taking all of this together, it seems that what allows lossy files to work is partly predicated on getting our brains to work harder; it's not just putting the manuscript in a smaller box but also leaving out letters from some of the words because we can still read them anyway. The fact that a high-bitrate MP3 is all but indistinguishable from a WAV must mean that either the extra information really was superfluous, or that our brains don't miss it because they can join the dots for themselves, mustn't it?

                          An encoding format which makes use of the brain's strengths (and weaknesses for that matter) is therefore 'shifting the workload' isn't it? By reducing the amount of information we are given (whilst not ignoring the fact that we can be given too much information at times) is it not increasing the rate at which the brain tires? I am of course assuming that the act of perception is in itself tiring, I find it to be so - but that doesn't necessarily mean that it actually is.

                          A BROADCAST MONITOR SPEAKER:

                          Why should this matter to a Harbeth user? One of the prerequisites for a broadcast monitor must be that it can be used all day without causing fatigue to the user; that the information it presents must be put across in such a way that the listener has to make as little effort as possible to hear it, even at / especially at low volume and with other noise around. (I hope I've got that right?)



                          • #28
                            Data reduction

                            Originally posted by weaver
                            ...An encoding format which makes use of the brain's strengths (and weaknesses for that matter) is therefore 'shifting the workload' isn't it? By reducing the amount of information we are given (whilst not ignoring the fact that we can be given too much information at times) is it not increasing the rate at which the brain tires? ...
                            Your argument that data reduction (i.e. converting a .wav file to an MP3 file) discards much (actually most) of what the microphone collects is definitely true. And it is easy to prove by watching the byte count of a digitised (WAV) audio file diminish as it is converted to MP3 format with ever greater compression. Even a very high quality MP3 throws away most of the fine detail in the file. But as you note, it is virtually - or I'd say actually - impossible (except under the most incredibly contrived controlled conditions with young trained ears, which precludes all of us here) to detect, say, a 256 kbps MP3 versus the original WAV file. That's my experience anyway and I'd put money on it.

                            But as to your hypothesis that the brain is having to work overtime to 'make up' or patch in the missing data lost during reduction, no, that is not the case. Whatever mysterious masking process is going on in (as I recall it) the hair cells of the ear that wiggle in response to impinging sound waves, it is they which mechanically define the way masking works, not some post-processing function in the brain. You can, crudely, think of the hair cells as a long row of green bottles sitting on a wall. The sound wave tickles the first bottle and passes along the chain, the bottles bending with the wave as it passes. As the bottles are quite fat, they can't discriminate very fine differences in the tone of the sound wave, and this is the core of the masking concept. So the brain simply isn't receiving the fine detail from the hair cells as sensory input. OK, I could look this up - the man who did much of the research was Zwicker - but I'll leave that to you. And Zwicker's work is the core of masking and hence of the MP3 system.
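Alan's row of fat bottles corresponds roughly to what Zwicker called critical bands. A widely quoted approximation published by Zwicker and Terhardt (1980) converts frequency into "critical-band rate", measured in Bark, where one Bark is roughly one band. The short Python sketch below uses that formula to count how many "bottles" span the audible range; treat it as a smooth curve fitted to listening-test data, not an exact description of the ear.

```python
import math

def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation: frequency (Hz) -> critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

span = bark(20_000) - bark(20)
print(f"about {span:.0f} critical bands between 20 Hz and 20 kHz")
for f in (100, 500, 1_000, 4_000, 16_000):
    print(f"{f:>6} Hz lies at roughly {bark(f):4.1f} Bark")
```

The count comes out at about 24, the number of critical bands usually quoted: a couple of dozen "bottles" covering the whole range of hearing.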

                            Incidentally, when DCC appeared (and I still use it - 256kb, 4:1 encoding, completely transparent) I approached Philips and asked them if I could somehow incorporate their compression technology in our active speakers. My concept was that if we were erasing and not feeding the speaker with all the fine detail that was masked in the ear anyway, this would reduce the strain on the speaker, distortion would diminish and the result would be a cleaner sound entirely due to data reduction. I still believe that is a valid argument. The easier we make the loudspeaker's task, surely the less distortion it will produce. And let's not forget, speakers produce oodles of distortion. Even good ones!

                            Of course, data reduction is underpinned by intensive research (almost certainly the most heavily studied aspect of audio, ever) and it has to make assumptions based on the typical human ear plus a margin for error. It is entirely possible that there are young (that's mandated) listeners who can hear things we can't. But as they are not socio-demographically the consumers of quality audio I'd say - and the MP3 people would definitely say - that validates their MP3 masking concept.
                            Alan A. Shaw
                            Designer, owner
                            Harbeth Audio UK



                            • #29
                              Spectral tonal components

                              Thank you Alan.

                              I think I had perhaps been looking at this issue from the wrong end. I had assumed that MP3 relied (in part) on the brain making up for what had been left out; having read a little more this morning, it seems that it is more to do with MP3 sweeping its mess under a rug so that the brain doesn't notice (or, more technically, "to minimize the audibility of the quantization noise").

                              I came across an article originally from a journal of the Institute of Electrical and Electronics Engineers (from which the quote above is taken), much of which is very informative and rather more of which went straight over my head. But I would recommend that anyone interested read it: this version is easier to read but omits the diagrams, while this version is complete but a little less easy to navigate on screen.

                              I found passages such as this:

                              (to) Separate spectral values into tonal and non-tonal components. Both models identify and separate the tonal and noise-like components of the audio signal because the masking abilities of the two types of signal are different.
                              from the section on 'The psychoacoustic model' to be of interest in that the algorithm is effectively making decisions about what is noise and what is music (as I understand it).
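The tonal/noise-like split quoted above can be sketched in a few lines. This Python example is loosely modelled on the test described for psychoacoustic model 1 of the MPEG-1 audio standard, as I understand it: a spectral bin counts as "tonal" when it is a local maximum standing at least 7 dB above bins a little further away. In the standard the neighbourhood examined widens with frequency, so the fixed offsets of 2 and 3 bins here are a simplifying assumption, and the input is a made-up spectrum in dB.

```python
def tonal_peaks(spectrum_db: list[float],
                offsets: tuple[int, ...] = (2, 3),
                min_prominence_db: float = 7.0) -> list[int]:
    """Return indices of bins that look tonal rather than noise-like."""
    reach = max(offsets)
    peaks = []
    for k in range(reach, len(spectrum_db) - reach):
        level = spectrum_db[k]
        # A tonal peak must first be a local maximum...
        is_local_max = level > spectrum_db[k - 1] and level >= spectrum_db[k + 1]
        # ...and then stand clearly above the bins 'offsets' away on both sides.
        stands_out = all(level - spectrum_db[k + side * j] >= min_prominence_db
                         for j in offsets for side in (-1, 1))
        if is_local_max and stands_out:
            peaks.append(k)
    return peaks

# Hypothetical spectrum: a noise-like floor around 40-43 dB with one strong line at bin 7.
spectrum = [40, 42, 41, 43, 40, 42, 41, 60, 45, 41, 42, 40, 43, 41]
print(tonal_peaks(spectrum))  # [7]
```

Small ripples in the noise floor are local maxima too, but they fail the 7 dB test, so only bin 7 is treated as a tone; the rest would be handled as noise, whose masking behaviour differs.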



                              • #30
                                How we think we hear and how we actually hear are very different ....

                                To quote from your linked paper ...

                                The MPEG/audio committee conducted extensive subjective listening tests during the development of the standard. The tests showed that even with a 6-to-1 compression ratio (stereo, 16 bits/sample, audio sampled at 48 kHz compressed to 256 kbits/sec) and under optimal listening conditions, expert listeners were unable to distinguish between coded and original audio clips with statistical significance. Furthermore, these clips were specially chosen because they are difficult to compress.
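As a check, the 6-to-1 figure in the quoted test follows directly from the stated format (stereo, 16 bits per sample, 48 kHz) and the 256 kbit/s coded rate:

```latex
48\,000~\tfrac{\text{samples}}{\text{s}} \times 16~\tfrac{\text{bits}}{\text{sample}} \times 2~\text{channels}
  = 1\,536\,000~\text{bit/s} = 1536~\text{kbit/s},
\qquad
\frac{1536~\text{kbit/s}}{256~\text{kbit/s}} = 6
```

In other words, the coded clips carried one sixth of the original data, yet expert listeners could not tell them apart from the originals with statistical significance.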
                                I am highly sceptical about the talk surrounding cables etc. because, quite simply, anyone who has spent an afternoon or two just skimming the vast library of research papers covering our hearing system would have to conclude we just do not have the auditory acuity we think we have. And following that, if we are not actually hearing what we think we are hearing there is limitless potential for self delusion.

                                Empirical results also show that the human auditory system has a limited, frequency dependent, resolution. This frequency dependency can be expressed in terms of critical band widths which are less than 100 Hz for the lowest audible frequencies and more than 4 kHz at the highest. The human auditory system blurs the various signal components within a critical band although this system's frequency selectivity is much finer than a critical band.
                                Considering the second quote: we sense high frequencies in 'blocks' or bands that are up to 4000Hz wide. Yes, that's right: a whopping 4kHz wide. So, at the top end of the scale, 16-20kHz is sensed in one wide spectral block.
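The band widths quoted from the paper can be compared with the companion formula from the same Zwicker and Terhardt (1980) work, which approximates critical bandwidth in Hz. A minimal Python sketch; note that this smooth approximation levels off at about 100 Hz for the lowest frequencies rather than dropping below it, while it exceeds 4 kHz above roughly 15 kHz, broadly in line with the quoted figures.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the critical bandwidth (Hz) around freq_hz."""
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

for f in (100, 1_000, 4_000, 10_000, 16_000):
    print(f"around {f:>6} Hz, one critical band is roughly {critical_bandwidth_hz(f):,.0f} Hz wide")
```

Around 16 kHz a single band is over 4 kHz wide, which supports the point that the top of the audible range is sensed in very wide blocks.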

                                When listening to so many modern loudspeakers I do wonder if their designers have even the vaguest notion about the ear. But they can't have, because the sound is so damned fatiguing, which in my book dooms the design to the scrap heap. The natural, un-amplified human voice box simply cannot sound fatiguing because it is crafted from soft, warm, nourished, springy living tissue. That means it is perfectly damped. So if a loudspeaker reproduces voice with a degree of hardness (many do), it says to the listener's subconscious 'that is not a real voice because real voices don't sound like that... so what is it? A threat?' And being on edge in anticipation is the definition of listening fatigue.

                                Simply: if you want to separate folk from their cash, you have to know how to take full advantage of their auditory fallibility.
                                Alan A. Shaw
                                Designer, owner
                                Harbeth Audio UK

