Can I play and rip a SACD/DVD-A disc?

Garrycs · Mar 27, 2006

Hi all

Sorry if this is against board rules,
But I have the disc "Where the Humans Eat" by Willy Mason, and it says on the disc that I need a CD/DVD-V/DVD-A/SACD player in order to play it. On the disc there is a player.exe that I don't want to install, and it may be the only way to play the disc.
As I want to play the music on my iPod, how do I transfer the files by ripping? I could probably record the disc the old-fashioned way, by playing it and recording the output, but that would take 40 minutes, and that's not on.
Can anyone help?
Garry

djscoop · Mar 27, 2006

SACD discs cannot be read or ripped by computer optical drives.

But as long as it doesn't have any gnarly copy protection, you should be able to rip the CD-audio portion of the disc. We here at aD recommend EAC (Exact Audio Copy)... it's the best ripper there is. You can find a guide on how to set up and use EAC in my sig below.

Garrycs · Mar 29, 2006

Hi

Thanks for the suggestion, but all the files show up as data files and not audio files. So it looks like the files can't be ripped by EAC.
I read years ago that if a disc has two sections (music and data), you can draw a line with a marker or put a piece of tape across the data area, and the drive then only recognises the music. I did get this to work in the past on 2 CDs, but I wouldn't like to try it on my own PC.

Anyone else got suggestions?

wilkes · Mar 31, 2006

Firstly, to prevent anything running automatically when you insert the disc, you MUST disable the autorun functionality of your drive - and this is a good idea regardless, as there are so many discs out there that will install this crap even if you tell them not to.

Bad news is that you do NOT have any rights at all to rip these tracks for iPod use.

diabolos · Apr 11, 2006

So how do you use the SACD data tracks? Do you have to have a DSD-compatible player or plug-in? Or were you talking about the CD layer?

How to get around MediaMax...
http://forums.afterdawn.com/thread_view.cfm/233874

How to disable "Auto-Run"...
http://www.annoyances.org/exec/show/article03-018

Ced

wilkes · Apr 11, 2006

There is no such thing as a software SACD player, or a computer drive that will read/play them either.

All you can possibly do is extract the Red Book layer - and you don't want to do that as it will be smashed seven ways to Sunday in an attempt to make the DSD layer sound better.

The only way to play the SACD layer is in an SACD player.
The Red Book layer should play in all players, although with the hybrid SACD there is a well-known problem with cracks from the spindle hole, and the more you play the disc the worse this will get.
Sony have no plans to remedy the fault.

djscoop · Apr 11, 2006

ah c'mon wilkes, one of these days you'll come around and start to love Sony. LOL

diabolos · Apr 11, 2006

That's what I thought.

...and you don't want to do that as it will be smashed seven ways to Sunday in an attempt to make the DSD layer sound better.

That would be cheating... My Musik Soul Star CD/SACD Hybrid sounds great either way (to me). The surround sound track is the main plus.

Ced

wilkes · Apr 12, 2006

Don't get me wrong - I have heard some great sounding SACD discs.
In particular, Roxy Music's Avalon & Bryan Ferry's Boys & Girls are examples of how it can be done.
Trouble for me is that there are far, far more bad discs - and DSD/SACD has serious issues.
Ultrasonic noise is the biggie for me - all there is above 23 kHz is noise, and lots of it.
But this is not a thread bashing SACD, so I will stop now.

djscoop · Apr 12, 2006

So you still prefer the 24-bit DVD-Audio format over the 1-bit SACD format?

diabolos · Apr 12, 2006

How does 1 Bit audio work anyway?

The SACD camp always talks about how 1-bit audio at a very high sample rate is better than 24-bit at 192 kHz. Is that true?

Ced

wilkes · Apr 13, 2006

Not in my opinion, or that of the AES either.
As to how it works, Google will provide the answers.

This is a copy of a serious research document about DSD....

Digital System Wars

More Evidence on Sony DSD/SACD

In IAR's 1998 Master Guide, we discussed a serious (we think fatal) sonic flaw in the Sony-Philips DSD standard, also proposed as a standard for their Super Audio CD format. That discussion was based on the evidence of one demonstration, a well executed A-B-R comparison conducted by Sony themselves at AES.
Since we published that article, we have had the opportunity to further evaluate DSD and SACD, in two further demonstrations, also conducted by Sony and Philips. All three demonstrations were very different in nature from each other, and on different kinds of systems. Thus, we now have three very different kinds of evaluations in our journalist's pouch as evidence.
Because these three evaluations are each different in nature, they draw an observational bead on DSD's performance from three different angles. It's like triangulating on a target, with three independent and different kinds of observations, taken from different angles. That's very important, since there's always a chance that observations in a single experiment might be faulty, as there might be an unknown peculiar fluke in the one experiment. But if you make independent observations, in three different experiments that are designed differently, then you are essentially looking at the same object from three different viewpoints. If all three independent viewpoints agree, you can be sure that the observed properties truly belong to the observed object itself, and are not merely a fluke of one observation vantage point nor a fluke of one experiment's design.
In this case, all three evaluations of DSD, in three different kinds of experiments, all agreed, and perfectly corroborated each other. They all revealed the same fatal sonic flaw. So the case against DSD and Super Audio CD is now even far stronger than before.
The second demonstration was conducted by Marantz (a high-end division of Philips). This demo was based on CDs, rather than master tapes or computer hard discs. Thus its results are assuredly very relevant to what you could expect to hear from Super Audio CD in your home system. This demo was an instantaneous A-B comparison of exactly the same music, recorded onto two different CD formats, and played back from these CDs. The format pitted against Super Audio CD was not the true competition in today's world, the emerging CD standard from DVD-A, which allows 24/96 fidelity. Rather, this demo from Sony-Philips was showing off the alleged superiority of Super Audio CD over merely the ancient 16/44 CD standard. The Super Audio CD was played on a special CD player optimized for this new format, while the 16/44 CD of the same music was played through a standard Marantz CD player. Note that this put the 16/44 version under a bit of a handicap, since (as we all know) there are far better CD players that show 16/44 PCM CDs to better advantage than the Marantz. As for the SACD playback being optimal, one of Sony-Philips' chief selling points is that the playback circuitry is very simple and can be inexpensively optimized, as it presumably was in the special Marantz SACD player.
So, how did the new SACD format compare to the handicapped and ancient 16/44 CD in this direct A-B comparison?
In some sonic aspects, the SACD lost!! Above 8000 Hz the SACD sounded awful, especially on sibilants of the female singer, and on cymbal sounds from the drum kit. Whenever these musical notes came along, the ancient 16/44 PCM CD sounded much cleaner, faster, and more open (remember, both CDs came from the same original master). The SACD exhibited a very trashy distortion on these musical notes, making them frazzled and smeared.
This gross distortion heard from the Super Audio CD version was identical to the sonic flaw we observed during Sony's earlier A-B-R demo using master tapes and studio processors, and occurred on the same types of musical notes. As we discussed in our 1998 Master Guide, this seems to be a slew related distortion, like a digital version of TIM.
This second demo confirmed our findings from the first demo, and it's an especially powerful confirmation because the system setup was so different. Moreover, since this demo employed the finished CD product rather than master tapes and studio processor loops, the findings of this demo are assuredly relevant to what you will hear from Super Audio CDs in your home system.
If the new Super Audio CD loses out even to the ancient 16/44 CD above 8000 Hz, you can well imagine that it will be slaughtered above 8000 Hz by 24/96 PCM CDs, including both the present ad hoc audiophile 24/96 standard on DVD video and the different forthcoming 24/96 DVD audio standard from DVD-A. And indeed we found this to be the case (see below).
In all fairness, we must also report that, below 8000 Hz, DSD and Super Audio CD sounded wonderful in this CD A-B demo, just as we found in Sony's earlier demo. The Super Audio CD sounds more open, airy, musically natural, and dynamic than 16/44 PCM CD below 8000 Hz; in direct comparison, the 16/44 CD sounded more canned, glazed, constricted, and closed in.
As we discussed previously, this means that the basic principles behind Super Audio CD are valid, but that the sampling rate is not nearly high enough to support the higher frequencies of the audio spectrum with decent fidelity. In a 1 bit system like DSD-SACD, a very high sampling rate is required in order to handle music to 20,000 Hz, and to handle steep, high slew rate musical notes such as vocal sibilants and cymbal sounds. The present DSD-SACD sampling rate is only good enough to cover music up to 8000 Hz. This is simply unacceptable as a high fidelity medium. It's like having a speaker system without any tweeter. Actually it's even worse than that, since a speaker system without a tweeter would merely sound dull, and would not actively distort treble information, while DSD-SACD does grossly distort music's trebles.
Many listeners react favorably to the sound of DSD-SACD. They are obviously so entranced by the improved musical naturalness below 8000 Hz that they fail to notice the gross distortion above 8000 Hz on certain musical notes.
The third demo was Sony's current professional road show, for studio engineers. This was a single ended demo, with no A-B comparisons. It's worth reporting on because it showed off DSD to its very best advantage. The playback system included Sony's own very revealing speakers, and the source was as good as it gets, a studio master hard disc. Thus, we were treated to the very best possible sound of DSD, coming directly off the master recorder.
How did this sound? Again, up to 8000 Hz the sound was wonderful: open, airy, natural, and dynamic. But again there were severe sonic flaws above 8000 Hz, especially on musical notes requiring a high slew rate. One revealing track was an a cappella chorus. Every sibilant was grossly mangled.
This mangling showed that DSD did a number of things wrong, which are worth a brief analysis. A live vocal sibilant is supposed to sound like clean, open white noise, like a jet of escaping steam. Try saying "ssssss" and listen to the sound. Notice that your teeth are bared, with your lips pulled back. Now say "moon", and then say just the "ooooo" part of "moon". Notice that your lips are cupped way forward, and are cupped into a circle. Next, say "ssssss" again, but this time force your lips into the same forward circular cup as they had while you were saying "ooooo". And finally, continue to say "ssssss" while moving your lips between this forward, cupped position and the pulled back teeth bared position. Notice that the sound of the "ssssss", your vocal sibilant, changes character drastically as you move your lips back and forth between these two positions. In the natural position, with lips pulled way back and teeth bared, your sibilant has a bright, open, white noise sound. This is what a live vocal sibilant sounds like, this is what an accurate recording should sound like, and this is what good PCM digital sounds like (both 16/44 and 24/96). In the artificial position, with your lips cupped forward, the pitch of the same "ssssss" sibilant drops, the sound is duller, the sound no longer has its natural spectral balance (the open, bright white noise sound of steam escaping), and the sound is closed in rather than open (as if it were trapped in a tunnel).
This is what DSD did to the vocal sibilants of the chorus in this master recording. Whenever a vocal sibilant came along, the pitch apparently dropped lower, as if the singers had cupped their lips forward while singing every sibilant.
DSD also mangled these sibilants in other ways. Try saying "ssssss" again (normally, with lips back and teeth bared). Notice that the natural sound consists of lots of little spikes of individuated noises. The only reason that you can hear these noise spikes as individuated, and subtly different from each other, is that there are instants of relative intertransient silence between the spikes. Now try saying "shoosh". Notice that the "sh" sound smears the spikes together into a more homogeneous sound, and that there are no longer individual spikes of noise with high peak amplitude.
DSD does this same kind of mangling to sibilants. It reduces the amplitude of the individual peak spikes of noise, and smears the energy over time, filling in what should be intertransient silence between spikes. DSD might have excellent dynamics at lower frequencies, but in the trebles it sonically acts as a dynamic compressor, squashing the peaks. DSD then sonically takes this lost dynamic peak energy and smears it over time, filling in the spaces between transients so that the transient sounds lose their individuality, instead becoming blended and smeared into a homogeneous slur. DSD changes "ssiss" into "shoosh".
This mangling of vocal sibilants was striking on the master recording of the a cappella chorus, because the recording was so superb at lower frequencies, and because there were no other instruments playing at the same time that might have masked this mangling. We heard this mangling, and another audio pro at this same demo also heard it, being bothered enough by it to speak up about it to others.
Why should DSD-SACD have a too-low sampling rate problem, that leads to these fatal sonic flaws above 8000 Hz? After all, this is a studio mastering and archiving system, which is supposed to have data capability even beyond any consumer distribution medium. And this system is being born in the age of high density laser discs (such as DVD), with ample storage to support high sampling rates.
DSD's too-low sampling rate is even more puzzling, and more shocking, when we look at a bit of audio history. Philips was one of the pioneers of noise shifting, i.e. time averaging of oversampling, a technique which allows fewer bits to do the work of more bits, at least for lower frequencies where there are enough samples to average. In their first application of this technique, Philips reduced the bit resolution only a slight amount, from 16 bits to 14 bits, and they offset this slight resolution loss by oversampling by 4 times, at 176 kHz instead of 44 kHz. This was an equitable tradeoff of information content, with 4 times less resolution traded for 4 times greater bandwidth (although not a perfect tradeoff, since the time averaging failed to offer genuine 16 bit resolution at music's highest frequencies).
Then, some years later, Philips was trying to find a way to build really cheap CD players for budget consumer systems. They came up with a really cheap chip set by reducing the bit resolution from 16 bits all the way down to 1 bit, and they called it Bitstream. With such a large reduction in bit resolution, the oversampling should have been increased to 32,000 times, if they wanted to preserve an equitable tradeoff of information content (to preserve basic information content, the sampling rate should be doubled for every bit dropped from resolution). But Philips didn't do this. Instead, they increased the oversampling to only 256 times the nominal 44 kHz (thus providing 1 bit sampling at 11.3 MHz). Why such a compromise, of only 256 times oversampling instead of 32,000 times oversampling? Remember that this Bitstream system was intended only for the cheapest consumer CD players. It was not intended to even replace Philips' own more expensive multibit consumer CD players. And it was most certainly not intended to become a studio mastering and archiving system. Note that this was over 10 years ago, when the state of the digital art was far more primitive than it is today, and digital media did not have the large storage capability to support the high sampling rates that today's media do.
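(Editor's illustration, not part of the quoted article.) To make "1 bit at a high rate" concrete, here is a minimal first-order delta-sigma modulator in Python. It is a textbook sketch, not Philips' Bitstream design or Sony's DSD modulator, and real converters typically use higher-order loops; the function name `delta_sigma_1bit` is hypothetical. Every output is only +1 or -1, yet the average of the bitstream tracks the input, which is the "time averaging of oversampling" the article refers to.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: inputs in [-1, 1] -> a stream of +1/-1 values."""
    integrator = 0.0
    feedback = 0.0                              # previous 1-bit output, fed back and subtracted
    bits = []
    for x in samples:
        integrator += x - feedback              # accumulate the error between input and output
        bit = 1.0 if integrator >= 0 else -1.0  # the 1-bit quantizer
        bits.append(bit)
        feedback = bit
    return bits

bits = delta_sigma_1bit([0.25] * 1000)          # a constant input, heavily "oversampled"
print(sum(bits) / len(bits))                    # close to 0.25: the average carries the resolution
```

For a constant input like this one, the integrator stays bounded, so the error of the average shrinks roughly in proportion to the number of bits averaged.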
So, before we go forward, remember and keep this key fact in mind: over 10 years ago, when digital was primitive and storage media limited, Philips designed a compromised 1 bit system for only the cheapest consumer CD players, and they still gave it 256 times oversampling as a sampling rate.
Now let's fast forward to the present. Now we have more sophisticated digital systems, and digital media with much higher storage capability and faster transfer rates, so we can engineer and we can afford higher sampling rates than we could 10 years ago. Now we see Philips and Sony collaborating on a new digital standard which is not intended as just a compromise for the cheapest consumer CD players, but also for the best consumer CD players, and also even for the holiest of holies, studio mastering and archiving of music for generations to come (which obviously merits the very best possible fidelity, without compromise).
Naturally, from all these considerations, one would expect that this new standard would have a much higher sampling rate than the compromise system developed 10 years ago only for the cheapest consumer CD players. One would expect therefore that DSD-SACD (also a 1 bit system) would oversample at some rate much higher than the 256 times of that ancient Bitstream cheap consumer compromise.
So, how much higher, how much better, than 256 times oversampling, is the oversampling that Sony and Philips have put into DSD-SACD, the modern new mastering standard for the ages? Is it perhaps 512 times oversampling, twice as good? Is it 1024 times oversampling, 4 times better?
No.
It's actually 64 times oversampling, which is 4 times worse!!! DSD-SACD, the modern new mastering standard for the ages, samples music at only 1/4 the sampling rate used 10 years ago by Philips' own Bitstream, intended only for the cheapest consumer CD players of those primitive ancient times. Bitstream's 1 bit system sampled at 11.3 MHz, but DSD-SACD samples at only 2.8 MHz.
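(Editor's illustration, not part of the quoted article.) The article's arithmetic can be checked in a few lines. This sketch takes the article's own premise (double the sampling rate for every bit of resolution dropped) at face value; that is the article's rule of thumb, not a general law of oversampled converters, since noise shaping changes the tradeoff. Rates are multiples of the 44.1 kHz CD rate, which the article rounds to 44 kHz; the helper names are hypothetical.

```python
CD_RATE_HZ = 44_100  # CD sampling rate; the article rounds it to 44 kHz

def equitable_oversampling(bits_dropped):
    """The article's premise: double the sampling rate for each bit of resolution dropped."""
    return 2 ** bits_dropped

def one_bit_rate_hz(oversampling_factor, base_hz=CD_RATE_HZ):
    """Sampling rate of a stream oversampled by the given factor."""
    return oversampling_factor * base_hz

print(equitable_oversampling(16 - 14))  # 4: Philips' 14-bit, 4x scheme
print(equitable_oversampling(16 - 1))   # 32768: the "32,000 times" figure for 1 bit
bitstream_hz = one_bit_rate_hz(256)     # 11_289_600 Hz, the "11.3 MHz" Bitstream rate
dsd_hz = one_bit_rate_hz(64)            # 2_822_400 Hz, the "2.8 MHz" DSD rate
print(bitstream_hz // dsd_hz)           # 4: DSD runs at a quarter of Bitstream's rate
```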
Remember that Bitstream's 256 times oversampling was already a compromise for cheapness. If Bitstream were to have preserved the same information content as the 16/44 multibit CD player, it would have to have been given an oversampling rate of 32,000 times.
You'd think that any move toward mastering quality, and/or toward modern digital standards and capabilities, would require an oversampling move to a higher number that would at least equal this 32,000 times (which would make it the informational equivalent of 16/44 multibit). But Sony-Philips didn't make DSD better than Bitstream, or equivalent to 16/44 multibit. They didn't even make it equal to Bitstream. Instead, they made it worse than Bitstream. Four times worse! What a travesty!
No wonder DSD-SACD has such problems mangling music's high frequencies! It's a giant step backwards in sampling rate, down to a sampling rate that is simply too low to accurately capture music's fastest waveforms with a 1 bit system.

And also this:

http://www.helsinki.fi/~ssyreeni/texts/bs-over/bs-over.en.html
Copyright © 1996–2002 Sampo Syreeni; Date: 2002–09–17
Oversampling and bitstream methods in audio

Through the relatively short history of digital audio processing, the technology has improved by impressive steps. Nowadays most problems which plagued early digital applications have all but vanished. For most of this we have only two inventions to thank: oversampling and bit reduction. This article gives a short introduction to these important topics. It also presents some of my views on using the resulting bitstream methods in audio transport applications, of which Sony’s Super Audio CD (SACD) is the first and foremost example.
Sampling basics

Oversampling and bit reduction techniques are mostly a matter of implementation: in theory, neither of them is needed to build robust, theoretically sound audio processing applications. On the other hand, both technologies rest largely on the same principles as the more classical incarnations of digital audio. In particular, one needs at least a cursory understanding of sampling, reconstruction and the management of noise in audio systems.
The sampling theorem

The sampling theorem, in its present form, was formulated between the 1920s and the 1950s by the same people who developed information theory, mostly Harry Nyquist and Claude Shannon, both employees of Bell Laboratories. The sampling theorem is the theoretical basis that allows us to process physical signals on discrete, digital computers in the first place.

What the sampling theorem says is that, under certain conditions, we can convert a continuous, infinitely accurate (analog) signal into a stream of equally spaced samples and lose no information in the process. The condition is that the analog signal contains no content at or above half the frequency we are sampling at: the signal we sample must be bandlimited. The conversion consists of taking the instantaneous value of the analog signal at regular intervals, determined by the sampling frequency. Note that nothing is said about the practical method used to achieve such infinitely narrow samples (only the value at a single instant of time affects the resulting number) or about the number of bits in a sample (in the theory, a sample is a real number, i.e. it is infinitely accurate). Something is said about how to reconstruct the analog version from the resulting samples; after all, losslessness means being able to return the signal to its original form exactly.

Perfect reconstruction, as it is logically enough called, is achieved by passing the point samples through a perfect lowpass filter. Such a filter cuts off everything above half the sampling frequency. Of course, this kind of idealized response is not physically achievable, just as on the sampling side we have difficulty obtaining very thin samples. Now, seeing the reconstruction step as a lowpass filter is not very instructive either. But seen in another way, it makes perfect sense: in essence, we are interpolating between the sample points. The ideal lowpass filter responds to one such input sample by emitting a sin(x)/x shaped signal, an oscillating function that dies out relatively slowly as we go farther away from the time of excitation. More specifically, the prototype sin(x)/x function is unity at the origin, symmetric about it, and exactly zero at every nonzero integer multiple of the sample period. Scaling this prototype by the sample value, shifting it to center it on the exciting sample, and then summing the responses to the individual samples, we get a signal that agrees with the original at the sample times and varies smoothly in between.
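(Editor's illustration, not part of the quoted article.) The interpolation described above can be sketched directly: each sample contributes a sin(x)/x pulse centred on its own instant, scaled by its value. This is a minimal sketch with hypothetical names and test values (an 8 kHz rate, a 1 kHz tone, 400 samples); a real signal only offers finitely many samples, so the sum is truncated and the value between samples is close to, but not exactly, the original.

```python
import math

def sinc(x):
    """Normalized sinc: 1 at x = 0, and (in exact arithmetic) 0 at every other integer."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Ideal (Whittaker-Shannon) interpolation: one scaled, shifted sinc pulse per sample."""
    # Sample n sits at time n / fs, so its pulse is evaluated at fs * t - n.
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

fs = 8000.0                     # hypothetical sampling rate, Hz
f = 1000.0                      # test tone well below fs / 2
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(400)]
t_mid = 200.5 / fs              # halfway between two sample instants
print(reconstruct(samples, fs, t_mid), math.sin(2 * math.pi * f * t_mid))
```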

One might wonder what it means, exactly, to pass the sampled version through a lowpass filter; after all, the digital and analog domains are very different, and it is not entirely clear what filtering means in each. Indeed, it takes some real math to understand that exhaustively. For those who have a hunch: sampling consists of taking the inner product of the signal with a time-shifted delta function (Dirac’s δ-distribution), and reconstruction consists of summing copies of the reconstruction filter’s impulse response, each shifted to its sample time and scaled by the sample value. This, of course, is equal to feeding infinitely thin impulses, weighted by the original point samples, through the filter. Because impulses (delta functions) contain energy at all frequencies, the filter then removes all the extraneous content above the Nyquist limit. This is why the output filter is called the anti-imaging filter: the extra content it removes consists of frequency-shifted copies of the original signal, called images.
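(Editor's illustration, not part of the quoted article.) In symbols, with our own notation ($T = 1/f_s$ is the sample period, $h$ the reconstruction filter's impulse response), the two operations described above read:

$$x_n = \langle x, \delta(\cdot - nT) \rangle = x(nT)$$

$$\hat{x}(t) = \sum_{n} x_n \, h(t - nT) = \Big( h * \sum_{n} x_n \, \delta(\cdot - nT) \Big)(t), \qquad h(t) = \frac{\sin(\pi t / T)}{\pi t / T}$$

For the ideal filter, $h(0) = 1$ and $h(nT) = 0$ for every nonzero integer $n$, so $\hat{x}(nT) = x_n$: the reconstruction agrees with the samples at the sample times.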

To process signals digitally, they need to be bandlimited. If this condition is not fulfilled, aliasing will occur. This means that the reconstructed signal will still contain only frequencies below half the sampling rate, and that any content in the input signal with frequencies above half the sample rate will fold back into the admissible band. For instance, at a 40 kHz sampling rate, a 22 kHz sine wave lies 2 kHz above the 20 kHz limit and will fold down to 18 kHz. Aliasing does not sound nice and is to be avoided at all costs. This means that we need to guarantee no inadmissible content is present in the sampled signals. This is achieved by passing the signal through a lowpass filter, the anti-aliasing filter, before sampling. Again, the math assumes the filter to be perfect, and this is not physically achievable.
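(Editor's illustration, not part of the quoted article.) The folding rule in the example above fits in one small function; the name `alias_frequency` is hypothetical. The second part checks the 22 kHz example numerically: at 40 kHz, the samples of a 22 kHz sine are exactly those of an 18 kHz sine with the sign flipped, so after sampling the two cannot be told apart.

```python
import math

def alias_frequency(f_in, fs):
    """Frequency in [0, fs/2] at which a sampled sinusoid of frequency f_in shows up."""
    f = f_in % fs                       # sampling cannot tell f from f + k * fs
    return fs - f if f > fs / 2 else f  # content above fs/2 folds back around fs/2

print(alias_frequency(22_000, 40_000))  # 18000, the example in the text

# Samples of a 22 kHz sine at 40 kHz equal those of an 18 kHz sine, sign flipped.
same = all(
    abs(math.sin(2 * math.pi * 22_000 * n / 40_000)
        + math.sin(2 * math.pi * 18_000 * n / 40_000)) < 1e-9
    for n in range(100)
)
print(same)  # True
```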
Pointlike sampling and S/H

Anti-aliasing and anti-imaging aren’t the only problems that we encounter. Even such a simple operation as taking point samples is surprisingly difficult in practice: the incoming electrical signals roam throughout the sample period. The natural thing to do, then, is not to sample the signal directly but to put a kind of gate circuit in between. These circuits are called sample and hold, or S/H. Such a circuit usually operates by charging a capacitor to the incoming voltage and then switching the capacitor (through a couple of MOSFET transistors) over to a high input impedance amplifier (an operational amplifier with a FET input stage) for sampling. This is better, but still not optimal: any change in the capacitor’s voltage calls for a change in charge, and such a change consumes energy. This energy change has to take place within the brief sampling interval, so the circuit resistance, the capacitor’s capacitance and the finite operating voltage of any physical circuit set a lower bound on the time accuracy of the sample and hold function. We also have to worry about charge leakage, circuit linearity, and the considerable noise and heat introduced into the circuit by such rapid current flows.
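(Editor's illustration, not part of the quoted article.) A rough feel for that lower bound comes from the simplest possible model, a single RC time constant charging the hold capacitor: the voltage error decays as exp(-t/RC), so settling to within half an LSB of an N-bit converter takes (N + 1)·ln 2 time constants. The component values below are hypothetical, and the model ignores op-amp slew limits, leakage and noise.

```python
import math

def settling_time_s(r_ohms, c_farads, bits):
    """Time for a single-pole RC charge to settle within half an LSB of a full-scale step."""
    tau = r_ohms * c_farads
    # After time t the remaining error is exp(-t / tau) of the step;
    # requiring it below 2 ** -(bits + 1) gives t = tau * (bits + 1) * ln 2.
    return tau * (bits + 1) * math.log(2)

# Hypothetical values: 100 ohm switch plus source resistance, 1 nF hold capacitor.
t16 = settling_time_s(100, 1e-9, 16)
print(f"{t16 * 1e6:.2f} us to settle to 16 bits")            # about 1.18 us
print(f"{1 / 44_100 * 1e6:.2f} us sample period at 44.1 kHz")
```

Each extra bit of accuracy costs another ln 2 (about 0.69) time constants, so the bound tightens steadily as resolution goes up.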

Adding S/H helped a lot in the implementation of older A/D converters. On the output side, however, we still have problems left: there, even correct instantaneous voltages are not enough. The ideal solution requires true impulses which, of course, are not even remotely achievable in physical reality. If we decide to make do with less, distortions creep in: an S/H circuit at the output of the converter will let the conversion process settle to the right voltage without rippling the output, but it will also produce a staircase waveform instead of a train of scaled impulses. This is, in effect, a time variant linear filtering operation and produces frequency anomalies (the output becomes the ideal one convolved with a pulse one sampling period wide, which leads to high frequency attenuation: a pulse has a decaying, rippled spectrum instead of the flat unity spectrum of an impulse). This is why the operating voltage of converters strictly bounds naïve implementations like the one discussed above. We also get the same energy constraints that we had above, so the output will necessarily become a high impedance one, which is obviously bad from a thermal noise point of view.
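(Editor's illustration, not part of the quoted article.) The high frequency attenuation of the staircase output can be put in numbers: holding each value for one full sample period multiplies the spectrum by |sin(x)/x| with x = π·f/fs. This sketch assumes an ideal hold lasting exactly one sample period; the function name is hypothetical. At the 44.1 kHz CD rate it gives a loss of about 3.2 dB at 20 kHz.

```python
import math

def zoh_gain_db(f_hz, fs_hz):
    """Gain of a staircase (zero-order hold) output at f, relative to ideal impulses."""
    x = math.pi * f_hz / fs_hz
    # The hold convolves the output with a pulse one sample period wide,
    # whose spectrum is sin(x)/x with x = pi * f / fs.
    return 0.0 if x == 0 else 20 * math.log10(abs(math.sin(x) / x))

for f in (1_000, 10_000, 20_000):
    print(f, round(zoh_gain_db(f, 44_100), 2))
```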
Practical anti-aliasing and anti-imaging

Given that the theory is based on perfect lowpass filtering, it seems that imperfect physical filters pose a significant problem. Indeed, since we are talking about conversions, all these filtering operations would seemingly have to be implemented in the analog domain. Next we take a look at some of the problems associated with analog filters.

The first challenge is the amplitude response of our filters. From the theory of linear filters we know that the ideal brickwall filters can only be approximated by physical filters. This goes for both analog and digital implementations. To get a near approximation, the filters will also need to be of high order—in older digital audio systems the order of the analog input and output filters could exceed ten. This automatically leads to noise and stability problems, especially since the best responses require elliptical filters which are known to be quite sensitive to parameter fluctuations. The cost of implementation is quite high and the knee between passband and stopband will always be quite broad. This means that the upper part of the audio band will need to be sacrificed if correct behaviour in hostile conditions is desired.
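(Editor's illustration, not part of the quoted article.) How high "high order" gets can be estimated with the closed-form order formula for a Butterworth lowpass. Butterworth is a stand-in chosen because its formula is simple; elliptic filters, which the text says give the best responses, meet the same spec with far lower order. The spec values (0.5 dB passband ripple, 90 dB stopband attenuation) are hypothetical.

```python
import math

def butterworth_order(f_pass, f_stop, ripple_db, atten_db):
    """Minimum Butterworth order with <= ripple_db loss at f_pass and >= atten_db at f_stop."""
    ratio = (10 ** (atten_db / 10) - 1) / (10 ** (ripple_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(f_stop / f_pass)))

# Hypothetical CD-style spec: keep 20 kHz, be 90 dB down by the 22.05 kHz Nyquist limit.
print(butterworth_order(20_000, 22_050, 0.5, 90))   # 117
# The same spec with the stopband edge moved up to 88.2 kHz.
print(butterworth_order(20_000, 88_200, 0.5, 90))   # 8
```

The second line shows how quickly the required order collapses once the stopband edge is moved far above the audio band.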

Even if our filters now have a perfectly acceptable amplitude response, we are not done yet. This is because when high order analog filters are used (and especially elliptical ones), the phase response of the filter becomes exceedingly bad near the cutoff frequency. This means that the filter will ring, i.e. go into damped oscillation near sudden, transient signals. Consequently the time structure of the sound will be somewhat blurred near the cutoff frequency. This is an unavoidable consequence of analog filtering and is usually the reason given for the early CD players’ bad performance. (The bad rep from this era may be why many audiophiles still shun CDs.) Since we still need to limit the incoming audio band, the only real solution would seem to be using a higher sampling frequency, so that any phase distortion the filter might cause would end up outside the audible band. This is not very nice, though, because wider bands mean wasted space on storage media and also more expensive electronics to implement the system.
Conversion linearity

We’ve seen already that point sampling and anti-imaging/aliasing are easier in theory than in practice. But how about the actual conversion step, the one that takes in voltages and puts out numbers? It should come as no surprise there are problems here, too.

There are at least three major ways to implement the conversion, none of which is perfect. The most straightforward is flash conversion: to convert, we generate a reference voltage for each possible conversion value and compare all of these in parallel to the input voltage. Then we take the highest reference that is lower than the input voltage and output the corresponding number. For D/A, we just output the correct reference voltage. This approach doesn’t scale far beyond 12 bits. The second way utilizes the fact that D/A conversion is generally easier than A/D: we approximate the input voltage by setting the highest unknown bit in the output number, compare a D/A’d version of the current number with the original, pick a value for the bit that makes the current number less than the input value, and loop for all bits. This is called successive approximation. The method scales well, but is not very fast. We also depend on the accuracy of the D/A step involved. The third way is dual slope conversion: instead of level comparisons with reference voltages, we use the input voltage to drive current to a capacitor through one resistor and then discharge it through another. The time it takes for these two slopes to complete can be measured very accurately and the process is highly dependable. The problem is, it is also extremely slow and so isn’t suitable for audio sampling rates.
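(Editor's illustration, not part of the quoted article.) The successive approximation loop described above is a binary search that decides one bit per step, most significant first. A minimal sketch, assuming an ideal internal D/A and a unipolar input between 0 and `v_ref`; the function and parameter names are hypothetical.

```python
def sar_convert(v_in, v_ref, bits):
    """Successive approximation A/D: decide one bit per step, most significant first."""
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)                 # tentatively set the highest undecided bit
        v_dac = trial * v_ref / (1 << bits)     # ideal internal D/A of the trial code
        if v_dac <= v_in:                       # keep the bit if we have not overshot the input
            code = trial
    return code

print(sar_convert(0.3, 1.0, 8))   # 76, i.e. floor(0.3 * 256)
```

An N-bit conversion takes N compare steps, which is why the method scales well but is not very fast; any error in the internal D/A goes straight into the result.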

Now, from the above we gather that older conversion methods rely on the ability to generate accurate references. The usual way to do this is to use resistor networks and constant current sources. This is also where we get into trouble. Current sources cannot be made infinitely accurate, and they always suffer from, e.g., temperature drift. Resistor networks, on the other hand, rely on accurate resistor values (in the best ones, such as the R-2R ladder, on equal values of all the resistors involved), but these are quite difficult to achieve. This means that the reference voltages and D/A conversions achieved through resistor ladders have small variations between the sizes of adjacent conversion steps. Sometimes the steps are not even monotonic: a higher digital input value might produce a lower output voltage, for instance. This is very bad since it destroys the linearity of the converter, and when this happens there will always be distortion. The step size variations, dubbed differential nonlinearity, are difficult to correct without expensive manufacturing techniques. They also lead to converters that perform worse than their width in bits would suggest: an 18-bit converter with some differential nonlinearity might have the S/N ratio of an ideal 15-bit converter. Not to mention that the errors generated can be strongly correlated with the signal and thus easily discernible in the output. All this gives good reason to try to avoid multiple independent reference voltages and architectures with narrow manufacturing tolerances.
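(Editor's illustration, not part of the quoted article.) The monotonicity failure is easy to reproduce in a toy model. This sketch uses a 4-bit binary-weighted D/A (a simpler structure than the R-2R ladder named above) and assumes, hypothetically, that the most significant bit's weight comes out 15 % low. Going from code 0111 to 1000 then makes the output go down, and the differential nonlinearity at that step is worse than -1 LSB.

```python
BITS = 4
ideal = [2 ** b for b in range(BITS)]              # bit weights 1, 2, 4, 8 (in LSBs)
weights = ideal[:-1] + [ideal[-1] * 0.85]          # hypothetical: MSB weight 15 % low

def dac_output(code):
    """Binary-weighted D/A: add up the weight of every bit set in the code."""
    return sum(weights[b] for b in range(BITS) if (code >> b) & 1)

levels = [dac_output(code) for code in range(2 ** BITS)]
steps = [hi - lo for lo, hi in zip(levels, levels[1:])]
dnl = [step - 1 for step in steps]                 # deviation of each step from 1 LSB
non_monotonic = [code + 1 for code, step in enumerate(steps) if step < 0]

print(non_monotonic)          # [8]: code 8 gives a lower output than code 7
print(round(min(dnl), 2))     # -1.2
```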
The predominant solution

The mix of problems described above is nowadays solved with a standard bag of tricks, which we will look into next. This bag includes oversampling, digital filtering, bitwidth/bandwidth tradeoffs, noise shaping and delta-sigma conversion.
Oversampling and digital filtering

All the problems with the anti-aliasing and reconstruction filters described above are at some level linked to the fact that the filters are analog. In contrast with analog ones, digital filters can have perfectly linear phase response and arbitrarily high order without significant noise problems. Digital filters do not suffer from thermal drift, either. Given that numeric processing is quite cheap nowadays, we would ideally like to perform our filtering operations in the digital domain. But because we are talking about how to convert from digital to analog and vice versa, this would seem to be impossible.

The way to get around this little dilemma is actually very simple: we share the burden between the digital and analog domains. A low order analog filter can be used to guarantee that no significant content is present above some (rather high) frequency. If our sampling process works at a rate of at least twice this higher limit, we are left with a sampling chain with a relatively poor amplitude response in the higher part of the spectrum. The lower portion, however, can be quite usable. We can now use a digital filter to further limit the band to this usable portion without introducing any analog artifacts like phase distortion. We can actually use the digital filter to partially compensate for the imperfect amplitude response of the analog input filter.

Compensation of this kind will raise the noise in any band that requires amplification or, conversely, reduce the dynamic range of bands which are attenuated. This is why we might prefer to use only attenuation, and to start with a sampling process of higher dynamic range than the bit depth we are actually aiming at.

Now, given that the original sampling rate was high enough (typically 64+ times the final sampling rate we wish to use) we are left with a high sample rate digital signal with most bandwidth unused. We can now resample this signal to achieve our (much lower) target sampling rate. We have used oversampling to enable digital processing. Since we often downsample by an integral amount (like 64 times), it is even possible to combine the downsampling process with the filtering step, creating a very efficient computational construct called a decimating filter. For digital to analog conversion, a symmetrical structure is used which interpolates to a higher sampling rate.
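A decimating filter might look like the sketch below. The Hamming-windowed sinc design, the 63 taps and the decimation factor of 8 (rather than 64, to keep the example small) are all assumptions made for illustration; practical decimators usually split the job into several stages.

```python
import math


def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Windowed-sinc FIR lowpass (Hamming window), normalised to unity gain at DC.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5);
    num_taps should be odd so that the filter has an exact centre tap.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def decimate(x: list[float], taps: list[float], factor: int) -> list[float]:
    """Lowpass filter and downsample in one pass.

    Only every `factor`-th output sample is ever computed, which is what makes a
    decimating filter cheaper than filtering first and discarding samples later.
    """
    return [sum(h * x[n - k] for k, h in enumerate(taps))
            for n in range(len(taps) - 1, len(x), factor)]


factor = 8
taps = lowpass_taps(63, 0.5 / factor)
dc = decimate([1.0] * 1000, taps, factor)
high = decimate([math.cos(2 * math.pi * 0.4 * n) for n in range(1000)], taps, factor)
```

A constant input passes with unity gain, while a tone at 0.4 times the input rate, far above the new Nyquist frequency, is strongly attenuated before it can alias.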

There are also further benefits to the oversampling process. First of all, the problems associated with sample and hold are diminished. This happens because we are using a much higher sample rate and thus much shorter S/H periods—essentially the filtering operation imposed by staircase formation will now have a response which is almost constant over the audio band. Furthermore, since the analog lowpass filter guarantees that the signal cannot wobble very much during a single sampling period, it may even be possible to dispense with the S/H step altogether. Secondly, the analog filters used can have a very low order. Not only does this mean that the response will be almost constant over the target band but also that the filter will have an excellent phase response. (The problems will be outside the audible band.)
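The sample-and-hold benefit can be put in numbers. A staircase output behaves like a filter with a sin(x)/x magnitude response; the sketch below (the function name is ours) evaluates it at 20 kHz for a 44.1 kHz rate and for 64 times that rate.

```python
import math


def zoh_gain_db(f: float, fs: float) -> float:
    """Gain of a sample-and-hold (zero-order hold) staircase at frequency f, in dB.

    The hold acts like a filter with magnitude |sin(pi f / fs) / (pi f / fs)|.
    """
    x = math.pi * f / fs
    return 0.0 if x == 0 else 20 * math.log10(abs(math.sin(x) / x))


for fs in (44_100, 64 * 44_100):
    print(f"fs = {fs} Hz: {zoh_gain_db(20_000, fs):.4f} dB at 20 kHz")
```

At 44.1 kHz the droop at 20 kHz is about -3.2 dB; at 64 times the rate it is below 0.001 dB, i.e. practically constant over the audio band.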

The traditional way to explain the substantial benefits of oversampling starts from the observation that, regardless of sampling rate, the bit depth of a converter determines the amount of quantization energy inserted. This is because, under the assumption that the error is uncorrelated with the signal, the total power of the error signal stays the same whatever the sample rate. If we raise the sample rate, the power of the error signal is spread out over a wider frequency range, and removing part of the bandwidth with a filter lowers the overall power of the error signal. I find this explanation difficult to understand. The twist is that this procedure seems to give us conversion which is more accurate than the incoming sample stream. What really happens, though, is that the staircasing which results from S/H in traditional converters is diminished: low amplitude signals are reproduced with more average accuracy over a sample period because of the higher sample rate, and the stairs get rounded off by the output filters. This means that even a single bit on-off fluctuation in the PCM data stream gets decoded, correctly, into a sinusoid. In essence, oversampling makes it possible to reconstruct the signal mainly in the digital domain, and the oversampled output with its analog filter is really just a way to deal gracefully with the extra bits produced by the digital anti-imaging filter. All in all, we get a very low noise floor for conversion error, but the inherent resolution of the sampled PCM stream is in no way surpassed. With proper dithering in the A/D chain, however, the increased decoding accuracy makes it a lot easier to hear material which is actually below the noise floor of the quantization plus dithering operations. Oversampling combined with digital filtering opens the way to nearly perfect A/D/A conversion, and so it is a very important tool for any builder of audio systems.
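The traditional argument reduces to a textbook formula under exactly the assumption stated above: white quantization noise, uncorrelated with the signal and spread evenly up to half the sample rate, with no noise shaping. The helper name below is ours.

```python
import math


def ideal_snr_db(bits: int, osr: float = 1.0) -> float:
    """SNR of an ideal quantizer for a full-scale sine wave.

    The quantization error is modelled as white noise of power step**2 / 12,
    uncorrelated with the signal and spread evenly from 0 to fs/2. Oversampling by
    `osr` and then lowpass filtering to the original band keeps only 1/osr of that
    power, so the in-band SNR improves by 10*log10(osr) dB.
    """
    full_scale_sine = 20 * math.log10(2) * bits + 10 * math.log10(1.5)  # ~6.02 N + 1.76
    return full_scale_sine + 10 * math.log10(osr)


for osr in (1, 2, 4, 64):
    print(osr, round(ideal_snr_db(16, osr), 2))
```

Each doubling of the sample rate buys about 3 dB (half a bit) in-band; 64 times oversampling buys about 18 dB, or 3 bits.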
Bit depth reduction and noise shaping

After oversampling is employed, the filter troubles go away. However, the problems with the conversion step itself become a couple of orders of magnitude worse. This happens because very accurate conversion is even more difficult to achieve at high sampling rates. In fact, at the megahertz rates required by 64 times oversampling architectures, traditional converters of over 12 bits do not really exist. Nonlinearity isn’t going to get any easier to handle, either. This is why we might wish to make do with fewer bits. Digital filtering generates extra bits, too, so it would be very nice to somehow drop the extraneous ones. But doesn’t this automatically make the quantization noise intolerable?

In general, yes. But remember that we are now talking about an oversampling architecture. Here only the lowest 1/64 of the total bandwidth is of real interest. What happens above that is of little concern, since we will not be able to hear it and, more importantly, the analog input/output filters will attenuate those frequencies progressively more in the higher bands. This means that as long as we keep the in-band noise in check, we can let the out-of-band noise level rise considerably. This is achieved through what is, appropriately enough, called noise shaping.

The simplest digital noise shaper consists of a quantizer (in the digital domain this just takes a fixed number of the high order bits of a digital word) and a subtraction circuit which subtracts the quantization error introduced into the previous output sample from the current input of the quantizer (in effect, subtracting the discarded low order bits of the previous input word from the whole current input word). It is not easy to see why this circuit does what we described in the previous paragraph. To get a picture of what happens, we must rearrange the configuration a bit. First of all, we can express the error (which we currently feed back) as the difference between the quantizer output and input values. Then we can split the fed-back error into these two signals: the previous output, which is subtracted at the quantizer input, and the previous quantizer input, which is added back. After that it is easy to see that the original configuration corresponds to a very simple IIR filter (an accumulator) followed by a quantizer whose output both serves as the final output of the circuit and feeds back to be subtracted from the input signal. Now, assuming the quantizer is just an additive source of uncorrelated noise (a fairly good approximation over a wide range of operating conditions, which amounts to linearizing the circuit), it is quite easy to see why the loop behaves the way we expected: the circuit approximates a closed loop linear filter with an embedded source of white noise. The spectrum of the noise is determined by the closed loop response of this filter and is easily evaluated. In the simple case the inner IIR filter has a first order lowpass response, so the closed loop response of the outer loop is a highpass one: the noise is inserted primarily at the higher frequencies. Furthermore, the structure of the circuit as a whole guarantees that at low frequencies the spectral structure of the output closely matches that of the input, with very little quantization noise present.
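The error-feedback circuit just described can be sketched for integer samples as follows. Assumptions: truncation as the quantizer (keeping the high-order bits, as in the text), a zero initial error and a 16-bit-like test signal; the function names are hypothetical. A sum over 64 consecutive samples serves as a crude lowpass filter for checking how much error remains at low frequencies.

```python
import math


def requantize_noise_shaped(x: list[int], drop_bits: int) -> list[int]:
    """First-order error-feedback requantizer for integer samples.

    The quantizer input is the current sample minus the error made on the previous
    sample, and the quantizer keeps only the high-order bits. The output error is
    then e[n] - e[n-1]: the quantization noise is filtered by (1 - z^-1), a highpass.
    """
    out, prev_err = [], 0
    for sample in x:
        v = sample - prev_err                  # subtract the previous error
        y = (v >> drop_bits) << drop_bits      # drop the low-order bits (truncate)
        prev_err = y - v                       # error made on this sample
        out.append(y)
    return out


def requantize_plain(x: list[int], drop_bits: int) -> list[int]:
    """Plain truncation, for comparison."""
    return [(s >> drop_bits) << drop_bits for s in x]


def max_window_error(out: list[int], ref: list[int], width: int = 64) -> int:
    """Largest |sum of error| over any `width` consecutive samples (a crude lowpass)."""
    err = [o - r for o, r in zip(out, ref)]
    return max(abs(sum(err[i:i + width])) for i in range(len(err) - width + 1))


signal = [int(20000 * math.sin(2 * math.pi * n / 100.3)) for n in range(5000)]
shaped = requantize_noise_shaped(signal, 8)
plain = requantize_plain(signal, 8)
print(max_window_error(shaped, signal), max_window_error(plain, signal))
```

Because the shaped error is e[n] - e[n-1], its sum over any window telescopes to the difference of two single errors, so it stays below one quantization step (256 here); the plainly truncated signal accumulates thousands of units over the same window.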

A comprehensive analysis without separately linearizing the circuit is exceedingly difficult because the circuit is a nonlinear one with feedback—at worst, circuits like this could go into chaotic oscillation. In my mind, this is one of the prime reasons why bitstream techniques should only be used to implement parts of the signal chain—in the absence of a complete theoretical framework, bit depth reduction and noise shaping should not be incorporated into audio systems at an architectural level.

In general, the inner filter described above can be substituted with a much more complex one—this gives us a way to control the spectrum of the quantization noise. This way we can make sure the resulting noise is adequately low over the audible band and, above that, sufficiently attenuated by the filters employed. Furthermore, we may want the noise remaining in the audible part of the spectrum to be well matched to the threshold of hearing so that it will not be heard as readily.
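One common way to see the effect of the loop filter is through the noise transfer function. The sketch assumes the simplest family, the pure differentiator (1 - z^-1)^L, rather than an optimised design, and evaluates it at the top of the audio band for 64 times oversampling (f/fs = 1/128) and at half the sample rate. The function name is hypothetical.

```python
import math


def ntf_gain_db(order: int, f_over_fs: float) -> float:
    """Gain of the pure differentiator noise transfer function (1 - z^-1)**order.

    |1 - e^{-jw}| = 2 sin(w / 2), with w = 2 pi f / fs.
    """
    w = 2 * math.pi * f_over_fs
    return 20 * order * math.log10(2 * math.sin(w / 2))


band_edge = 1 / 128  # top of the audio band with 64 times oversampling: (fs / 64) / 2
for order in (1, 2, 3):
    print(order, round(ntf_gain_db(order, band_edge), 1), round(ntf_gain_db(order, 0.5), 1))
```

Each extra order pushes in-band noise down by about another 26 dB at the band edge, at the price of about 6 dB more noise near half the sample rate, which is what the analog filter then has to deal with.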

The last point is especially important when there is no possibility of oversampling. Playing 16-bit sound on an 8-bit soundcard and mixing multiple 16-bit streams for 16-bit playback are two very common examples.

It is clear that the fewer bits there are to worry about, the easier it is to design a working converter for them. So far so good. But the most striking benefit comes when the process is carried out to its logical conclusion to yield one bit processing. A one bit converter, in addition to being extremely simple to implement (one bit D/A is a switch, one bit A/D is a comparator) will suffer from zero differential nonlinearity—there is only one step so all the steps will obviously be of identical magnitude. Some slew rate distortion and a constant offset (resulting from imperfect supply voltages) are practically the only problems of the single bit converter. The slew rate issue can be handled by some careful design and constant offsets rarely matter in a mostly capacitively coupled audio signal processing environment.

Finally, the above description of noise shaping lends itself to both digital and analog implementation, and the method is applicable to A/D conversion as well. These two facts are all we need to arrive at today’s prevailing audio conversion concept, delta-sigma conversion, the topic of the next chapter.

The name delta-sigma conversion comes from the Greek letters traditionally used to denote differences and sums. In the first order noise shaper introduced earlier, the inner feedback loop sums (accumulates) successive input values. In the digital domain this is just a running sum; in the analog domain it is an integrator. In analysis, the value of such an accumulator is commonly denoted by sigma (Σ), meaning sum. The output of the modulator, on the other hand, is a coarsely quantized difference between successive values of the accumulator: a delta (Δ) precisely in the sense used when we speak of, e.g., delta modulation or delta coding. So what we send are deltas of sigmas, whence the name. This view of the modulation process will also prove useful later on.
Delta-sigma conversion

A beloved child has many names, and so does this conversion method: delta-sigma, ΔΣ, sigma-delta, MASH and charge balance conversion are but a few. But the basis is the same: we employ a huge oversampling ratio (usually 64 times the target sampling rate) and aggressive noise shaping to bring the converter down to the one bit regime. On the A/D side we implement the noise shaping circuitry in analog form (the subtraction is an opamp based differential amplifier, the A/D converter is a comparator, the filter is a continuous time or switched capacitor analog one and the feedback loop holds a switch to convert back to the analog domain); on the D/A side we mostly employ digital processing (only the final bitstream is converted).
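A first-order one-bit delta-sigma modulator fits in a few lines. This is a minimal sketch under stated assumptions (input normalised to full scale ±1, a comparator as the one-bit quantizer, the previous output bit fed back); it is not how any particular commercial converter is built.

```python
def delta_sigma_1bit(x: list[float]) -> list[int]:
    """First-order delta-sigma modulator with a one-bit quantizer.

    The accumulator (sigma) integrates the difference (delta) between the input and
    the previous output bit; a comparator turns the accumulator into +1 or -1.
    Inputs are assumed to lie strictly between -1 and +1 (full scale).
    """
    acc, prev, bits = 0.0, 0, []
    for sample in x:
        acc += sample - prev            # subtractor followed by integrator
        prev = 1 if acc >= 0 else -1    # comparator: the one-bit A/D converter
        bits.append(prev)
    return bits


stream = delta_sigma_1bit([0.5] * 10_000)
print(stream[:12], sum(stream) / len(stream))
```

The output is a stream of +1/-1 values whose average tracks the input; on the D/A side, a lowpass filter performs that averaging.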

In addition to the reasons outlined in the previous chapter, delta-sigma conversion has a very persuasive further benefit: it is very cost-effective to implement. This is because the technique does not rely on any precision components (unlike the other methods, which require resistor ladders and precision capacitors), it is easy to embed into otherwise digital circuits (using CMOS logic and switched capacitor filters, the design nicely straddles the digital/analog boundary), and it is more repeatable than any other method (the digital filters are always accurate and the few analog flaws can be ironed out through autocalibration). Further, delta-sigma methods are the only ones to reach reliable 20+ bit performance at audio sampling rates, a noteworthy fact in an age when everybody already has 16 bits on CD.

And now on to the downsides. Delta-sigma (bitstream) methods are nice, but they’re not without their problems. We will now delve into those.
Nonlinearity, idle tones and dead bands

Above, when we tried to figure out why the noise shaping circuit did what it was supposed to, we resorted to linearizing the circuit. A hint was given that this might not be the whole story. Indeed it is not: the linearized circuit behaves nicely and approximates the actual quantizer performance quite well, but there are occasions when the true nonlinear nature of the circuit crops up, and these circumstances arise in practical converters as well. The three major problems are idle tones, dead bands and nonlinear instability, and they tend to plague higher order delta-sigma modulators. This is unfortunate, since the higher the order of the modulator, the higher the potential performance at a given oversampling ratio and converter bit depth.

Idle tones, in system theory dubbed limit cycles, are a mode of nonlinear oscillation. They exemplify the exact opposite of one of the basic properties of linear systems. In the absence of input (i.e. given an input of all zeroes from some point of time), the output of a stable linear system always approaches zero. Of course, the convergence can be slow but it nevertheless happens for all linear systems—it is easy to show that if it doesn’t, the system cannot be stable. But not so for nonlinear ones. Idle tones are one of the consequences. They are stable sequences output by a nonlinear network in the presence of prolonged null input. In delta-sigma architectures they most often occur at very low amplitudes, just when the output of the modulator should go to zero. Needless to say, our ears can easily pick these sequences up, even if they are below the noise floor. Idle tones are heard as faint, whining noises in converter output. The exact time domain behavior of the modulator after entering a limit cycle depends on the structure of its state space, and most importantly the amount of modulator state. As the latter grows with modulator order, it is not a surprise that higher order modulators are the first to be affected.
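Idle tones are easiest to demonstrate with the first-order modulator, where they are well known to show up as periodic bit patterns. This sketch (first order, not the higher orders discussed above) finds the repetition period of the output after the start-up transient; `find_period` is a hypothetical helper.

```python
def delta_sigma_1bit(x: list[float]) -> list[int]:
    """First-order one-bit delta-sigma modulator (input assumed within -1..+1)."""
    acc, prev, bits = 0.0, 0, []
    for sample in x:
        acc += sample - prev
        prev = 1 if acc >= 0 else -1
        bits.append(prev)
    return bits


def find_period(seq: list[int], max_period: int = 200):
    """Smallest p such that seq repeats with period p, or None if there is none."""
    for p in range(1, max_period + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None


silent = delta_sigma_1bit([0.0] * 2000)[1000:]     # skip the start-up transient
small_dc = delta_sigma_1bit([0.125] * 2000)[1000:]
print(find_period(silent), silent[:8])
print(find_period(small_dc), small_dc[:16])
```

With zero input the modulator settles into +1, -1, +1, ... (a tone at half the sample rate), and a DC input of 1/8 produces a pattern repeating every 16 samples. Smaller inputs of this kind (1/16, 1/32, ...) give longer periods and hence lower-pitched tones.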

Intuitively, this is because the more state there is, the more can happen inside the feedback loop. In the context of delta-sigma modulation, one viable perspective is that as the order of the filter grows, so does the maximum achievable phase shift. This means the extra spectral content inserted by the quantizer has more time to evolve before hitting the nonlinearity again. The result is longer wavelengths, more activity within one trip around the modulator and, eventually, perhaps even bifurcation. Another way to understand the oscillatory modes arises from the observation that lowpass filters with a steep roll-off necessarily ring near transients. The higher the order, the more ringing there is. Combined with the high group delays incurred by high order filtering, this ringing easily leads to oscillation when the output is fed back through the quantizer.

Now, the state is usually rather limited and the nonlinearity in this case is (in some intuitive sense) rather regular. That is why the possible modes of oscillation tend to be at least quasi-periodic and mostly have a short period. When the input signal dominates the circuit, the linearized noise source analysis tends to hold, so the problems mainly appear at low amplitudes. Summarizing, the resulting tones will have a high, definite pitch and a low amplitude. Idle tones are considerably more annoying than mere noise or some differential nonlinearity, which is why they must be avoided at all costs. The problem is that controlling this sort of nonlinearity analytically is exceedingly difficult. The most common way to deal with limit cycles, then, is to insert small amounts of noise into the modulator feedback loop, at least to disperse the pitches generated and, in the best case, to drive the modulator out of the limit cycle into a zero-convergent region of the state space. Notice that this is definitely different from dither, which is applied at the input of the converter to decorrelate the total quantization noise from the signal.
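The noise-insertion remedy can be tried on the same kind of first-order model. Here uniform noise of amplitude 0.5 (an arbitrary assumption, as is the seed) is added at the comparator input, inside the feedback loop; following the text, we call it loop noise rather than dither, since dither would be added at the converter input. The function names are hypothetical.

```python
import random


def find_period(seq: list[int], max_period: int = 200):
    """Smallest p such that seq repeats with period p, or None if there is none."""
    for p in range(1, max_period + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None


def delta_sigma_loop_noise(x: list[float], noise_amp: float = 0.5, seed: int = 1) -> list[int]:
    """First-order one-bit modulator with uniform noise added at the comparator input.

    The noise enters inside the feedback loop, so it is shaped like the quantization
    error and most of its power ends up at high frequencies.
    """
    rng = random.Random(seed)
    acc, prev, bits = 0.0, 0, []
    for sample in x:
        acc += sample - prev
        prev = 1 if acc + rng.uniform(-noise_amp, noise_amp) >= 0 else -1
        bits.append(prev)
    return bits


stream = delta_sigma_loop_noise([0.125] * 10_000)
print(find_period(stream[1000:2000]), sum(stream) / len(stream))
```

With loop noise the output no longer repeats with any short period, yet its average still tracks the input, because the inserted noise is shaped along with the quantization error.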

Dead bands are a concept closely related to idle tones. They denote parts of a nonlinear system’s state space (other than all zeroes) which capture the system: once the system enters such a band, it will not leave it in the absence of input. The concept of a dead band is more comprehensive than that of idle tones, since it can include nonoscillatory behavior (the output decays to within one unit of zero and stays there) and some gross nonlinear oscillatory modes (like the ones resulting from clipping and, especially, overflow). The concept is really applicable only to circuits which closely approximate a linear system over some fairly broad set of operating conditions, and in such settings it is very useful for explaining some of the typical strategies used to bring the system back on track. For instance, breaking idle tones by inserting noise can be thought of as an attempt to nudge the system out of a dead band.
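Dead bands are easiest to see in a much simpler nonlinear system than a modulator: a first-order recursive filter that rounds its result to whole units, as a fixed-point implementation would. The example is our own illustration, not taken from the text, and `zero_input_response` is a hypothetical name.

```python
def zero_input_response(y0: float, a: float, steps: int, quantize: bool) -> list[float]:
    """Zero-input response of the first-order recursion y[n] = a * y[n-1].

    With quantize=True every result is rounded to an integer, as a fixed-point
    implementation with one-unit resolution would do.
    """
    y, trace = y0, []
    for _ in range(steps):
        y = a * y
        if quantize:
            y = float(round(y))
        trace.append(y)
    return trace


linear = zero_input_response(10.0, 0.9, 200, quantize=False)
stuck = zero_input_response(10.0, 0.9, 200, quantize=True)
ringing = zero_input_response(10.0, -0.9, 200, quantize=True)
print(linear[-1], stuck[-3:], ringing[-4:])
```

Without rounding the response decays towards zero. With rounding and a = 0.9 it stops decaying at a small nonzero value and stays there, and with a = -0.9 it settles into a small oscillation of alternating sign, a limit cycle.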

That dead bands can arise at high amplitudes as well is rather worrying. Indeed, at least the analog variants of fourth and higher order delta-sigma modulators show some rather troubling behavior when the values output by the loop lowpass filter (integrator) grow large. This is why the whole theoretical operating range of these designs is rarely used, and some circuitry is often embedded to detect high amplitude instability and reset the modulator. Such an operation seems quite drastic but is rarely needed in a properly amplitude controlled system, and at high inputs the effects on the output usually cease within a single target sample period or less. Still, the necessity of such emergency measures does not exactly corroborate the theory behind delta-sigma conversion.

One way to understand the range limitation is to consider how the filter is driven in the time domain. Here we have a high order filter driven by full scale digital inputs. When we are operating sufficiently near null sigma values, the possible inputs lie approximately symmetrically around the sigma, and the bitstream fed to the filter is well balanced, with lots of high frequency content and very little that would resemble time structure. Near maximum sigma, equilibrium is attained when most of the input pulses take the nearer digital full scale value and only occasionally is there a wildly different pulse of the opposite polarity. Now, for a filter to be of high order means that it reacts through higher order moments than just the first one, and possibly rings. Long stretches of constant input in a way load these moments, and if overshoot then happens, the delayed string of corrective, opposite pulses may perpetuate the effect. A good guess, then, is that since the analysis behind noise shaping does not consider phase information and instead relies on a stochastic linearisation of the modulator loop, regular and/or unbalanced time structure at the input of the filter is not a good thing. Near full scale this is precisely what is produced: non-zero mean quantization errors in the loop and nearly deterministic time behavior.
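The loss of balance near full scale can be seen even in the first-order modulator (a simplification: the instability discussed above concerns higher orders). The sketch measures the longest run of +1 bits for a few DC input levels; `longest_run` is a hypothetical helper.

```python
def delta_sigma_1bit(x: list[float]) -> list[int]:
    """First-order one-bit delta-sigma modulator (input assumed within -1..+1)."""
    acc, prev, bits = 0.0, 0, []
    for sample in x:
        acc += sample - prev
        prev = 1 if acc >= 0 else -1
        bits.append(prev)
    return bits


def longest_run(bits: list[int], value: int = 1) -> int:
    """Length of the longest stretch of consecutive `value` bits."""
    best = run = 0
    for b in bits:
        run = run + 1 if b == value else 0
        best = max(best, run)
    return best


for level in (0.0, 0.5, 0.9):
    stream = delta_sigma_1bit([level] * 1000)
    print(level, longest_run(stream), stream[:20])
```

At zero input the bits simply alternate, whereas at 0.9 of full scale long runs of +1 are broken only occasionally by a single -1, the kind of regular, unbalanced drive described above.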
Why go to higher bit depths?

Besides concerns of economy, the demand for conversion accuracy and bit depth is the major driving force behind bitstream methods. It is understandable that people continuously strive for more accuracy. After all, not many people would mistake a recording for the original performance under any realistic conditions. But few people question whether going to ever wider converters and higher sampling rates is really the way forward. Contrary to common audiophile rhetoric, there is good reason to believe we are already quite near the limit beyond which increasing bit rates make no difference to the human observer.

Based on what is known about hearing, people do not truly hear anything beyond 25kHz. Even this is quite a conservative estimate, since it holds primarily for a few young adolescents. And even if some people do hear frequencies that high, the information extracted from the ultrasonics is very limited: there is some evidence that everything above some 16kHz is sensed purely based on whether it is there, irrespective of the true spectral content. As for dynamic range, research suggests that 22-bit accuracy should cover the softest as well as the loudest of tones over the entire audio bandwidth.

But these limits are not the end of the story. If we are simply aiming for a good audio distribution format, some extra processing can yield significant benefit. This is because pure, linear PCM storage in no way exploits the peculiarities of human hearing. The dynamic range and lower amplitude limit of our hearing vary considerably over the audio bandwidth. Two known methods of exploiting this variance are noise shaping and pre/de-emphasis. The first uses the noise shaping principles described above to move the quantization noise generated at a given bit depth from sensitive frequency ranges to less sensitive ones, in effect giving more bits to the ranges which need them most.

This is a nice example of a non-oversampling application of noise shaping techniques, and it is in use right now: Sony, at least, markets its records as being Super Bit Mapped.

Noise shaping has the benefit of only being needed on the production side of the signal chain. It shapes the noise floor of the recording and so alters the dynamic range in the different parts of the spectrum. Pre/de-emphasis, on the other hand, relies on the fact that the spectrum of acoustical signals is generally far from flat, while the threshold of hearing also varies over the audible spectrum. The former invariably rolls off at high frequencies and the latter rises quite high in the treble register. The threshold is lowest in the middle register, with its minimum around 3 to 4kHz. This means that it is advantageous to shift the transmitted dynamic range of separate bands with respect to each other. High frequencies, for instance, can be boosted (since acoustical signals leave some headroom there) and then de-emphasized at playback so that any noise inserted by the signal chain is attenuated. In fact, in this range the perceptual noise floor can often be pushed below the threshold of hearing, in effect making the transmission chain perceptually flawless. The problem is that in the crucial mid and high mid ranges the dynamic range of both music and hearing is wide, and this is also where the minimum threshold of hearing lies. This is why pre/de-emphasis is not a panacea.
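Pre/de-emphasis can be illustrated with the first-order shelf used for CD emphasis, with time constants of 50 µs and 15 µs. The choice of this particular curve is ours; the text does not name one. The sketch computes the boost in dB, and the function names are hypothetical.

```python
import math

TAU_ZERO = 50e-6  # seconds; the 50 us / 15 us pair used by CD emphasis
TAU_POLE = 15e-6


def pre_emphasis_db(f: float) -> float:
    """Gain of the first-order shelf H(s) = (1 + s*TAU_ZERO) / (1 + s*TAU_POLE) at f Hz."""
    w = 2 * math.pi * f
    return 20 * math.log10(abs(complex(1, w * TAU_ZERO)) / abs(complex(1, w * TAU_POLE)))


def de_emphasis_db(f: float) -> float:
    """Playback side: the exact inverse, so the chain as a whole stays flat."""
    return -pre_emphasis_db(f)


for f in (100, 1_000, 5_000, 10_000, 20_000):
    print(f, round(pre_emphasis_db(f), 2))
```

The boost is negligible at 1 kHz, about 7.6 dB at 10 kHz and levels off near 10.5 dB (20·log10(50/15)) at the top; noise added between the two stages is cut by the same amount at playback.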

Really what we are doing here is reminiscent of noise reduction: we boost the signal on the production side and then attenuate it at playback so that any inserted noise comes down with it. The main differences are that pre/de-emphasis is static (it relies on a certain general spectral shape of the input), and that noise reduction applications aim at bringing the noise below the masking threshold rather than the threshold of hearing, which makes them much less tolerant of subsequent processing of the decoded signal. Pre/de-emphasis is also a linear operation, so at least in theory it can be inverted perfectly. Of course, this is only possible when the dynamic range of the system was sufficient to transmit the signal in the first place (i.e. there is no clipping).

Some careful thought enables us to construct a system with both emphasis and noise shaping that yields superior perceptual quality from a given bit depth. At best, 4–6 bits worth of perceptual dynamic range can be added by the procedure. Provided that the sample rate is high enough for all frequencies of interest to be transmitted (some suggest a 26kHz audio band gives all the headroom we need), optimal use of emphasis and noise shaping can theoretically yield perceptually transparent transmission from a 14 bit system. Since we do not rely on masking, the design is surprisingly resilient to subsequent signal processing. Proper dithering can then be used to make the signal chain linear for all practical purposes. These results are in stark contrast with the claims of audiophiles and the audio industry that ever higher bit rates are needed in audio transmission.

This section draws heavily on the material available at the Acoustic Renaissance for Audio (ARA) web site. The material presented there is also much more comprehensive and, unlike this introduction, doesn’t rely on proof-by-assertion. The ARA activity is one of the main influences that prompted me to write this text.

From the above it appears that, with some minor adjustments, the 16 bits and 44.1kHz of CD are almost enough. In practice we do have to be slightly more cautious. It must be acknowledged that some anecdotal evidence exists in favor of greatly increased bit rates. The most important experiments involve the effect of ultrasonics in the presence of frequencies traditionally considered audio band, and the effect of ultrasonics on sound localisation and on the definition (whatever that means) of sounds. The argument goes that we might not consciously hear isolated ultrasonics, but in the presence of other sound material (especially transients) they might serve as additional localisation cues. There is also the age-old debate over ultrasonics permeating into the audible band through distortion products generated in the ear. The latter claim has gained some support from experiments involving the timbre perception of periodic sounds with and without ultrasonic components. All in all, the effects seem to be minor, and it is not entirely clear whether they really exist, at least to a degree which requires attention from the audio designer.

One further, rather persuasive reason to reconsider the need for higher bit rates is sensible resource allocation. If stereo transmission were really the best that could be done, ever higher accuracy could be justified by arguments of the better-safe-than-sorry type. But there are multitudes of unaddressed issues in digital audio transmission which have nothing to do with the numerical accuracy of the channels employed. The most important ones are the number of channels, the accurate definition of what exactly is encoded by the information (to date the Ambisonic framework is the only one to comprehensively address this concern) and the application of signal processing to enhance the signal chain (e.g. room equalisation, speaker linearization and restoration of analog recordings). The rapid evolution of DSP has also brought out new possibilities, like simulation of acoustical environments, which seem far more interesting from the consumer standpoint than laboratory grade signal chains. We should consider whether the future development of and investments in digital audio systems should perhaps follow these (in my mind extremely interesting) lines instead of making marginal improvements to channel accuracy.

All of the above holds primarily for audio distribution formats. But when subsequent processing is to be expected, wider samples are very useful in preventing error accumulation. It is well known that most DSP operations, including simple filtering, add many extra bits to existing signals. To guarantee that rounding and dithering products do not accumulate, even 32-bit formats are sometimes used. On the other hand, nothing comparable accumulates along the frequency axis, even in considerably long signal chains, so this does not affect the sampling rate considerations.
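The benefit of wider samples in a processing chain can be demonstrated with a toy experiment: apply a gain and undo it, fifty times over, rounding to a fixed-point grid after every operation. In exact arithmetic the chain would be an identity, so any remaining deviation is accumulated rounding error. The signal, the gains and the stage count are arbitrary assumptions, and the helper names are hypothetical.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round x to a fixed-point grid with `bits` bits covering [-1, 1)."""
    scale = 1 << (bits - 1)
    return round(x * scale) / scale


def chain_rms_error(bits: int, stages: int = 50) -> float:
    """RMS deviation after `stages` pairs of operations, each applying a gain and
    then undoing it, with the result rounded to `bits` bits after every operation."""
    signal = [quantize(0.5 * math.sin(2 * math.pi * n / 97), bits) for n in range(1000)]
    work = list(signal)
    for k in range(stages):
        gain = 0.5 + 0.009 * k  # a different gain in every stage (arbitrary choice)
        work = [quantize(v * gain, bits) for v in work]
        work = [quantize(v / gain, bits) for v in work]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(work, signal)) / len(signal))


for bits in (16, 24, 32):
    print(bits, chain_rms_error(bits, 1), chain_rms_error(bits))
```

The 16-bit error after fifty stages grows well beyond that of a single stage, while the 32-bit chain's error is smaller by more than three orders of magnitude.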
Bitstream as a transmission format

Now we’ve got to the original reason for this article. So far bitstream methods have only been described from the point of view of analog/digital/analog conversion. But a slight change in our point of view, and some study of the complete signal chain from analog to digital and back again leads us to wonder why we are doing the bitstream to linear PCM conversion and its inverse at all. Couldn’t we leave those stages out and simply pass the bitstream resulting from delta-sigma modulation to the playback side? This is quite a natural thought and one that has recently found a concrete application in Sony’s SACD architecture. The subject of this final chapter is bitstream as a channel encoding and an architectural basis of digital audio transmission.
Rationale

The common wisdom in data transmission is that the shorter the signal chain, the more transparent it will be. This is also the most compelling reason for trying to drop the digital filtering and decimation steps from the PCM signal chain. Bitstream advocates feel that since we can do without these steps, they should go. Since all processing costs something, implementations should become cheaper when the digital part is simplified. The idea of passing a simple bitstream is also seen as having a certain elegance. And it certainly has the delightful buzz of any new technology.

On a more serious note, the technique relies on oversampling and noise shaping at a very basic level. The oversampling ratios must be very large in order to get quality playback, so the underlying bandwidth will be huge compared to current PCM systems. Now, although the reasoning that led us to consider delta-sigma conversion and the attendant noise shaping techniques is largely based on putting the shaped quantization noise into the headroom provided by oversampling and then killing it with an analog filter, after removing the digital filters we can also view the system as a full band one with lots of quantization noise, an anti-alias filter with a ridiculously bad passband response and funky single-ended de-emphasis to reduce the terrible noise figures. Essentially we have an almost conventional digital transmission line in which only the lowest 1/64 or so of the total bandwidth shows hifi performance. Superficially naïve, this is a powerful observation with some deep consequences.

We have already seen that a full bandwidth PCM format such as CD can be made a lot better by introducing in-band noise shaping and pre/de-emphasis. Now, how about applying this reasoning to the above? We get a system in which the lowest 1/64 of the frequency range (the conventional audio band) is hifi and the rest (possibly up to 32 times the sample rate!) displays a progressively degraded noise figure and maximum output. Now, if we assume that ultrasonic frequencies do indeed contribute to localisation and whatnot, those frequencies can now, for the most part, be transmitted. The only problem is accuracy, but if we cannot consciously hear the stuff anyway, its presence is a lot more important than the accuracy of its transmission. Plus, in the vicinity of the audio band, S/N ratios actually stay quite respectable. In effect, the signal chain has a sort of fade-to-noise frequency response.

As the above reasoning suggests, a 64 times oversampling delta-sigma architecture (which is pretty much standard for 16-18 bit delta-sigma converters in PCM applications) already contains some slack compared to its PCM counterpart. This is to be expected, since the data stream is a lot fatter: 16 bits times the sample rate vs. 1 bit times 64 times the sample rate already shows a four to one expansion. This implies a certain flexibility in the system. Varying the roll-off of the analog output filter trades the maximum in-band decoding accuracy against access to any slightly off-band material that may have been encoded. At the same time, future improvements to delta-sigma modulators may give the producer side a choice between greater in-band accuracy (possibly even enabling the encoder to match the dynamic range to the threshold of hearing) and encoding off-band material. In effect, the boundary between in-band and off-band material is blurred, and the limit can be set relatively independently at both ends of the signal chain. All this lends some credibility to bitstream methods as the basis for a complete audio architecture.
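The four to one figure is plain arithmetic. Here is a quick check per channel. The paragraph does not fix a base sample rate, so the 44.1 kHz CD rate is assumed.

```python
BASE_RATE = 44_100            # assumed base sample rate in Hz (CD)

pcm_rate = 16 * BASE_RATE     # 16-bit PCM: bits per second, one channel
dsd_rate = 1 * 64 * BASE_RATE # 1-bit stream at 64x oversampling, one channel

print(pcm_rate, dsd_rate, dsd_rate / pcm_rate)
```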
Sony’s DSD and SACD

The format that has prompted the whole recent bitstream discussion is Sony’s DSD (Direct Stream Digital), which was originally intended for the stereo digital soundtrack of DVD Video. That effort failed, so Sony incorporated DSD into a new standalone audio format dubbed SACD (Super Audio CD), which has already hit the streets in Sony’s home market.

DSD is a straightforward application of multichannel 64 times oversampling delta-sigma conversion at about 2.8 MHz (64 × 44.1 kHz). The resulting bitstreams are transmitted or stored directly, and low order analog lowpass filtering reconstructs the signal at the reproduction side. SACD places this bitstream on a DVD-derived high capacity disc. The most important technological contribution of SACD is the introduction of optional double-layered and hybrid discs. As with other members of the DVD family, double layers simply mean double the capacity. This could be used for extra playing time or, at a later date, to accommodate multichannel material (currently only stereo SACDs are defined). The hybrid disc is a more interesting concept.
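The core of the scheme fits in a few lines. Below is a minimal first-order 1-bit delta-sigma modulator. First order is an assumption made for readability; the modulators actually used for DSD are of considerably higher order. Averaging blocks of 64 bits stands in for the analog lowpass filter and shows that the local density of ones tracks the input.

```python
import math

FS = 64 * 44_100  # DSD64 bit rate per channel: 2.8224 MHz

def delta_sigma_1st_order(samples):
    """First-order 1-bit delta-sigma modulator; input in [-1, 1], output +/-1.0."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                  # running sum of input minus output
        bit = 1.0 if integrator >= 0.0 else -1.0    # the 1-bit quantizer
        bits.append(bit)
        feedback = bit                              # fed back on the next sample
    return bits

def block_average(values, window):
    """Mean of consecutive non-overlapping blocks: a crude stand-in for the lowpass filter."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

# 10 ms of a half-scale 1 kHz tone, sampled at the bitstream rate
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS // 100)]
bits = delta_sigma_1st_order(tone)
recovered = block_average(bits, 64)   # one value per 64 bits, i.e. back at 44.1 kHz
reference = block_average(tone, 64)
print(max(abs(a - b) for a, b in zip(recovered, reference)))
```

The integrator state stays bounded, so the block averages of the bits stay within a tenth of full scale of the input's block averages. A higher-order loop and a proper filter do far better, but the principle is the same: the information is carried by bit density.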

Hybrid SACDs incorporate one high density layer, which stores the DSD bitstream, and in addition a Red Book compatible CD layer. The promise is that hybrid discs will play as CDs in normal CD players while also containing the higher fidelity DSD stream. The SACD standard also requires every SACD player to support the CD format. This is easily achieved because the technology is a direct DVD derivative: DVD players commonly employ dual beam pickups and two layer discs, and so have the necessary dual focus capability. The only real obstacle to complete Red Book compatibility is ensuring that the resulting hybrid discs meet the Red Book requirements for disc refractive index, thickness, depth of the recording layer and the absorption coefficient of the disc material at the 780nm wavelength used to read a CD. Through careful design, Sony has accomplished this goal and created the migration path essential to any new audio format. From the user’s point of view, SACDs currently behave just like conventional CDs. Multichannel playback is envisioned for the future, and the standard can accommodate up to 6 channels of DSD encoded audio data.

In addition to the above specification, Sony has tried to make the SACD platform more desirable to content providers by embedding both a visible and an invisible watermark into the disc without which the SACD player will refuse to play the disc. This is done to make piracy more difficult. As a further hindrance to copying, no digital outputs are provided in the first generation SACD players.

Sony has tried to position SACD as an audiophile format and holds that SACD is not a direct competitor to DVD-A which it claims is more geared towards the ordinary consumer (read: is somehow in the low end). This is very much reflected in Sony marketing rhetoric surrounding SACD, which invokes the audiophile fondness for analog formats and capitalizes on the benefits of a simplified signal chain.
Foreseeable problems

And now on to the meat. I do not agree at all with Sony hype about SACD being what practically amounts to the Second Coming. I also believe I share this worry with the right people—I’m certainly not the first one to think SACD is not a healthy way to go.

Perhaps the most straightforward reason why SACD is a bad idea is that it may not be needed at all. Blind listening tests show that the average consumer has a fair bit of difficulty telling 24 bits at 96kHz from properly implemented 16 bits at 44.1kHz. Given the numerical differences between these formats, the question of whether we really need accuracy beyond the level of CDs becomes quite acute. Quite a few people with golden ears agree that the difference is subtle. Now, the effective bit depth of DSD is around 20, and 24/96 already has over an octave of ultrasonic bandwidth. Why is it, then, that by and large the same golden ears find a great difference between CDs and SACDs?
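These figures can be related through the textbook result that an ideal N-bit quantizer gives a full-scale sine an SNR of about 6.02N + 1.76 dB. The following is a short sketch. The 20 kHz hearing limit is an assumption, and treating DSD as "20 bits" simply borrows the figure quoted above.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """SNR of an ideal N-bit quantizer for a full-scale sine: about 6.02 N + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def ultrasonic_octaves(sample_rate_hz: float, hearing_limit_hz: float = 20_000) -> float:
    """Octaves of bandwidth between the assumed hearing limit and the Nyquist frequency."""
    return math.log2((sample_rate_hz / 2) / hearing_limit_hz)

for n in (16, 20, 24):
    print(n, round(dynamic_range_db(n), 1))
print(round(ultrasonic_octaves(96_000), 2), round(ultrasonic_octaves(44_100), 2))
```

An effective 20 bits thus corresponds to roughly 122 dB, against about 98 dB for 16 bits and 146 dB for 24 bits. Sampling at 96 kHz leaves about 1.26 octaves above 20 kHz.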

Since SACD is very clearly a distribution format, we can ask how close CDs mastered for delivery only (i.e. utilizing aggressive noise shaping, perhaps even driven by a masking model) can come. Theory suggests they come really close. So it might be that DSD isn’t the optimal approach to improving the signal chain after all, and perhaps we should instead stretch CD a bit. As for ultrasonics (possibly the only thing CDs cannot address at the fixed 44.1kHz sample rate), the evidence is not conclusive. Perhaps some content above the CD limit of 22050Hz should be included, but the limits set by DSD seem excessive.

Some proponents of DSD also claim that DSD offers superior time accuracy because of the bitstream approach and the extremely high sampling rate. The argument is that we get in between the samples of PCM because the bitstream changes more often. But it is well known that the reconstruction step in PCM achieves similar between-sample resolution, although simple-minded analysis doesn’t show that right away. Dither also makes the phase resolution of PCM essentially unlimited when we integrate over all of time.
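The claim about between-sample resolution can be tested directly. This sketch assumes a single 1 kHz test tone and estimates its phase from one DFT bin. It delays the tone by 1 µs, about 1/23 of the 22.7 µs sample period at 44.1 kHz, quantizes it to 16 bits with TPDF dither, and recovers the delay from the samples alone.

```python
import cmath
import math
import random

FS = 44_100          # CD sample rate; 1 s of samples holds exactly 1000 cycles of F
F = 1_000            # test tone frequency in Hz
TAU = 1e-6           # true delay: 1 microsecond
LSB = 1 / 32768      # 16-bit step for a [-1, 1) full scale

def tpdf_dither() -> float:
    """Triangular-PDF dither spanning +/-1 LSB."""
    return (random.random() - random.random()) * LSB

def quantize_16bit(x: float) -> float:
    return round(x / LSB) * LSB

random.seed(1)
samples = [quantize_16bit(0.5 * math.sin(2 * math.pi * F * (n / FS - TAU)) + tpdf_dither())
           for n in range(FS)]

# Project the samples onto the tone's complex exponential (a single DFT bin).
bin_value = sum(x * cmath.exp(-2j * math.pi * F * n / FS) for n, x in enumerate(samples))
# For A*sin(wt - phi) over whole cycles, the bin's phase is -(phi + pi/2).
phi = -cmath.phase(bin_value) - math.pi / 2
estimated_delay = phi / (2 * math.pi * F)
print(estimated_delay)
```

The estimate lands well within a nanosecond of the true delay, even though the samples themselves are almost 23 µs apart.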

Beyond the purely acoustical arguments, there is a host of technology based reasons to reconsider utilizing SACD. The most serious stem from the fact that bitstreams diverge radically from the more traditional PCM representation. A bitstream has no number-like structure and no clearly delimited frames such as the ones defined by PCM samples. The information (contrary to some Sony claims) is not even concentrated in the width of clearly defined pulses, as in traditional analog pulse width modulation. Instead, the nonlinear noise shaping procedure distributes it in a very complicated manner over long bursts of successive bits. In effect, this puts all current audio processing algorithms in the trash can: these methods require discrete signals which approximate sequences of real numbers. DSD streams do nothing of the sort, since every bit in the sequence is inherently bilevel. You cannot even sum DSD streams without running into serious trouble, not to mention the complications with multiplication. And when multiplication and summation go, so does all of today’s signal processing theory.
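The trouble with summation is easy to demonstrate. This sketch uses a hypothetical first-order modulator, not DSD's actual higher-order one, to encode two tones and then adds the resulting bitstreams. The sum is no longer a 1-bit stream. Getting back to one bit means running the mix through a modulator again, which is exactly the requantization the text objects to.

```python
import math

def delta_sigma_1st_order(samples):
    """First-order 1-bit delta-sigma modulator; input in [-1, 1], output values +/-1."""
    integrator, feedback, bits = 0.0, 0.0, []
    for x in samples:
        integrator += x - feedback
        feedback = 1 if integrator >= 0 else -1
        bits.append(feedback)
    return bits

n = range(4096)
a = delta_sigma_1st_order([0.4 * math.sin(2 * math.pi * k / 512) for k in n])
b = delta_sigma_1st_order([0.3 * math.sin(2 * math.pi * k / 300) for k in n])

mixed = [x + y for x, y in zip(a, b)]
print(sorted(set(mixed)))   # values come from {-2, 0, 2}: no longer a 1-bit stream

# Back to one bit only by remodulating, i.e. requantizing the (halved) mix.
remodulated = delta_sigma_1st_order([m / 2 for m in mixed])
```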

Now, should we want to compress, convert, mix or edit the stream we have only two possibilities. The first one is, we convert to PCM. The second is, we build new DSP theory to do the operations in the bitstream domain. The first one immediately goes out the window since the first premise of DSD is that the signal chain should not include any of them harmful filters. We also run into complications with delta-sigma itself—it is difficult to guarantee such conversions will be linear. This is less of a problem when we only do the step once or twice and the conversions are viewed as being approximative. But when DSD/PCM/DSD conversions need to be performed multiple times, we run into problems. The format isn’t even specified strictly enough to allow for optimal converters to be built—after all, the best converters marry the digital filters to the ones used in the delta-sigma modulator. In DSD the room left for scalability means the specifications aren’t exact and the architecture inherently separates the modulator and conversion filters from each other.

The second option (new theory) is not very attractive either, because it implies creating a theory of nonlinear audio processing from scratch. The complicated time structure of bitstreams (or rather, the lack thereof) complicates any attempt at direct processing even further. Pulling all this together, DSD is not compatible with anything involving calculation. This means it is not suitable for editing and, consequently, for mixing or post-production of any kind.

Viewing delta-sigma modulation as quantizing the differences of a running sum, we see that the inaccuracy in the delta values and the variable architecture of the loop lowpass filter make the actual output of the modulator highly approximate. Intuitively, it would seem that any exact mathematical analysis of the output, or any complete processing framework for the resulting bitstreams, must rely on the precise time behavior of the loop filter. We might hope that since the audio band is present in the bitstream in intact form, we could neglect the precise behavior of the bitstream and work only in the significant frequencies. But this will not work either, because additive processing and filtering cannot be accomplished directly.

Like any distribution format, DSD faces the usual questions of error resilience, space efficiency and so on. DSD does not fare very well in this department. Error correction codes can be used as usual. Error concealment of the kind employed by CD players, however, will at least for now require PCM processing and thus DSD to PCM conversion. Of course, this does not affect DSD processing under normal conditions, but the conversion must be implemented if error concealment is wanted. Space efficiency is significantly worse than for PCM. A lot of headroom is needed to cater for the distortion and for the relatively broad, mostly unused spectral slot between the audio band and the stop band of the (slow roll-off) analog output filter. Another way to see where the overhead comes from is to consider how traditional delta modulation fails and then extrapolate to delta-sigma: we need very high sample rates before a single bit increment can successfully approximate a continuous signal. This holds for both delta and delta-sigma modulation, although the precise time behavior of the loop filter may give delta-sigma modulation a certain edge.
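The delta modulation half of that argument has a simple closed form. A linear delta modulator moves by one step δ per sample, so it can follow a slope of at most δ·fs. A sine of amplitude A and frequency f reaches a slope of 2πfA. The sketch below computes the smallest sample rate that avoids slope overload. The step sizes are assumptions chosen for comparison.

```python
import math

def min_delta_mod_rate(freq_hz: float, amplitude: float, step: float) -> float:
    """Smallest sample rate at which a linear delta modulator avoids slope overload.

    Overload is avoided when step * rate >= 2 * pi * freq * amplitude.
    """
    return 2 * math.pi * freq_hz * amplitude / step

# Full-scale 20 kHz sine (full scale = +/-1), step equal to one 16-bit LSB
fine_step_rate = min_delta_mod_rate(20_000, 1.0, 1 / 32_768)
# Same sine with a coarse step of 1/64 of full scale: lower rate, far more granular noise
coarse_step_rate = min_delta_mod_rate(20_000, 1.0, 1 / 64)
print(f"{fine_step_rate / 1e9:.2f} GHz, {coarse_step_rate / 1e6:.2f} MHz")
```

Delta-sigma moves the integrator in front of the quantizer. The bit density then follows the signal's amplitude rather than its slope, and overload depends on level rather than frequency. As the text argues, however, it still needs a very high rate before single-bit increments approximate the signal well.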

The same information takes more space when encoded as a bitstream than as PCM, for the same reason that Arabic numerals are so much handier than their Roman counterparts. Bitstreams exploit the ordering of the bits much less aggressively than PCM representations. Delta-sigma modulation produces bitstreams in which the average density of bits carries the information, not the precise sequence formed by them. The higher the order of the modulator, the closer to the audio band the resulting noise comes and the steeper the output filter needed (this means complex time domain behavior and ringing). Finally, the actual bit to bit time behavior of the bitstream matters more. This is precisely why higher order modulation can give more accuracy while the number of bits produced stays constant: higher order modulators produce a wider band of modulation noise and rely more on the actual order of the transmitted bits to drive a steeper, ringing output filter along the desired output waveform.
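The numeral analogy can be made concrete by counting. If only the density of ones matters, n bits can express n + 1 distinct values. As a positional PCM word, the same n bits express 2^n. The functions below are illustrative. The density count is the extreme case: as the paragraph notes, higher-order modulators do lean on bit order, so practical bitstreams fall somewhere between the two.

```python
def pcm_levels(n_bits: int) -> int:
    """Values an n-bit PCM word can take: each bit position has its own weight."""
    return 2 ** n_bits

def density_levels(n_bits: int) -> int:
    """Values n bits can express if only the count of ones (the density) matters."""
    return n_bits + 1

for n in (16, 64):
    print(n, pcm_levels(n), density_levels(n))
```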

A practical implementation of delta-sigma conversion adds one further ingredient to the overhead issue: limitations on modulator sigma values. It was mentioned above that to guarantee stability, we might not be able to drive the output of the loop filter near its maximum value. In fact, many high order modulators employed in PCM applications use only some 75% of the theoretical maximum range of the circuit. Of course, a well designed PCM application will be calibrated to produce full scale output at the new limit. But when the bitstream is passed through as-is, there’s not much to be done, and part of the theoretical operating range will be left unused. We then either require more transmission bandwidth to achieve the theoretical maximum accuracy of the original system, or the precision of our implementation will be below what a straightforward noise shaping analysis would suggest.
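The cost of the unused range is easy to quantify. This assumes the 75% figure quoted in this paragraph and a bitstream passed through unchanged:

```python
import math

usable_fraction = 0.75                        # share of the theoretical range actually used
loss_db = -20 * math.log10(usable_fraction)   # lost peak level in dB
loss_bits = math.log2(1 / usable_fraction)    # the same loss expressed in bits
print(round(loss_db, 2), round(loss_bits, 2))
```

That is about 2.5 dB, or roughly 0.4 bit of resolution, given away before any other imperfection is counted.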

The problems caused by high amounts of overhead are exacerbated by the difficulties in processing bitstream data. We also cannot use compression (lossless or otherwise) to reduce the overhead effectively. The bit rates are very high, no clean symbol structure is present, and the high frequency noise inserted by the modulation step renders any reasonable dictionary compression strategy moot. These concerns primarily affect multichannel and multimedia applications, where space is usually at a premium. Given that most audio applications seem to be heading in this direction, the difficulty of compression can be a significant factor. If we employ compression (as DVD Audio does with its packing algorithm) and can also use the existing PCM processing techniques, much greater flexibility can be given to the content producer. Unlike SACD, DVD Audio lets the producer make the most suitable tradeoff between bit width, sampling rate and number of channels instead of mandating any fixed combination.

Going more to the theoretical side of things, SACD is based on a framework of its own, but it does rely on some unwritten rules of past audio systems. One of the more important ones is the rule that each stored channel is destined for a single speaker, and that this destination is all that is needed to define what the stored signal means. But this assumption has been questioned more than once. First of all, to get optimal playback, the stored material must be adapted to the particular configuration of playback equipment the listener happens to have. This may mean anything from putting one channel out louder than the others to head position based HRTF filtering for high tech headphone playback. Processing like this also raises the question of what the input data actually is. The traditional channel concept is certainly not enough if HRTF processing or some similarly delicate procedure is to be performed. To date, the only framework to convincingly address these questions is Ambisonics, where the channels are defined in terms of a spherical harmonic decomposition of the soundfield. The great benefit of such an approach is that the storage format is abstract, neatly specified, easily processed and completely gear independent. But this doesn’t work when the format does not support extensive signal processing. DSD does not. Gear independence also future-proofs a format, allowing new converter technology and processing methods to be employed. Again, DSD with its fixed sampling rates, no compression and a very fixed pass-it-thru-a-filter configuration does not allow anything of the sort.
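For readers unfamiliar with Ambisonics, a minimal sketch of first-order encoding shows what a gear independent channel definition looks like. The weighting follows the traditional B-format (FuMa) convention. Other normalisations such as SN3D exist, and the function names are illustrative. Rotating the whole sound field, as head-tracked headphone playback would need, is just arithmetic on the stored PCM samples.

```python
import math

def encode_bformat(signal: float, azimuth: float, elevation: float):
    """Encode one mono sample as first-order B-format (W, X, Y, Z), FuMa weighting.

    Angles are in radians; azimuth is counter-clockwise from straight ahead.
    """
    w = signal / math.sqrt(2)
    x = signal * math.cos(azimuth) * math.cos(elevation)
    y = signal * math.sin(azimuth) * math.cos(elevation)
    z = signal * math.sin(elevation)
    return w, x, y, z

def rotate_yaw(bformat, angle: float):
    """Rotate the whole sound field about the vertical axis; W and Z are unchanged."""
    w, x, y, z = bformat
    return (w,
            x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z)

front = encode_bformat(1.0, 0.0, 0.0)
turned = rotate_yaw(front, math.pi / 2)        # turn the field 90 degrees to the left
left = encode_bformat(1.0, math.pi / 2, 0.0)   # a source encoded directly at the left
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(turned, left)))  # True
```

Nothing in the encoded signals refers to a loudspeaker. A decoder for a given layout, or a headphone renderer, is applied only at playback.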

In spite of the preceding, Sony claims DSD does have scalability. This scalability mostly comes in the form of varying output filters. But this has its downsides as well. When the production side delta-sigma modulator and the analog output filter are varied separately, there is a very definite danger of letting something inappropriate through. This is a very real concern, since the signal chain has none of the steep digital lowpass filters found in the traditional PCM one. For instance, the principles of delta-sigma modulation dictate that to raise the effective bit width of the conversion, we need to raise the sampling rate or the order of the modulator. The first cannot be done once there is an installed base of SACD players, but the second can. There is also a very real reason to do so: DVD Audio scales to 24 bits, but the current effective width of DSD is estimated at 20 bits. If such a move is made, the noise generated by the modulation process will suddenly get a lot closer to the useful audio band, and filters made today will not be prepared for it. This would suddenly mean a lot of ultrasonic energy in the analog output. In that case we could expect several problems:

- interference and nonlinear distortion (i.e. ultrasonics dropping down to the audible range through such intermediate processes)
- requantization errors (e.g. for MD recording)
- breach of electromagnetic interference guidelines
- increased jitter sensitivity
- unexpected electrical resonances
- certain problems with duty cycle modulated amplifiers (the so called digital ones)

Any migration from CD to SACD (or DVD-A) will also be a question of resource allocation. Currently most people would happily admit that the lack of sound field immersion, three dimensional sound and localisation/directionality is a far worse problem for audio systems than their bit accuracy. Even audiophiles are likely to be more worried about other parts of the signal chain, like speakers and the room response, all problems which can at least partially be solved through DSP techniques. It is then highly questionable whether money should be spent on accuracy before the other concerns have been addressed. We might also ask whether PCM techniques would not perform better at DSD-like bitrates and so be a better candidate for any new audio architecture. After all, they avoid the inherent overhead of bitstream modulation.

It is a truism of modern audio production that extensive processing is needed. Far from being about to die out, signal processing is being pushed into both ends of the audio signal chain. Studios use it for synthesis, effects and post-production, and radio stations need compression and equalisation. The end user probably wants bass boost and, one day, correction of the room and speaker responses and possibly simulated acoustical environments. It is therefore very doubtful whether investments should be made in an audio system which from the outset makes any such operations difficult. SACD advocates often suggest precision analog processing as an alternative. This is downright dubious, since the prime reason digital systems are used in the first place is their generally higher performance, better error tolerance, cost-effectiveness and ease of processing.

Perhaps the funniest example of these new user needs is the inclusion of a variable coefficient equalizer into both of Sony’s SACD players (called VC24). The marketing speak clearly acknowledges the need for equalisation, both for corrective purposes and for listener preference, yet the circuit will either have to be switched off for DSD playback (bitstreams do not work with digital filters) or a DSD-PCM-DSD chain will have to be present (in direct contradiction with the DSD philosophy).

On the content producer side, SACD will not be very easy to master, because the traditional studio audio processing model will have to be rethought. If processing in the bitstream domain is wanted, new equipment will have to be purchased. Existing sound transports, like DAT and AES/EBU, are not compatible with the DSD signal chain. It is very likely that the equipment needed to work entirely in DSD will be affordable only to the largest studios, perhaps not even outside Sony itself. Several further factors will in all probability make the format quite hostile to smaller studios and home studio based musicianship: the heavy intellectual property protection, the narrow circulation of the SACD specification, and the probable incompatibility with writable DVD technology (especially the probable lack of DSD specific editing and mastering software with burn capability). This is in stark contrast with DVD Audio, which can reasonably be expected to be computer writable in the near future. The format might also end up being less than friendly to music of nonacoustic origin. After all, SACD is aimed at the audiophile market, and I think this particular audience does not constitute the necessary force to drive the development of the software and hardware needed to author electronic music in DSD. The difficulty the DSD format poses for digital processing also precludes authoring electronic material entirely in DSD, necessitating PCM processing. It then becomes highly questionable why DSD should be utilized at all. In fact, using a PCM medium like DVD Audio will actually shorten the signal chain for electronic material.

Technical details aside, the marketing side of SACD isn’t all that bright either. The first, and possibly most disturbing, facet is that the SACD seems to be strictly a distribution format. Sony has gone to great lengths to ensure the format cannot be copied and is as incompatible with existing equipment as possible. The format isn’t editable. The format is rather hostile to smaller studios as well, so it is clear that it is meant to be limited to the traditional record company centric model of the music business. Indeed, SACD could really be seen as a part of Sony’s overall strategy to secure its position against the changing conditions in the record market. SACD is difficult to convert to MP3’s, is well copy protected and is a direct competitor to less secure audio formats like CD and DVD.

Sony, as a member of the DVD Forum, does not acknowledge that SACD is a competitor to DVD Audio. However, the fact that SACD marketing goes against PCM techniques in general (not just CD) and the highly aggressive time-to-market schedule of the first SACD players (even sacrificing multichannel capability, which to most consumers is more important than DSD) suggest the contrary. All this is very understandable, of course, since DVD Audio’s stronger multimedia capability, more flexible authoring platform and the drive of DVD-V and DVD-ROM technology would most likely have obsoleted SACD before it even made it to the stores. Now SACD has an edge over DVD Audio in the Japanese marketplace.

As for motives, Sony of course has a definite financial interest in SACD. This is all the more so because, unlike with DVD, Sony receives the substantial part of the licensing fees for SACD. SACD is thus far more clearly proprietary than CD or DVD derivatives can ever be. Sony’s licensing policy is quite worrisome, then. So far it seems to have worked (there is substantial industry backing for the SACD format), but there’s never a guarantee with a proprietary format.

As for SACD and market acceptance, there is a definite problem. For the ordinary consumer, DVD Audio fulfills the promises of SACD and additionally delivers multimedia and multichannel capability. This means that the SACD installed base will not grow very fast and will probably be dwarfed by DVD Audio. This will mean expensive players and poor media availability (as of now, only a small number of Sony titles is available). SACD does not have the impetus of DVD-V and DVD-ROM/RAM behind it to drive the prices down. This is a concern primarily with regard to the media—hybrid discs are important in the CD to SACD migration phase but are also a SACD specialty with little outside application to drive production costs down. (SACD and DVD will largely share drive components.) A further source of extra cost is the PCM incompatibility which implies duplication of production and mastering resources, thus further limiting who will be available to supply music on SACD. Unlike other companies, Sony has to worry about MiniDisc, CD, DVD Audio in addition to SACD, so the cost of releasing all material in all formats may prove excessive. This means even Sony might have trouble supplying material in its new format.

With all probability, industry support for DVD Audio will be wider than for SACD. This is because DVD Video players will systematically go with DVD Audio. Some notable manufacturers like Matsushita intend to release players which can do both DVD Audio and SACD, but it is not entirely clear if the support will be comprehensive. It may well be that multichannel capability does not appear in time while for DVD Audio, it will probably be supported from the start. In my mind, this is a crucial question for SACD. If multichannel SACD support from multiple manufacturers does not appear soon, the format will suffer serious harm.
Hype control—counters to SACD marketing

In the spirit of fairness and SACD bashing, some of the more ludicrous marketing speech of Sony deserves a counter. In the following some of the claims made by SACD material are put into proper perspective (and hopefully debunked altogether).

The favorite demonstration of the extended frequency range of SACD shows what happens to a 10kHz square wave when it is recorded on CD and SACD and then played back. The illustration consists of four oscilloscope shots. SACD produces a very close approximation to the original square wave, while the corresponding result for CD is a considerably rounded waveform which is closer to a sine wave than a square one. The pictures are very convincing and will probably spook quite a number of CD owners. They are accompanied by a brief description of how CD loses harmonics of the test wave from the third up and so is clearly inferior to SACD. What is forgotten is that the second harmonic of a 10kHz periodic waveform (which CD can handle) would lie at 20kHz, already at the upper limit of hearing for adolescents. An ideal square wave has no even harmonics at all. The third harmonic is at 30kHz, and there is little evidence that people can hear that high under any reasonable conditions: it’s ultrasound. So is the demonstration meant for you or for your dog?
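The arithmetic behind the demonstration can be checked against the Fourier series of an ideal square wave, which contains only odd harmonics with amplitudes 4/(πk). The 100 kHz figure below is an assumed channel bandwidth, used only for comparison.

```python
import math

def square_wave_harmonics(fundamental_hz: float, bandwidth_hz: float):
    """(frequency, amplitude) pairs of an ideal unit square wave's terms below a limit.

    Only odd harmonics k = 1, 3, 5, ... exist, each with amplitude 4 / (pi * k).
    """
    terms = []
    k = 1
    while k * fundamental_hz < bandwidth_hz:
        terms.append((k * fundamental_hz, 4 / (math.pi * k)))
        k += 2
    return terms

print(square_wave_harmonics(10_000, 22_050))   # CD: only the 10 kHz fundamental survives
print(square_wave_harmonics(10_000, 100_000))  # assumed 100 kHz channel: 10, 30, ..., 90 kHz
```

Through a 22.05 kHz channel, only the 10 kHz fundamental survives. That is exactly why the CD trace looks like a sine wave, and every missing component lies at 30 kHz or above.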

A similarly deceptive illustration displays a scope shot with a cycle of something resembling a sine wave and an approximate DSD bitstream below it. It is easy to see that the mean density of the bitstream closely corresponds to the value of the sound wave at each point in time. The text claims that since the stored bitstream is so close to the original wave, the resulting playback quality is superior to that offered by PCM techniques. What this really aims at is convincing people who have reservations about digital audio media and prefer good ol’ analog. The fact is, the stored structure of the data doesn’t matter a single bit as long as the output voltages closely follow what went in. After all, what is stored on a SACD bears little resemblance to the pure DSD stream the data carries. What matters, as always, is the subsequent processing and the soundness of the theory behind it.

Illustrations of typical PCM and DSD signal chains also serve a role in the campaign. DSD is obtained by dropping the digital filtering parts from a PCM signal chain and passing the pure bitstream directly to the receiver, so it is easy to claim that the resulting signal chain now lacks some distortion. To some degree, this claim might even be true. What is forgotten, though, are all the practical consequences of the new architecture, along with several facts about a well engineered PCM chain:

- the filtering operations can work with significantly more flexibility and accuracy on the side of the bitstream
- the filters operate at a level of precision far beyond that imposed by the chosen PCM format
- the PCM philosophy does not call for exact reconstruction, in the D/A conversion process, of the same bitstream that was used for A/D, but rather for reconstitution of the signal represented by the PCM signal
- any rounding operations can utilize precisely the same noise shaping logic that leads to the supposedly superior performance of DSD

Furthermore, proper dithering also helps make the quantization procedure (including what the filters do) perceptually transparent. Finally, it is entirely forgotten that using DSD not only cuts stages from the signal chain, but also prevents anything from being inserted there when really needed. This includes studio apparatus, which for the time being will operate variously in the DSD, analog and PCM domains. That is clearly far worse for the quality of the resulting DSD stream than going once through a high precision digital filter and then staying entirely in the digital (PCM) domain.

Some DSD groundwork material also suggests that the architecture is made cheaper by the elimination of some digital processing steps. This is downright funny considering the price of the first SACD implementations and the intended target audience of audiophiles, some of whom are ready to pay tens of dollars per meter for speaker cable. Market acceptance issues, the time it takes to update production and playback facilities, and the money spent on prolonged multiformat support will likewise mean higher prices for the media as well as the players. Furthermore, fixed function digital signal processing is very cheap nowadays. Most of the price of a CD player comes from sources other than the D/A converters, which are quite economical to manufacture in large quantities. Remember, cost effectiveness was one of the reasons for going into delta-sigma converters in the first place.

Copyright © 1996–2002 Sampo Syreeni; Date: 2002–09–17

And

Bitstream versus PCM debate for high-density compact disc

Prof. M.O.J. Hawksford

University of Essex, Centre for Audio Research and Engineering

Introduction

The Acoustic Renaissance for Audio (ARA) have proposed [1] a multi-channel, high-resolution audio encoding format for use with the next generation of compact disc with further introductory discussion openly published [2, 3, 4]. The ARA proposal document has already been widely circulated to the audio industry and the following text assumes familiarity with the ARA proposal.

This report is prepared in response to a proposal to import bitstream code directly onto high-density optical discs. Although bitstream offers certain philosophical and economic merits, we believe that using it for audio data storage involves fundamental flaws and significant system limitations. Specifically, bitstream fails to address the future technical aspirations of the audio industry. In that future, advanced digital processing will be used to improve accuracy in electrical-to-pressure transduction and also three-dimensional sound reproduction. We therefore present a discussion of the reasons for preferring a system based upon PCM rather than bitstream coding.
A review of linear PCM principles and characteristics

Implicit to the ARA proposal is the use of uniform sampling and uniform amplitude quantization with dither, where specifications up to 24 bit at a sampling rate of 96 kHz are supported, a process designated linear PCM. It is a fundamental premise of our proposal that there will be no form of lossy perceptual coding employed and that each channel of the system will be bit-transparent from input to output. The only legitimate concession to psychoacoustics is made in the limitation in bandwidth and in dynamic range together with the option for using psychoacoustically motivated noise shaping to enhance the subjective resolution of low level noise, a process considered by the ARA committee to be a completely linear process.

We maintain that correctly implemented linear PCM implies only distortion attributable to band limitation and non-correlated random noise [5]. In making this statement, it is understood that a uniform quantizer with optimal dither is a completely linear (but noisy) process. The noise can also be spectrally shaped by an error feedback loop that encapsulates the pre-quantizer dither sequence, and this too is a completely linear process. In such systems there is no correlation between signal and spectrally shaped noise, and as such they fully meet the aspirations of the audiophile community.
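The error feedback structure described here can be sketched as follows. This assumes a first-order loop and TPDF dither; a psychoacoustically motivated shaper would replace the single-sample feedback with a higher-order filter. Because the loop error includes the dither, the output equals the input plus the first difference of that error. The total error therefore has no DC component, and its spectrum rises towards high frequencies.

```python
import math
import random

def noise_shaped_requantize(samples, seed=0):
    """Requantize to whole LSBs with TPDF dither inside a first-order error-feedback loop.

    Input and output are in LSB units. Output = input + e[n] - e[n-1], where e is the
    loop error, so the total error is high-pass shaped by (1 - z^-1).
    """
    rng = random.Random(seed)
    previous_error = 0.0
    out = []
    for x in samples:
        shaped = x - previous_error              # subtract the previous loop error
        dither = rng.random() - rng.random()     # TPDF dither, +/-1 LSB peak
        y = math.floor(shaped + dither + 0.5)    # uniform quantizer, round to nearest
        previous_error = y - shaped              # loop error, dither included
        out.append(y)
    return out

signal = [300.25 * math.sin(2 * math.pi * n / 97) for n in range(10_000)]
shaped = noise_shaped_requantize(signal)
# The errors telescope, so their sum is just the last loop error (under 1.5 LSB).
print(abs(sum(y - x for x, y in zip(signal, shaped))))
```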

The ARA proposal strongly supports the use of lossless or transparent data compression, a process also termed lossless packing. Such techniques use a predictive algorithm to encode the PCM data stream efficiently while remaining bit-transparent across encoder and decoder. Algorithms exist which can track both short-term average and peak bit demands and can offer data savings of the order of 2:1. Data compression also reduces the correlation between bit patterns and audio data, which should reduce levels of correlated jitter [6,7], a critical factor in high-resolution digital audio systems.
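The predictive-coding idea can be sketched as follows. This is a hypothetical, deliberately simple scheme, not the ARA's or any product's packer: a fixed first-order predictor (each sample is predicted to equal the previous one) produces residuals, which are then Rice-coded with a fixed parameter `k`. Bits are held in a Python string purely for readability. Real lossless packers use adaptive, higher-order predictors and choose coding parameters per block; the point here is only that decoding reproduces the input bit for bit while a smooth signal needs fewer bits than raw 16-bit words.

```python
import math

def zigzag(n: int) -> int:
    """Map signed residuals to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u: int) -> int:
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(u: int, k: int) -> str:
    """Unary quotient, a '0' terminator, then the k low-order bits."""
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")

def pack(samples: list[int], k: int) -> str:
    bits, prev = [], 0
    for s in samples:
        residual = s - prev              # first-order fixed predictor: x[n] ~ x[n-1]
        bits.append(rice_encode(zigzag(residual), k))
        prev = s
    return "".join(bits)

def unpack(bits: str, count: int, k: int) -> list[int]:
    out, pos, prev = [], 0, 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":          # read the unary quotient
            q += 1
            pos += 1
        pos += 1                         # skip the '0' terminator
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)   # undo the prediction by adding the residual back
        out.append(prev)
    return out

# Usage: a smooth 16-bit test tone
samples = [round(3000 * math.sin(2 * math.pi * n / 200)) for n in range(2000)]
packed = pack(samples, k=6)
print(unpack(packed, len(samples), k=6) == samples)
print(f"{len(packed) / len(samples):.2f} bits/sample vs 16 uncompressed")
```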

The ARA favours extended bandwidth in digital audio and supports Pioneer's experiments with 96 kHz sampling. It can be shown that when such over-sampling is employed, transparent data compression gains in efficiency: we estimate that the actual data rate need increase by a factor of as little as 1.3 compared with uncompressed audio data sampled at 48 kHz. This extension in audio bandwidth mirrors one of the principal advantages of bitstream coding. Furthermore, restricting the audio band to below 48 kHz yields further gains in compression efficiency, which does not occur with bitstream.

To summarise, a linear PCM system encapsulates the following processes:

* Uniform sampling
* Uniform quantization
* Optimal dither
* Non-dynamic psychoacoustic noise shaping
* Transparent data compression

Alternative coding strategies should be benchmarked against the above attributes, which, when correctly implemented, result in a linear communications channel.
The case against bitstream code on optical discs
Bit rates

Bitstream coders require high over-sampling ratios to achieve acceptable performance with a one-bit code. Conservative estimates suggest a minimum of 64 × Nyquist with a 5th-order encoder architecture, that is 64 × 48000 = 3.072 Mbit/s per channel. A high-resolution system should match at least 20-bit PCM, which in practice implies a sampling rate significantly above 64 × Nyquist. By contrast, a single high-resolution PCM channel sampled at 96 kHz and using transparent data compression requires approximately 20 × 48000 × 1.3 = 1.25 Mbit/s. The PCM code is therefore substantially more data-efficient, and also offers linearity together with the bandwidth extension normally considered a principal attribute of bitstream.
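The arithmetic above, together with the 1.3 compression factor estimated earlier for 96 kHz sampling, can be checked with a few lines of Python. The rates are the report's own estimates, not measurements, and the helper names are illustrative.

```python
NYQUIST_HZ = 48_000   # the 48 kHz figure used throughout the report's estimate

def bitstream_rate(oversampling: int) -> int:
    """One bit per sample at `oversampling` times the Nyquist rate (bit/s per channel)."""
    return oversampling * NYQUIST_HZ

def packed_pcm_rate(word_bits: int, factor: float = 1.3) -> float:
    """Report's estimate: 96 kHz PCM after transparent compression costs about
    `factor` times the uncompressed rate of the same word length at 48 kHz."""
    return word_bits * NYQUIST_HZ * factor

sdm = bitstream_rate(64)
pcm = packed_pcm_rate(20)
raw = 20 * 96_000     # uncompressed 20-bit PCM at 96 kHz, for comparison
print(f"1-bit at 64x: {sdm / 1e6:.3f} Mbit/s")
print(f"20-bit/96 kHz raw: {raw / 1e6:.2f} Mbit/s, packed: {pcm / 1e6:.2f} Mbit/s")
print(f"bitstream / packed PCM: {sdm / pcm:.2f}")
```

Even at the conservative 64× minimum, the bitstream channel needs roughly two and a half times the packed PCM rate.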
Transparent data compression

Unlike PCM, bitstream codes conveying typical music signals offer no opportunity for transparent data compression. This contributes directly to code inefficiency, and the ability to reduce correlation between digital data and audio information is lost [6]. Indeed, a primary attribute of bitstream is the high correlation between bit pattern and audio data, since the signal is recovered by processing the bitstream directly with a low-pass filter.
Storage inefficiency in a 3-D environment

The discussion of bit rates and transparent data compression above indicates that bitstream is not an efficient code and is therefore extremely wasteful of disc storage. Although bitstream may be appropriate for a simple two-channel system on a high-capacity disc, the capacity limitation becomes unacceptable once multi-channel requirements are included. The ARA document [1] should be consulted at this point for the comprehensive multi-channel + 2-channel format proposed for the new high-density optical disc. This is especially relevant when compatibility with DVD is considered. We believe a disc that offers no DVD-compatible attributes and does not support three-dimensional sound reproduction will have limited and short-lived appeal in the next millennium.
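To put the storage argument in numbers, the sketch below reuses the per-channel rates from the bit-rate discussion and assumes, hypothetically, 8 channels (a 6-channel mix plus a separate 2-channel mix) and one hour of programme; neither figure comes from the ARA proposal [1]. Since the report treats 64 × Nyquist as a conservative minimum for bitstream, the bitstream figure is a lower bound.

```python
def gigabytes_per_hour(bit_rate_per_channel: float, channels: int, hours: float = 1.0) -> float:
    """Storage needed for `channels` channels at a per-channel bit rate (decimal GB)."""
    return bit_rate_per_channel * channels * hours * 3600 / 8 / 1e9

CHANNELS = 6 + 2   # hypothetical: a 6-channel mix plus a separate 2-channel mix

for name, rate in [("1-bit at 64 x 48 kHz", 3_072_000),
                   ("packed 20-bit/96 kHz PCM", 1_248_000)]:
    print(f"{name}: {gigabytes_per_hour(rate, CHANNELS):.2f} GB per hour")
```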
Linearity of bitstream encoders

There is a fundamental problem in guaranteeing exact linearity with bitstream coders based on delta-sigma modulation, although we recognise the advances currently being made with chaotic architectures and the inclusion of dither. The problem arises because a 2-level quantizer, even with dither, cannot be considered linear, unlike a multi-level quantizer with dither. Consequently, linearization must depend on negative feedback (i.e. noise-shaping feedback) to achieve acceptable performance. Even then, performance cannot be guaranteed for all signals: at low levels, correlated distortions (idle-channel sequences) can exist, although they may lie below the system noise level. At higher signal levels there can be signal-dependent stability constraints, which are a particular problem in higher-order coders. Because of this correlation, it is difficult to eliminate modulation noise completely in such schemes. We accept that bitstream techniques achieve excellent performance, although, as we shall discuss, practical systems would need multiple cascaded bitstream converters, with the potential for a build-up in distortion compared with PCM.
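The idle-channel behaviour can be seen in the simplest possible case. The sketch below assumes a first-order modulator with no dither (the report discusses 5th-order designs), so it shows the mechanism rather than a practical coder. With a constant input, the 2-level output settles into a strictly repeating pattern, which is a set of discrete tones rather than noise: an input of 0.5 gives the period-4 pattern +1, +1, +1, -1, and a zero input gives alternating +1, -1. Smaller inputs generally give longer periods, and hence components at lower frequencies.

```python
def sigma_delta_1st_order(x):
    """First-order delta-sigma modulator with a 2-level (+1/-1) quantizer."""
    integrator = 0.0
    y_prev = 0.0
    out = []
    for sample in x:
        integrator += sample - y_prev              # integrate input minus fed-back output
        y = 1.0 if integrator >= 0.0 else -1.0     # 2-level quantizer (comparator)
        out.append(y)
        y_prev = y
    return out

bits = sigma_delta_1st_order([0.5] * 1000)
print(bits[3:11])              # [1.0, 1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0]
print(sum(bits) / len(bits))   # 0.5: the average is right, but the "noise" is periodic
```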
Code table bitstream architectures with minimal correlated distortion

An alternative bitstream converter has been reported [8] that combines a linear quantizer with dither, to guarantee linearity, with 4th-order noise shaping. Conversion to a 1-bit code (typically from a 4-bit code) is then performed by an open-loop, optimal code-conversion table that minimises spectral modulation. Potentially, the system achieves high resolution with no low-level correlated distortion. However, its bit rate is again very much greater than that of PCM, so although it solves the problems of correlated and idle-channel distortion, it is too inefficient in its use of bits.
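The rate penalty of table-based conversion can be seen with a hypothetical table in which each 4-bit level k maps to a 15-slot pattern containing k ones, spread evenly. This is only a stand-in for the optimal table described in [8], which is designed to minimise spectral modulation; the sketch makes no attempt at that. It does show the open-loop structure (a pure lookup, with no feedback) and the bit-rate cost in this scheme: every 4-bit sample becomes 15 one-bit symbols, a 3.75× expansion.

```python
SLOTS = 15   # a 4-bit code has 16 levels (0..15); 15 one-bit slots can represent all of them

def pulse_pattern(level: int, slots: int = SLOTS) -> list[int]:
    """Spread `level` ones as evenly as possible over `slots` positions."""
    return [1 if ((i + 1) * level) // slots > (i * level) // slots else 0
            for i in range(slots)]

# Hypothetical open-loop table: 4-bit level -> 15-symbol one-bit pattern
TABLE = {level: pulse_pattern(level) for level in range(16)}

def to_one_bit(codes: list[int]) -> list[int]:
    """Replace each 4-bit code with its table pattern (no feedback loop involved)."""
    out: list[int] = []
    for c in codes:
        out.extend(TABLE[c])
    return out

stream = to_one_bit([0, 7, 15])
print(len(stream), "one-bit symbols for 3 four-bit samples")
```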
Signal processing in the recording studio

Although the bitstream approach has a certain elegance that is attractive for a simple recording chain of back-to-back ADC and DAC, this elegance is lost once the needs of the recording studio are considered, even for a relatively 'direct' audiophile process. Bitstream signals do not suit signal processing operations such as addition, gain change and convolution. At the heart of all these mathematical operations is the need to convert signals at some point into a multi-bit format. The multi-bit signals must then be re-translated to a bitstream code using a further noise shaper configured around a 2-level comparator [9]. It is highly probable that such cascades of essentially non-linear processing will have audible consequences that will not withstand the scrutiny of the audiophile fraternity. It is therefore inconceivable that the recording industry would re-equip with processors and signal distribution systems that employ a bitstream format.
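A minimal sketch of the problem, assuming ±1 sample values and a first-order remodulator (the higher-order noise shapers used in practice are not modelled): summing two bitstreams, or applying any gain other than ±1, immediately produces values that are no longer a 1-bit code, so the result must pass through another noise-shaping loop to return to two levels. The two input streams are arbitrary.

```python
def mix(a, b):
    """Sum two +/-1 bitstreams with a gain of 1/2 so the result stays in range."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def remodulate(x):
    """First-order delta-sigma loop around a 2-level comparator (illustrative)."""
    integrator, y_prev, out = 0.0, 0.0, []
    for s in x:
        integrator += s - y_prev
        y = 1.0 if integrator >= 0.0 else -1.0
        out.append(y)
        y_prev = y
    return out

a = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]   # arbitrary example streams
b = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
mixed = mix(a, b)
print(sorted(set(mixed)))   # [-1.0, 0.0, 1.0]: three levels, no longer a 1-bit code
print(remodulate(mixed))    # back to +/-1, via a further noise-shaping stage
```

Each such stage adds its own requantization error, which is the cascading the text is concerned about.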

A PCM-based system is more efficient for signal processing: high word lengths (24 bit) can be maintained and, where re-quantization is employed, optimal dither and noise shaping can be used to guarantee linearity, a fundamental requirement of a high-resolution system. The lower bit rates inherent in PCM are also welcome for efficient signal distribution, as is the greater ease of multiplexing and frame synchronisation.

Because the majority of recordings require some degree of signal processing, the simplicity of the bitstream approach is lost: in effect, the bitstream code would be computed at the output of a multi-bit recording complex.
Signal processing in the playback system

The modern approach to sound reproduction uses digital signal processing to enhance performance and obtain greater accuracy in the reproduction chain. Processes such as digital equalization of linear loudspeaker errors have been reported [10], and DSP-based loudspeaker crossover networks have already reached the marketplace, as pioneered by Meridian Audio UK in its range of audiophile and home-cinema products. Implementing such systems with bitstream code is difficult and inefficient, and again at variance with the simple philosophy of bitstream.
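A DSP crossover is straightforward to express on PCM samples. As a hypothetical minimal sketch (not Meridian's design or any product's), the function below splits a signal with a one-pole low-pass filter and takes the high band as the residual, so the two bands sum back to the input exactly, apart from floating-point rounding. Practical loudspeaker crossovers use much steeper filters; the 2 kHz crossover frequency and 48 kHz sampling rate are assumptions, and the cutoff mapping is approximate.

```python
import math

def one_pole_alpha(cutoff_hz: float, fs_hz: float) -> float:
    """Smoothing coefficient for a one-pole low-pass with roughly the given cutoff."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)

def complementary_crossover(x, alpha):
    """Split PCM samples into low and high bands. The high band is the input
    minus the low band, so the two outputs always sum back to the input."""
    low, high, state = [], [], 0.0
    for s in x:
        state += alpha * (s - state)   # one-pole low-pass: y[n] = y[n-1] + a(x[n] - y[n-1])
        low.append(state)
        high.append(s - state)         # complementary high-pass
    return low, high

alpha = one_pole_alpha(2_000, 48_000)                    # hypothetical 2 kHz crossover
low, high = complementary_crossover([1.0] * 2000, alpha)  # a step input
print(f"first sample: low {low[0]:.3f}, high {high[0]:.3f}")
```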

It would appear that the bitstream approach caters only for an analogue world, where the cost of the player can be minimised because over-sampling filters are no longer needed and the DAC itself is very basic. This incompatibility is a severe limitation: the bitstream audio data would be compatible neither with advanced digital replay equipment nor with the broader application of three-dimensional sound reproduction, which will also require digital processing to optimise performance. Such an approach does not match the ethos of an advanced high-resolution audio system designed to meet the future needs of the industry.

There is also the question of compatibility with other systems such as DAB, LaserDisc and satellite, all of which would require format conversion to and from PCM. Possibly most significant of all is the incompatibility with computer systems, which are set to dominate the entertainment market and will require digital access to audio data on high-density optical discs.
Jitter sensitivity

In investigating the requirements of a high-resolution audio system, it is evident that jitter performance is paramount, an area already widely discussed in the technical literature. A bitstream code that simultaneously contains high-amplitude, high-frequency noise is susceptible to jitter, since intermodulation with timing jitter can fold energy into the audio band and so compromise performance. In this area, we believe bitstream is inherently more jitter-susceptible than multi-bit systems.
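As a crude, hypothetical proxy, assume a non-return-to-zero (NRZ) output stage and white clock-edge jitter independent of the signal. Each edge mistimed by δt then produces an error pulse whose area is the step height times δt, so the jitter error power scales with the mean squared step between successive samples; with white jitter the error is itself white, so its in-band share scales the same way. The sketch compares a smooth, oversampled multi-bit signal (unquantized floats stand in for a high-resolution word) with the same signal encoded by a first-order 1-bit modulator at the same rate.

```python
import math

def sigma_delta_1st_order(x):
    """First-order delta-sigma modulator with a 2-level (+1/-1) output."""
    integrator, y_prev, out = 0.0, 0.0, []
    for s in x:
        integrator += s - y_prev
        y = 1.0 if integrator >= 0.0 else -1.0
        out.append(y)
        y_prev = y
    return out

def jitter_error_proxy(samples):
    """Mean squared sample-to-sample step. Under the NRZ, white-jitter assumption,
    jitter error power is proportional to (rms jitter)**2 times this value."""
    steps = [b - a for a, b in zip(samples, samples[1:])]
    return sum(d * d for d in steps) / len(steps)

n = 8192
multibit = [0.5 * math.sin(2 * math.pi * k / 256) for k in range(n)]  # oversampled multi-bit signal
onebit = sigma_delta_1st_order(multibit)                              # same signal as a 1-bit code
print(f"multi-bit: {jitter_error_proxy(multibit):.2e}, one-bit: {jitter_error_proxy(onebit):.2e}")
```

The 1-bit code makes full-scale transitions on most clock edges, so under these assumptions its jitter error is orders of magnitude larger.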
Multi-bit converters and interface technology

The use of bitstream implies that multi-bit ADCs and DACs would not be appropriate, as these would require signal format conversion. Consequently, we believe the Sony proposal supports only a bitstream world of converters: although excellent results are achievable, it neglects the performance advantage offered by modern multi-bit converters, which now find favour in many advanced audiophile products.

In addition, external digital interfacing in a bitstream system, especially where, say, 6 channels of data are needed, requires new interfaces that operate at substantially higher sampling rates. In the area of interfacing, the more code-efficient PCM format clearly offers an advantage.
Conclusion

This document has reviewed some of the salient features that we believe should be considered when deciding whether to use a bitstream or a PCM digital format. We believe the inherent advantages of a linear PCM system to be overwhelming both in the guaranteed performance parameters and in the convenience by which signal processing can be performed.

We recognise the advantages of bitstream in a basic system and its natural extension of ultrasonic bandwidth; however, these are easily lost in post-processing, where there is a danger of intermodulation between high-frequency audio and out-of-band shaped noise. Moreover, transparent data compression enables an efficient extension of the audible bandwidth in PCM, which moves the argument back in PCM's favour. We of course accept that bitstream converters have an important role in PCM-based digital audio.

Fundamentally, the next generation of audio disc should embody what we call the 'third paradigm' of audio, namely three-dimensional sound. We must also ensure a performance envelope that ideally extends beyond what is theoretically necessary or what today's technology can achieve. An advanced system should handle multi-channel information and set performance goals to which designers can aspire. It may be several years before the full potential of the system is realised in terms of both sound quality and the use of three-dimensional sound.

We believe the ARA proposal sets the guidelines for a clear evolutionary future, one that gives compatibility with Red Book CD and DVD, with high-resolution 2-channel audio and, finally, with high-resolution three-dimensional sound encoded in a linear format uncompromised by present-day assumptions about perception.

PCM with transparent data compression offers an uncompromised efficient code that is fully compatible with advanced recording and replay processes.

We consider that these advantages of PCM far outweigh the basic advantages of bitstream and we therefore recommend a losslessly-packed linear PCM system to you for formal adoption.
References

1 Acoustic Renaissance for Audio, 'A Proposal for the High-Quality Audio Application of High-Density CD Carriers', April 1995.

2 'Digital Frontiers', HiFi News, vol. 40, no. 2, pp. 58-59 & 106, February 1995.

3 'High Definition Audio', Stereo Sound (Japan), April 1995.

4 Hawksford, M.O.J., 'Extended-definition Digital Audio Systems for High-capacity CD', IEE Colloquium on Audio Engineering, digest no. 1995/089, pp. 2-1 to 2-12, 1 May 1995.

5 Vanderkooy, J. and Lipshitz, S.P., 'Digital Dither: Signal Processing with Resolution Far Below the Least Significant Bit', AES 7th International Conference: Audio in Digital Times, Toronto, pp. 87-96, 1989.

6 Dunn, C. and Hawksford, M.O.J., 'Is the AES/EBU/SPDIF Digital Audio Interface Flawed?', 93rd AES Convention, San Francisco, preprint 3360, October 1992.

7 Hawksford, M.O.J., 'Digital-to-analogue converter with low inter-sample transition distortion and low sensitivity to sample jitter and trans-resistance amplifier slew rate', JAES, vol. 42, no. 11, pp. 901-917, November 1994.

8 Hawksford, M.O.J., 'A Comparison of Two-stage 4th-order and Single-stage 2nd-order Delta-Sigma Modulation in Digital-to-Analogue Conversion', IEE Conference on Analogue to Digital and Digital to Analogue Conversion, Conference Publication 343, pp. 148-152, Swansea, September 1991.

9 Angus, J., 'The one-bit alternative to audio processing and mastering', Proceedings of the AES Conference 'Managing the Bit Budget', London, May 1994.

10 Greenfield, R. and Hawksford, M.O.J., 'Efficient Filter Design for Loudspeaker Equalization', JAES, vol. 39, no. 10, pp. 739-751, November 1991.

See ARA document [1] for further detailed references.

© Acoustic Renaissance for Audio 1995


If you mail me, I can send you much, much more of this material.


sully_2u · May 18, 2006

Can 5.1-channel SACDs such as Dark Side of the Moon by Pink Floyd be played on a Dolby 5.1 surround sound system? Or will it just be regular CD-quality music?

wilkes · May 19, 2006

No.
An SACD will NOT play on a standard DVD player at all.
It will play on a regular CD player in stereo - as long as it is a HYBRID SACD with a Red Book layer on it.
It will play on a universal DVD-Audio player, such as a Denon 2910, 3910 etc.
It will play on an SACD player - but will have to be hooked up with 6 analogue outputs to a suitable surround-capable AV amplifier/receiver.

But it will certainly not play through a Dolby Digital system - Sony, in their infinite "wisdom", went totally out on their own with SACD.

sully_2u · May 20, 2006

Will it still be in 5.1 surround sound if it just plays the regular CD portion of it?

wilkes · May 20, 2006

No.
On hybrid SACD discs, the CD portion is standard Red Book 16-bit/44.1 kHz stereo.
The only way it can possibly play in surround is if it is an Lt/Rt matrix stream, like Dolby Pro Logic, Pro Logic II or similar.
Otherwise it is straight stereo only off the CD layer.
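To illustrate what an Lt/Rt matrix stream is, here is a deliberately simplified passive 4:2:4 matrix. It is an assumption-laden sketch, not Dolby's encoder: real Dolby Surround encoding also band-limits the surround channel and applies ±90° phase shifts, and Pro Logic decoders add active steering. It shows why such a stream is still ordinary two-channel stereo that a decoder can partially unfold: centre appears in the sum of the two channels and surround in their difference, while a hard-left signal leaks into both derived channels, which is the crosstalk that active steering exists to reduce.

```python
import math

G = 1 / math.sqrt(2)   # -3 dB mixing coefficient

def matrix_encode(l, r, c, s):
    """Simplified 4:2 matrix: centre in phase in both outputs, surround in antiphase."""
    lt = l + G * c + G * s
    rt = r + G * c - G * s
    return lt, rt

def passive_decode(lt, rt):
    """Passive 2:4 decode: left, right, centre from the sum, surround from the difference."""
    return lt, rt, G * (lt + rt), G * (lt - rt)

print(passive_decode(*matrix_encode(0.0, 0.0, 1.0, 0.0)))  # centre-only input
print(passive_decode(*matrix_encode(1.0, 0.0, 0.0, 0.0)))  # left-only input leaks into C and S
```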

sandt38 · Jun 15, 2006

I don't understand why you express discontent with a media format producing noise beyond human perception. Not just slightly above it, but octaves above it, unless you are roughly 8 years old. By our 20s, we lose the ability to hear several kHz that we could perceive with perfect hearing, typically from 8 to 10... even then, we are unable to perceive 23 kHz at 8... so who really cares?

Also, I still think Pink Floyd's DSOTM on SACD is the finest representation of multi-channel audio I have heard. I have never been tremendously impressed by multi-channel music to begin with, as I believe my system captures image depth and stage presence in pure stereo that seems manufactured with MCA. I have yet to be impressed by any DVD-A, whereas I do like the separation and depth of real SACD. It has a more natural, less archived sound than DVD-A. Almost like the warmth of a tube amp, or a good vinyl on a solid Goldmund turntable.

Also, you mention Sony went out on a limb by forcing 6 independent audio outs and a capable receiver... Many of the first surround units also required the 6-channel feeds, as they didn't have internal digital processors. Here again, I prefer the multi-channel bypass mode of my preamp to optical or digital coax, since it offers a pure preamp processing bypass... which is music to my ears.

djscoop · Jun 15, 2006

Some people have better ears than others: some can hear up to about 25 kHz, others can't hear beyond 15 kHz. Even so, higher-sample-rate music increases the dynamic range of the audio (the difference between its quietest and loudest levels), which increases quality. It's not all about a higher frequency range.
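For context on the dynamic-range point: in linear PCM the theoretical dynamic range is governed mainly by word length (about 6 dB per bit), while raising the sample rate helps only by spreading quantization noise over a wider band, about 3 dB per doubling within a fixed audio band. The sketch assumes white, undithered quantization noise and a 20 kHz band of interest; both are assumptions, and dither and real converter noise lower these figures.

```python
import math

def pcm_dynamic_range_db(bits: int, fs_hz: float, band_hz: float = 20_000) -> float:
    """Theoretical in-band SNR of a full-scale sine with uniform quantization:
    6.02*N + 1.76 dB over the full Nyquist band, plus 10*log10(fs / (2 * band))
    when the quantization noise is white and only `band_hz` of it is counted."""
    full_band = 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)
    return full_band + 10 * math.log10(fs_hz / (2 * band_hz))

for bits, fs in [(16, 44_100), (16, 96_000), (24, 96_000)]:
    print(f"{bits}-bit / {fs / 1000:g} kHz: {pcm_dynamic_range_db(bits, fs):.1f} dB")
```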

wilkes · Jun 16, 2006

sandt38 said: "I still think Pink Floyd's DSOTM in SACD is the finest representation of multi-channel audio I have heard. [...] I have yet to be impressed by any DVD-A, whereas I do like the separation and depth of real SACD."

Then you've obviously never heard the DVD-A/V of DSOTM that is doing the rounds, from Alan Parsons' original Quad mix.
Everyone who has heard it agrees that it blows the SACD away.

My DVD-A player has a mode called Source Direct, which bypasses all DSP, and sounds incredible.
How good any DVD-A is will depend on the mastering, and on whether the mastering engineer destroyed the mix with brickwall limiting, crushing the living snot out of everything - as is so fashionable these days.
The format itself is IMHO far, far superior to the single-bit, noise-shaped mess that is SACD.

sandt38 · Jun 16, 2006

Wilkes:

I have not heard that disc, but if it is worth the money, I am open to anything... I was just stating my personal observations, which, due to my initial disappointment in multi-channel recording, I must admit are fairly limited.

My player also offers a multi-channel output for DVD-A, through which my preamp allows direct passthrough. Strangely, though, my DVD-As sound better through my digital coax. In all honesty, being quite new to digital recording (outside quick burns on a CD), I am not terribly familiar with limiting, and can only guess at the effect it has on the final product.

DJscoop

Dan Wiggins, Manville Smith, and Richard Clark all got involved in an interesting discussion on audible range. Typical optimum human perception is between 20 and 20,000 Hz. However, given time, age, and the noise our ears are subjected to, it was concluded that typical young/middle-aged adult (~30 years of age) hearing is roughly 35 Hz-17 kHz.

The discussion actually started out centred on subwoofer requirements in a room, but eventually led to a discussion of super tweeters crossed at 20 kHz and above. Interestingly enough, in a blind test of well-designed speakers using a super tweeter crossed at 18 kHz with a 32 dB/octave slope, a few listeners preferred the units not using a super tweeter, as they found the others "less pleasant", although they could not determine why. There was no mention of harshness or overextension (interestingly enough, overextension and a peak in bottom-end frequencies at the port tuning were noted... they were well-versed listeners).

Most quality speakers are rated flat to a maximum of 20 kHz, with significant rolloff at 20 kHz, for a reason. In fact, I prefer my builds to use a low-resonance silk dome or a fairly heavy ribbon tweeter (to realize the lower frequency range). Also, where room harshness is an issue, I will notch the high end at 18 kHz for a better in-room response.
