Can I play and rip a SACD/DVD-A disc
Garrycs
Newbie
27. March 2006 @ 11:59
Hi all

Sorry if this is against board rules.
But I have the disc "Where the Humans Eat" by Willy Mason, and it says on the disc that I need a CD/DVD-V/DVD-A/SACD player in order to play it. On the disc there is a player.exe that I don't want to install, and it may be the only way to play the disc.
As I want to play the music on my iPod, how do I transfer the files by ripping? I can probably record the files the old-fashioned way by playing and recording the disc, which will take 40 minutes, but that's not on.
Can anyone help?
Garry
AfterDawn Addict
27. March 2006 @ 21:16
SACD discs cannot be read or ripped by computer optical drives.

But as long as it doesn't have any gnarly copy protection, you should be able to rip the CD-audio portion of the disc. We here at aD recommend EAC (Exact Audio Copy)... it's the best ripper there is. You can find a guide on how to set up and use EAC in my sig below.

"I have no particular talent. I am merely inquisitive" - Albert Einstein




Use EAC (exact audio copy) to rip your CDs. Follow this guide:
http://www.afterdawn.com/guides/archive/mydeneaclame.cfm
Use dBPowerAMP to convert your various audio formats
http://www.dbpoweramp.com/dmc.htm
Garrycs
Newbie
29. March 2006 @ 02:12
Hi

Thanks for the suggestion, but all the files show up as data files and not audio files. So it looks like the files can't be ripped by EAC.
I read years ago that if a disc has 2 sections, music and data, you can put a line with a marker or a piece of tape across the data line and it only recognises the music. I did get this to work in the past on 2 CDs, but I wouldn't like to try it on my own PC.

Anyone else got suggestions?
Senior Member
31. March 2006 @ 01:33
Firstly, to prevent anything automatically running on disc insert, all you MUST do is disable the autorun functionality of your drive - and this is a good idea regardless, as there are so many discs out there that will install this crap even if you tell them not to.

Bad news is that you do NOT have any rights at all to rip these tracks for iPod use.



THE HUNGERCITY MUSIC TRACKER & FORUM

<!-- hungercity forum link -->
Senior Member
11. April 2006 @ 08:15
So how do you use the SACD data tracks? Do you have to have a DSD-compatible player or plug-in? Or were you talking about the CD layer?

How to go around MediaMax...
http://forums.afterdawn.com/thread_view.cfm/233874

How to disable "Auto-Run"...
http://www.annoyances.org/exec/show/article03-018

Ced

My Afterdawn page:
http://my.afterdawn.com/diabolos/

My MySpace page:
http://myspace.com/ceddyb

HDMI "Manufacturer Faq" page:
http://www.hdmi.org/manufacturer/faq.asp
Xbox Live gamer tag: Ceddie

This message has been edited since posting. Last time this message was edited on 11. April 2006 @ 08:17

Senior Member
11. April 2006 @ 09:47
There is no such thing as a software SACD player, or a computer drive that will read or play them either.

All you can possibly do is extract the Red Book layer - and you don't want to do that, as it will be smashed seven ways to Sunday in an attempt to make the DSD layer sound better.

The only way to play the SACD layer is in an SACD player.
The Red Book layer should play in all players, although with the hybrid SACD there is a well-known problem with cracks from the spindle hole, and the more you play the disc the worse this will get.
Sony have no plans to remedy the fault.



AfterDawn Addict
11. April 2006 @ 12:32
ah c'mon wilkes, one of these days you'll come around and start to love Sony. :) LOL
Senior Member
11. April 2006 @ 17:02
That's what I thought.
Quote:
...and you don't want to do that as it will be smashed seven ways to sunday in an attempt to make the DSD layer sound better.
That would be cheating... My Musiq Soulchild CD/SACD hybrid sounds great either way (to me). The surround sound track is the main plus.

Senior Member
12. April 2006 @ 09:05
Don't get me wrong - I have heard some great sounding SACD discs.
In particular, Roxy Music's Avalon & Bryan Ferry's Boys & Girls are examples of how it can be done.
Trouble for me is that there are far, far more bad discs - and DSD/SACD has serious issues.
Ultrasonic noise is the biggie for me - all there is above 23 kHz is noise, and lots of it.
But this is not a thread bashing SACD, so I will stop now.



AfterDawn Addict
12. April 2006 @ 21:06
so you still prefer the 24 bit dvd audio structure as opposed to the 1 bit sacd format?
Senior Member
12. April 2006 @ 23:10
How does 1 Bit audio work anyway?

The SACD camp always talks about how 1-bit audio at very high sample rates is much better than 24-bit at 192 kHz. Is that true?

Senior Member
13. April 2006 @ 01:26
Not in my opinion, or that of the AES either.
As to how it works, Google will provide the answers.

This is a copy of a serious research document about DSD....
Quote:
Digital System Wars

More Evidence on Sony DSD/SACD

In IAR's 1998 Master Guide, we discussed a serious (we think fatal) sonic flaw in the Sony-Philips DSD standard, also proposed as a standard for their Super Audio CD format. That discussion was based on the evidence of one demonstration, a well executed A-B-R comparison conducted by Sony themselves at AES.
Since we published that article, we have had the opportunity to further evaluate DSD and SACD, in two further demonstrations, also conducted by Sony and Philips. All three demonstrations were very different in nature from each other, and on different kinds of systems. Thus, we now have three very different kinds of evaluations in our journalist's pouch as evidence.
Because these three evaluations are each different in nature, they draw an observational bead on DSD's performance from three different angles. It's like triangulating on a target, with three independent and different kinds of observations, taken from different angles. That's very important, since there's always a chance that observations in a single experiment might be faulty, as there might be an unknown peculiar fluke in the one experiment. But if you make independent observations, in three different experiments that are designed differently, then you are essentially looking at the same object from three different viewpoints. If all three independent viewpoints agree, you can be sure that the observed properties truly belong to the observed object itself, and are not merely a fluke of one observation vantage point nor a fluke of one experiment's design.
In this case, all three evaluations of DSD, in three different kinds of experiments, all agreed, and perfectly corroborated each other. They all revealed the same fatal sonic flaw. So the case against DSD and Super Audio CD is now even far stronger than before.
The second demonstration was conducted by Marantz (a high end division of Philips). This demo was based on CDs, rather than master tapes or computer hard discs. Thus its results are assuredly very relevant to what you could expect to hear from Super Audio CD in your home system. This demo was an instantaneous A-B comparison of exactly the same music, recorded onto two different CD formats, and played back from these CDs. The format pitted against Super Audio CD was not the true competition in today's world, the emerging CD standard from DVD-A, which allows 24/96 fidelity. Rather, this demo from Sony-Philips was showing off the alleged superiority of Super Audio CD to merely the ancient 16/44 CD standard. The Super Audio CD was played on a special CD player optimized for this new format, while the 16/44 CD of the same music was played through a standard Marantz CD player. Note that this put the 16/44 version under a bit of a handicap, since (as we all know) there are far better CD players that show 16/44 PCM CDs to better advantage than the Marantz. And, insofar as the SACD playback being optimal, one of Sony-Philips' chief selling points is that the playback circuitry is very simple and can be inexpensively optimized, as it presumably was in the special Marantz SACD player.
So, how did the new SACD format compare to the handicapped and ancient 16/44 CD in this direct A-B comparison?
In some sonic aspects, the SACD lost!! Above 8000 Hz the SACD sounded awful, especially on sibilants of the female singer, and on cymbal sounds from the drum kit. Whenever these musical notes came along, the ancient 16/44 PCM CD sounded much cleaner, faster, and more open (remember, both CDs came from the same original master). The SACD exhibited a very trashy distortion on these musical notes, making them frazzled and smeared.
This gross distortion heard from the Super Audio CD version was identical to the sonic flaw we observed during Sony's earlier A-B-R demo using master tapes and studio processors, and occurred on the same types of musical notes. As we discussed in our 1998 Master Guide, this seems to be a slew related distortion, like a digital version of TIM.
This second demo confirmed our findings from the first demo, and it's an especially powerful confirmation because the system setup was so different. Moreover, since this demo employed the finished CD product rather than master tapes and studio processor loops, the findings of this demo are assuredly relevant to what you will hear from Super Audio CDs in your home system.
If the new Super Audio CD loses out even to the ancient 16/44 CD above 8000 Hz, you can well imagine that it will be slaughtered above 8000 Hz by 24/96 PCM CDs, including both the present ad hoc audiophile 24/96 standard on DVD video and the different forthcoming 24/96 DVD audio standard from DVD-A. And indeed we found this to be the case (see below).
In all fairness, we must also report that, below 8000 Hz, DSD and Super Audio CD sounded wonderful in this CD A-B demo, just as we found in Sony's earlier demo. The Super Audio CD sounds more open, airy, musically natural, and dynamic than 16/44 PCM CD below 8000 Hz; in direct comparison, the 16/44 CD sounded more canned, glazed, constricted, and closed in.
As we discussed previously, this means that the basic principles behind Super Audio CD are valid, but that the sampling rate is not nearly high enough to support the higher frequencies of the audio spectrum with decent fidelity. In a 1 bit system like DSD-SACD, a very high sampling rate is required in order to handle music to 20,000 Hz, and to handle steep, high slew rate musical notes such as vocal sibilants and cymbal sounds. The present DSD-SACD sampling rate is only good enough to cover music up to 8000 Hz. This is simply unacceptable as a high fidelity medium. It's like having a speaker system without any tweeter. Actually it's even worse than that, since a speaker system without a tweeter would merely sound dull, and would not actively distort treble information, while DSD-SACD does grossly distort music's trebles.
Many listeners react favorably to the sound of DSD-SACD. They are obviously so entranced by the improved musical naturalness below 8000 Hz that they fail to notice the gross distortion above 8000 Hz on certain musical notes.
The third demo was Sony's current professional road show, for studio engineers. This was a single ended demo, with no A-B comparisons. It's worth reporting on because it showed off DSD to its very best advantage. The playback system included Sony's own very revealing speakers, and the source was as good as it gets, a studio master hard disc. Thus, we were treated to the very best possible sound of DSD, coming directly off the master recorder.
How did this sound? Again, up to 8000 Hz the sound was wonderful: open, airy, natural, and dynamic. But again there were severe sonic flaws above 8000 Hz, especially on musical notes requiring a high slew rate. One revealing track was an a cappella chorus. Every sibilant was grossly mangled.
This mangling showed that DSD did a number of things wrong, which are worth a brief analysis. A live vocal sibilant is supposed to sound like clean, open white noise, like a jet of escaping steam. Try saying "ssssss" and listen to the sound. Notice that your teeth are bared, with your lips pulled back. Now say "moon", and then say just the "ooooo" part of "moon". Notice that your lips are cupped way forward, and are cupped into a circle. Next, say "ssssss" again, but this time force your lips into the same forward circular cup as they had while you were saying "ooooo". And finally, continue to say "ssssss" while moving your lips between this forward, cupped position and the pulled back teeth bared position. Notice that the sound of the "ssssss", your vocal sibilant, changes character drastically as you move your lips back and forth between these two positions. In the natural position, with lips pulled way back and teeth bared, your sibilant has a bright, open, white noise sound. This is what a live vocal sibilant sounds like, this is what an accurate recording should sound like, and this is what good PCM digital sounds like (both 16/44 and 24/96). In the artificial position, with your lips cupped forward, the pitch of the same "ssssss" sibilant drops, the sound is duller, the sound no longer has its natural spectral balance (the open, bright white noise sound of steam escaping), and the sound is closed in rather than open (as if it were trapped in a tunnel).
This is what DSD did to the vocal sibilants of the chorus in this master recording. Whenever a vocal sibilant came along, the pitch apparently dropped lower, as if the singers had cupped their lips forward while singing every sibilant.
DSD also mangled these sibilants in other ways. Try saying "ssssss" again (normally, with lips back and teeth bared). Notice that the natural sound consists of lots of little spikes of individuated noises. The only reason that you can hear these noise spikes as individuated, and subtly different from each other, is that there are instants of relative intertransient silence between the spikes. Now try saying "shoosh". Notice that the "sh" sound smears the spikes together into a more homogenous sound, and that there are no longer individual spikes of noise with high peak amplitude.
DSD does this same kind of mangling to sibilants. It reduces the amplitude of the individual peak spikes of noise, and smears the energy over time, filling in what should be intertransient silence between spikes. DSD might have excellent dynamics at lower frequencies, but in the trebles it sonically acts as a dynamic compressor, squashing the peaks. DSD then sonically takes this lost dynamic peak energy and smears it over time, filling in the spaces between transients so that the transient sounds lose their individuality, instead becoming blended and smeared into a homogenous slur. DSD changes "ssiss" into "shoosh".
This mangling of vocal sibilants was striking on the master recording of the a cappella chorus, because the recording was so superb at lower frequencies, and because there were no other instruments playing at the same time that might have masked this mangling. We heard this mangling, and another audio pro at this same demo also heard it, being bothered enough by it to speak up about it to others.
Why should DSD-SACD have a too-low sampling rate problem, that leads to these fatal sonic flaws above 8000 Hz? After all, this is a studio mastering and archiving system, which is supposed to have data capability even beyond any consumer distribution medium. And this system is being born in the age of high density laser discs (such as DVD), with ample storage to support high sampling rates.
DSD's too-low sampling rate is even more puzzling, and more shocking, when we look at a bit of audio history. Philips was one of the pioneers of noise shifting, i.e. time averaging of oversampling, a technique which allows fewer bits to do the work of more bits, at least for lower frequencies where there are enough samples to average. In their first application of this technique, Philips reduced the bit resolution only a slight amount, from 16 bits to 14 bits, and they offset this slight resolution loss by oversampling by 4 times, at 176 kHz instead of 44 kHz. This was an equitable tradeoff of information content, with 4 times less resolution traded for 4 times greater bandwidth (although not a perfect tradeoff, since the time averaging failed to offer genuine 16 bit resolution at music's highest frequencies).
Then, some years later, Philips was trying to find a way to build really cheap CD players for budget consumer systems. They came up with a really cheap chip set by reducing the bit resolution from 16 bits all the way down to 1 bit, and they called it Bitstream. With such a large reduction in bit resolution, the oversampling should have been increased to 32,000 times, if they wanted to preserve an equitable tradeoff of information content (to preserve basic information content, the sampling rate should be doubled for every bit dropped from resolution). But Philips didn't do this. Instead, they increased the oversampling to only 256 times the nominal 44 kHz (thus providing 1 bit sampling at 11.3 MHz). Why such a compromise, of only 256 times oversampling instead of 32,000 times oversampling? Remember that this Bitstream system was intended only for the cheapest consumer CD players. It was not intended to even replace Philips' own more expensive multibit consumer CD players. And it was most certainly not intended to become a studio mastering and archiving system. Note that this was over 10 years ago, when the state of the digital art was far more primitive than it is today, and digital media did not have the large storage capability to support the high sampling rates that today's media do.
So, before we go forward, remember and keep this key fact in mind: over 10 years ago, when digital was primitive and storage media limited, Philips designed a compromised 1 bit system for only the cheapest consumer CD players, and they still gave it 256 times oversampling as a sampling rate.
Now let's fast forward to the present. Now we have more sophisticated digital systems, and digital media with much higher storage capability and faster transfer rates, so we can engineer and we can afford higher sampling rates than we could 10 years ago. Now we see Philips and Sony collaborating on a new digital standard which is not intended as just a compromise for the cheapest consumer CD players, but also for the best consumer CD players, and also even for the holiest of holies, studio mastering and archiving of music for generations to come (which obviously merits the very best possible fidelity, without compromise).
Naturally, from all these considerations, one would expect that this new standard would have a much higher sampling rate than the compromise system developed 10 years ago only for the cheapest consumer CD players. One would expect therefore that DSD-SACD (also a 1 bit system) would oversample at some rate much higher than the 256 times of that ancient Bitstream cheap consumer compromise.
So, how much higher, how much better, than 256 times oversampling, is the oversampling that Sony and Philips have put into DSD-SACD, the modern new mastering standard for the ages? Is it perhaps 512 times oversampling, twice as good? Is it 1024 times oversampling, 4 times better?
No.
It's actually 64 times oversampling, which is 4 times worse!!! DSD-SACD, the modern new mastering standard for the ages, samples music at only 1/4 the sampling rate used 10 years ago by Philips' own Bitstream, intended only for the cheapest consumer CD players of those primitive ancient times. Bitstream's 1 bit system sampled at 11.3 MHz, but DSD-SACD samples at only 2.8 MHz.
Remember that Bitstream's 256 times oversampling was already a compromise for cheapness. If Bitstream were to have preserved the same information content as the 16/44 multibit CD player, it would have to have been given an oversampling rate of 32,000 times.
You'd think that any move toward mastering quality, and/or toward modern digital standards and capabilities, would require an oversampling move to a higher number that would at least equal this 32,000 times (which would make it the informational equivalent of 16/44 multibit). But Sony-Philips didn't make DSD better than Bitstream, or equivalent to 16/44 multibit. They didn't even make it equal to Bitstream. Instead, they made it worse than Bitstream. Four times worse! What a travesty!
No wonder DSD-SACD has such problems mangling music's high frequencies! It's a giant step backwards in sampling rate, down to a sampling rate that is simply too low to accurately capture music's fastest waveforms with a 1 bit system.
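For anyone who wants to check the arithmetic in the article quoted above, here is a minimal Python sketch. It takes the article's own rule of thumb at face value (double the sampling rate for every bit dropped), which is the article's assumption and not an established result, and it assumes a 44.1 kHz base rate for the "nominal 44 kHz". The function names are made up for illustration.

```python
CD_RATE_HZ = 44_100  # assumed base rate behind the article's "nominal 44 kHz"


def equal_information_oversampling(bits_from: int, bits_to: int) -> int:
    """Oversampling factor under the article's rule: x2 per bit dropped."""
    return 2 ** (bits_from - bits_to)


def sampling_rate_hz(oversampling: int, base_hz: int = CD_RATE_HZ) -> int:
    """Sampling rate that results from oversampling the base rate."""
    return oversampling * base_hz


if __name__ == "__main__":
    # 16 -> 14 bits: the article's "4 times" case
    print(equal_information_oversampling(16, 14))
    # 16 -> 1 bit: the article's "32,000 times" (2**15)
    print(equal_information_oversampling(16, 1))
    # Bitstream at 256x and DSD at 64x, in MHz
    print(sampling_rate_hz(256) / 1e6)
    print(sampling_rate_hz(64) / 1e6)
```

The numbers match the figures quoted in the article (about 11.3 MHz and 2.8 MHz); whether the "double per bit" rule is the right yardstick is a separate question.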
And also this:
Quote:
http://www.helsinki.fi/~ssyreeni/texts/bs-over/bs-over.en.html
Copyright © 1996–2002 Sampo Syreeni; Date: 2002–09–17
Oversampling and bitstream methods in audio

Through the relatively short history of digital audio processing, the technology has improved by impressive steps. Nowadays most problems which plagued early digital applications have all but vanished. For most of this we have only two inventions to thank: oversampling and bit reduction. This article gives a short introduction to these important topics. It also presents some of my views on using the resulting bitstream methods in audio transport applications, of which Sony’s Super Audio CD (SACD) is the first and foremost example.
Sampling basics

Oversampling and bitreduction techniques are mostly a matter of implementation—in theory, neither of them are needed to build robust, theoretically sound audio processing applications. On the other hand, both these technologies are based largely on the same principles as are the more classical incarnations of digital audio. Especially one needs at least a cursory understanding of sampling, reconstruction and the management of noise in audio systems.
The sampling theorem

The sampling theorem, in its present form, was formulated in the 20’s to 50’s by the same people that developed the information theory—mostly Harry Nyquist and Claude Shannon, both employees of Bell Laboratories. The sampling theorem is the theoretical basis that allows us to process physical signals on discrete, digital computers in the first place.

What the sampling theorem says is, under certain conditions we can convert a continuous, infinitely accurate (analog) signal into a stream of time equidistant samples and lose no information in the process. The condition is, there is to be no content in the analog signal above or equal to half the frequency we are sampling on—the signal we sample is bandlimited. The conversion consists of taking the instantaneous value of the analog signal at regular intervals, determined by the sampling frequency. Note that nothing is said about the practical method used to achieve such infinitely narrow samples (only the value at a single instant of time affects the resulting number) or the number of bits in a sample (in the theory, a sample is a real number, i.e. it is infinitely accurate). Something is said about how to reconstruct the analog version from the resulting samples—after all, losslessness means being able to return the signal to its original form exactly.

Perfect reconstruction, as it is logically enough called, is achieved by passing the point samples through a perfect lowpass filter. Such a filter cuts off everything above half the sampling frequency. Of course, this kind of idealized response is not physically achievable, just like in the sampling side we have difficulty with obtaining very thin samples. Now, seeing the reconstruction step as a lowpass filter is not very instructive either. But seen in another way, it makes perfect sense: in essence, we are interpolating between the sample points. The ideal lowpass filter responds to one such input sample by emitting a sin(x)/x shaped signal—an oscillating function that dies out relatively slowly as we go farther away from the time of excitation. More specifically, the prototype sin(x)/x function is zero precisely an integral number of sample periods removed from the origin, unity at zero and symmetrical about it. Scaling this prototype by sample value, shifting the response from origin to center it around the exciting sample and then summing the responses we get from individual samples, we have a signal that agrees with the original at sample times and varies smoothly in between.
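The interpolation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8 Hz rate, the 1 Hz test tone and the function names are made up, and the sum runs over a finite block of samples, whereas the theorem assumes an infinite one. Values between samples are therefore approximate, and get worse near the edges of the block.

```python
import math


def sinc(x: float) -> float:
    """Prototype sin(pi x)/(pi x): 1 at x = 0, 0 at every other integer."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, fs: float, t: float) -> float:
    """Sum of sample-scaled sinc pulses, each centred on its own sample time."""
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))


fs = 8.0   # hypothetical sampling rate in Hz
f0 = 1.0   # hypothetical test tone in Hz, well below fs / 2
samples = [math.sin(2 * math.pi * f0 * n / fs) for n in range(64)]

if __name__ == "__main__":
    t_on = 10 / fs      # exactly on a sample instant
    t_mid = 32.5 / fs   # halfway between two samples, mid-block
    print(reconstruct(samples, fs, t_on), samples[10])
    print(reconstruct(samples, fs, t_mid), math.sin(2 * math.pi * f0 * t_mid))
```

At a sample instant only one sinc pulse contributes, so the sum returns that sample; between samples every pulse contributes a little.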

One might wonder what it means, exactly, to pass the sampled version through a lowpass filter—after all, the digital and analog domains are very different and it is not entirely clear what filtering in each means. And indeed it requires some real math to exhaustively understand that. For those who have a hunch, the sampling process consists of taking an inner product of a signal with a time-shifted delta function (Dirac’s δ-distribution) and reconstruction of summing time-shifted, scaled by sample value copies of the impulse response of the reconstruction filter. Which of course is equal to putting out infinitely thin impulses through the filter, with energy proportional to the original point samples. Because impulses (delta-functions) contain energy at all frequencies, the filter then removes all the extraneous stuff above the Nyquist limit. This is why the output filter is called the anti-imaging filter—the extra stuff that is removed consists of frequency shifted copies of the original signal, images.

To process signals digitally, they will need to be bandlimited. If this condition is not fulfilled, aliasing will occur. This means that there will be only frequencies below half sampling rate in the reconstructed signal, and that any content in the input signal with frequencies above half the sample rate will fold into the admissible band. For instance, at 40kHz sampling rate, the extraneous 2kHz in a sine wave of 22kHz will cause it to fold to 18kHz. Aliasing does not sound nice and is to be avoided at all costs. This means that we need to guarantee no inadmissible content is present in the sampled signals. This is achieved by passing the signal through a lowpass filter, the anti-aliasing filter, before sampling. Again the math assumes the filter to be perfect and this is not physically achievable.
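The folding in the 40 kHz example above is easy to reproduce. A minimal Python sketch follows; the function names are hypothetical, and a cosine is used for the comparison because a folded sine comes back with inverted sign, which would only obscure the point.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone of f_hz shows up after sampling at fs_hz."""
    f = f_hz % fs_hz           # sampling cannot tell f from f + k * fs
    return min(f, fs_hz - f)   # the upper half of the band folds back down


def sampled_cosine(f_hz: float, fs_hz: float, count: int):
    """Point samples of a unit cosine at f_hz, taken at rate fs_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(count)]


if __name__ == "__main__":
    print(alias_frequency(22_000, 40_000))   # the example from the text
    above = sampled_cosine(22_000, 40_000, 8)
    below = sampled_cosine(18_000, 40_000, 8)
    # the two sample streams agree to within rounding error
    print(max(abs(a - b) for a, b in zip(above, below)))
```

Once the samples are taken, nothing distinguishes the 22 kHz tone from an 18 kHz one, which is why the filtering has to happen before sampling.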
Pointlike sampling and S/H

Anti-aliasing and anti-imaging aren’t the only problems that we encounter. Even such a simple operation as taking point samples is surprisingly difficult in practice—the incoming electrical signals roam throughout the sample period. The natural thing to do, then, is not to sample the signal directly, but put a kind of gate circuit in between. These circuits are called sample and hold or S/H. Such a circuit usually operates by sampling the incoming voltage to a capacitor and then switching (through the use of a couple of MOSFET transistors) the capacitor over to a high input impedance amplifier (an operational amplifier with a FET input stage) for sampling. This is better but still it is not optimal—any change in the capacitor’s voltage calls for a change in charge and such a change consumes energy. This energy change has to take place within the brief sampling interval and so circuit resistance, the capacitor’s capacitance and the finite operating voltage of any physical circuit bound from below the time accuracy of the sample and hold function. We also have to worry about charge leakage, circuit linearity and the considerable noise and heat introduced into the circuit by such rapid current flows.

A share of S/H helped a lot in the implementation of older A/D converters. However, in the output side we still have problems left: there even correct instantaneous voltages are not enough. The ideal solution requires true impulses which, of course, are not even remotely achievable in the physical reality. If we decide to make do with less, distortions creep in: a S/H circuit in the output of the converter will allow the conversion process to settle to the right voltage without rippling the output but will also produce a staircase waveform instead of a train of scaled impulses. This is, in effect, a time-invariant linear filtering operation and produces frequency anomalies (the output becomes the ideal one convolved with a sampling period wide pulse which leads to high frequency attenuation—a pulse has a decaying, rippled spectrum instead of the flat unity of an impulse). This is why the operating voltage of converters strictly bounds naïve implementations like the one discussed above. We also get the same energy constraints that we had above, so the output will necessarily become a high impedance one—this is obviously bad from a thermal noise point of view.
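The high frequency attenuation of a staircase output can be put into numbers. A minimal Python sketch, assuming an ideal hold that lasts exactly one sample period, whose magnitude response is sin(x)/x with x = pi * f / fs; the function names and the 44.1 kHz rate are illustrative assumptions, not something taken from the article.

```python
import math


def zoh_gain(f_hz: float, fs_hz: float) -> float:
    """Magnitude response of an ideal one-sample-period hold at f_hz."""
    x = math.pi * f_hz / fs_hz
    if x == 0.0:
        return 1.0
    return abs(math.sin(x) / x)


def zoh_droop_db(f_hz: float, fs_hz: float) -> float:
    """Same response expressed in decibels relative to DC."""
    return 20.0 * math.log10(zoh_gain(f_hz, fs_hz))


if __name__ == "__main__":
    fs = 44_100.0
    for f in (1_000.0, 10_000.0, 20_000.0):
        print(f, round(zoh_droop_db(f, fs), 2))
```

Under these assumptions the loss is negligible at 1 kHz and a little over 3 dB at 20 kHz, which is why converters with a staircase output need some form of compensation in the top octave.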
Practical anti-aliasing and anti-imaging

Given that the theory is based in perfect lowpass filtering, it seems that imperfect physical filters pose a significant problem. Indeed, since we are talking about conversions, all these filtering operations would seemingly have to be implemented in the analog domain. Next we take a look at some of the problems associated with analog filters.

The first challenge is the amplitude response of our filters. From the theory of linear filters we know that the ideal brickwall filters can only be approximated by physical filters. This goes for both analog and digital implementations. To get a near approximation, the filters will also need to be of high order—in older digital audio systems the order of the analog input and output filters could exceed ten. This automatically leads to noise and stability problems, especially since the best responses require elliptical filters which are known to be quite sensitive to parameter fluctuations. The cost of implementation is quite high and the knee between passband and stopband will always be quite broad. This means that the upper part of the audio band will need to be sacrificed if correct behaviour in hostile conditions is desired.

Even if our filters now have a perfectly acceptable amplitude response, we are not done yet. This is because when high order analog filters are used (and especially elliptical ones), the phase response of the filter becomes exceedingly bad near the cutoff frequency. This means that the filter will ring, i.e. go into damped oscillation near sudden, transient signals. Consequently the time structure of the sound will be somewhat blurred near the cutoff frequency. This is an unavoidable consequence of analog filtering and is usually the reason given for the early CD players’ bad performance. (The bad rep from this era may be why many audiophiles still shun CDs.) Since we still need to limit the incoming audio band, the only real solution would seem to be using a higher sampling frequency so that any phase distortion the filter might cause ended outside the audible band. This is not very nice, though, because wider bands mean wasted space on storage media and also more expensive electronics to implement the system.
Conversion linearity

We’ve seen already that point sampling and anti-imaging/aliasing are easier in theory than in practice. But how about the actual conversion step, the one that takes in voltages and puts out numbers? It should come as no surprise there are problems here, too.

There are at least three major ways to implement the conversion, none of which is perfect. The most straightforward is flash conversion: to convert we generate a reference voltage for each possible conversion value and compare these in parallel to the input voltage. Then we take the highest reference lower than the input voltage and output the corresponding number. For D/A, we just output the correct reference voltage. This approach doesn’t scale far beyond 12 bits. The second way utilizes the fact that D/A conversion is generally easier than A/D: we approximate the input voltage by setting the highest unknown bit in the output number, compare a D/A’d version of the current number with the original, pick a value for the bit that makes the current number less than the input value and loop for all bits. This is called successive approximation. The method scales well, but is not very fast. We also depend on the accuracy of the D/A step involved. The third way is dual slope conversion: instead of level comparisons with reference voltages, we use the input voltage to drive current to a capacitor through one resistor and then discharge it through another. The time it takes for these two slopes to complete can be measured very accurately and the process is highly dependable. The problem is, it is also extremely slow and so isn’t suitable for audio sampling rates.
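
The successive approximation loop described above can be sketched in a few lines. This is only an illustration that assumes a perfectly accurate internal D/A step; the function name `sar_convert` and the parameters `v_ref` and `bits` are invented for the example.

```python
def sar_convert(v_in, v_ref=1.0, bits=8):
    """Successive approximation with an idealized internal D/A step.

    v_ref and bits are illustrative parameters, not values from the text.
    """
    code = 0
    for bit in reversed(range(bits)):          # highest unknown bit first
        trial = code | (1 << bit)              # tentatively set the bit
        v_dac = v_ref * trial / (1 << bits)    # ideal D/A of the trial code
        if v_dac <= v_in:                      # keep the bit only if we do not overshoot the input
            code = trial
    return code
```

One comparison is needed per bit, which is why the method scales well in width but needs several steps for every sample. Any error in the `v_dac` line would go straight into the result, which is the dependence on D/A accuracy mentioned above.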

Now, from the above we gather that older conversion methods rely on the ability to generate accurate references. The usual way to do this is to use resistor networks and constant current sources. This is also where we get into trouble. Current sources cannot be made infinitely accurate and they always suffer from, e.g., temperature drift. On the other hand, resistor networks rely on accurate values of the resistors (in the best ones, such as the R-2R ladder, on equal values of all the resistors involved) but these are quite difficult to achieve. This means that the reference voltages and D/A conversions achieved through resistor ladders have small variations between the sizes of adjacent conversion steps. Sometimes the conversion may not even be monotonic: a higher digital input value might produce a lower output voltage, for instance. This is very bad since it destroys the linearity of the converter. And when this happens, there will always be distortion. The step size variations, dubbed differential nonlinearity, are difficult to correct without expensive manufacturing techniques. They also lead to converters which perform worse than their width in bits would suggest: an 18-bit converter with some differential nonlinearity might have the S/N ratio of an ideal 15-bit converter. Not to mention that the errors generated can be strongly correlated with the signal and thus easily discernible in the output. All this gives a good reason to try and avoid multiple independent reference voltages and architectures with narrow manufacturing tolerances.
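
To make differential nonlinearity concrete, here is a small sketch of a binary-weighted D/A converter whose bit weights can be slightly off. The 4-bit width and the 15 % error on the top bit are made-up figures chosen to exaggerate the effect, and the function names are hypothetical.

```python
def dac_levels(bit_weights):
    """Output level for every input code of a binary-weighted D/A converter.

    bit_weights[k] is the (possibly inaccurate) contribution of bit k.
    """
    n = len(bit_weights)
    return [sum(w for k, w in enumerate(bit_weights) if (code >> k) & 1)
            for code in range(1 << n)]


def dnl(levels, lsb=1.0):
    """Differential nonlinearity: deviation of each step from one ideal LSB."""
    return [(b - a) / lsb - 1.0 for a, b in zip(levels, levels[1:])]


ideal = [1.0, 2.0, 4.0, 8.0]
skewed = [1.0, 2.0, 4.0, 6.8]      # hypothetical: the top bit weight is 15 % low

steps = dnl(dac_levels(skewed))
worst = min(steps)                 # most negative step error
monotonic = all(s > -1.0 for s in steps)
```

With the skewed weights the step from code 7 to code 8 goes downwards, so `monotonic` comes out false. The worst error sits at the major carry, where all the low bits switch off and the inaccurate top bit switches on.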
The predominant solution

The mix of problems described above is nowadays solved with a standard bag of tricks that we intend to look into, next. This bag includes oversampling, digital filtering, bitwidth/bandwidth tradeoffs, noise shaping and delta-sigma conversion.
Oversampling and digital filtering

All the problems with the anti-aliasing and reconstruction filters described above are at some level linked to the fact that the filters are analog. In contrast with analog ones, digital filters can have perfectly linear phase response and arbitrarily high order without significant noise problems. Digital filters do not suffer from thermal drift, either. Given that numeric processing is quite cheap nowadays, we would ideally like to perform our filtering operations in the digital domain. But because we are talking about how to convert from digital to analog and vice versa, this would seem to be impossible.

The way to get around this little dilemma is actually very simple: we share the burden between the digital and analog domains. A low order analog filter can be used to guarantee that no significant content is present above some (rather high) frequency. If our sampling process works at a rate of at least twice this higher limit, we are left with a sampling chain with a relatively poor amplitude response in the higher part of the spectrum. The lower portion, however, can be quite usable. We can now use a digital filter to further limit the band to this usable portion without introducing any analog artifacts like phase distortion. We can actually use the digital filter to partially compensate for the imperfect amplitude response of the analog input filter.

Compensation of this kind will bring up noise components in any band that might require amplification or conversely degrade the dynamic range of bands which are attenuated. This is why we might wish that only attenuation is used and start with a higher dynamic range sampling process than the actual bit depth we are aiming at.

Now, given that the original sampling rate was high enough (typically 64+ times the final sampling rate we wish to use) we are left with a high sample rate digital signal with most bandwidth unused. We can now resample this signal to achieve our (much lower) target sampling rate. We have used oversampling to enable digital processing. Since we often downsample by an integral amount (like 64 times), it is even possible to combine the downsampling process with the filtering step, creating a very efficient computational construct called a decimating filter. For digital to analog conversion, a symmetrical structure is used which interpolates to a higher sampling rate.
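
A decimating filter of the kind just mentioned might look as follows. This is a bare sketch: the 4-tap moving average stands in for a real lowpass design, which would be much longer, and the names are invented for the example.

```python
def decimate(samples, taps, factor):
    """FIR lowpass and downsample in one pass.

    Only every `factor`-th output of the filter is ever computed, which is
    what makes a decimating filter cheap. `taps` are the FIR coefficients.
    """
    out = []
    for n in range(len(taps) - 1, len(samples), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            acc += h * samples[n - k]
        out.append(acc)
    return out


# Hypothetical example: a crude 4-tap moving average, decimating by 4.
taps = [0.25, 0.25, 0.25, 0.25]
signal = [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8]
decimated = decimate(signal, taps, 4)
```

Since the outputs that would be thrown away by the downsampler are never calculated, the work per input sample drops by the decimation factor compared to filtering first and discarding afterwards.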

There are also further benefits to the oversampling process. First of all, the problems associated with sample and hold are diminished. This happens because we are using a much higher sample rate and thus much shorter S/H periods—essentially the filtering operation imposed by staircase formation will now have a response which is almost constant over the audio band. Furthermore, since the analog lowpass filter guarantees that the signal cannot wobble very much during a single sampling period, it may even be possible to dispense with the S/H step altogether. Secondly, the analog filters used can have a very low order. Not only does this mean that the response will be almost constant over the target band but also that the filter will have an excellent phase response. (The problems will be outside the audible band.)
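
The effect of the shorter hold period can be checked numerically. The sketch below evaluates the standard sin(x)/x response of a sample period wide pulse at 20kHz, once for a hold running at 44.1kHz and once at 64 times that rate; the function name and the chosen test frequency are assumptions made for the example.

```python
import math


def hold_droop_db(freq_hz, sample_rate_hz):
    """Attenuation of a zero-order hold (staircase output) at freq_hz, in dB."""
    x = math.pi * freq_hz / sample_rate_hz
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))


base = hold_droop_db(20_000, 44_100)               # hold at the CD rate
oversampled = hold_droop_db(20_000, 64 * 44_100)   # hold at 64 times that rate
```

At the CD rate the top of the audio band is down by roughly 3 dB, while at the oversampled rate the loss is well under a hundredth of a decibel.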

The traditional way to explain the substantial benefits of oversampling is through the realization that regardless of sampling rate, the bit depth of a converter determines the amount of quantization energy inserted. This is because the magnitude of the error signal stays constant while sample rates vary under the assumption that the error is uncorrelated with the signal. If we raise the sample rate, the power of the error signal is spread out in frequency and removing part of the bandwidth with a filter will lower the overall power of the error signal. I find this explanation difficult to understand. The twist is, it seems like this procedure brings us conversion which can be more accurate than the incoming sample stream. What really happens, though, is that the staircasing which results from S/H in traditional converters gets diminished—low amplitude signals are produced with more average accuracy over a sample period because of the higher sample rate and the stairs get rounded by the output filters. This means that even a single bit on-off fluctuation in the PCM data stream gets decoded, correctly, into a sinusoid. In essence, the oversampling makes it possible to reconstruct the signal mainly in the digital domain and the oversampled output with its analog filter is really just a way to deal gracefully with the extra bits produced by the digital anti-imaging filter. All in all, we get a very low noise floor for conversion error, but the inherent resolution of the sampled PCM stream is in no way surpassed. With proper dithering in the A/D chain, however, the increased decoding accuracy makes it a lot easier to hear material which is actually below the noise floor of the quantization plus dithering operations. Oversampling combined with digital filtering opens a way to nearly perfect A/D/A conversion and so it is a very important tool to any builder of audio systems.
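
For what it is worth, the traditional argument can at least be put into numbers. The sketch below assumes white quantization noise that is uncorrelated with the signal and no noise shaping at all; the function names are invented.

```python
import math


def oversampling_gain_db(ratio):
    """In-band noise reduction from spreading white quantization noise
    over `ratio` times the bandwidth and filtering the excess away."""
    return 10.0 * math.log10(ratio)


def equivalent_bits(gain_db):
    """Convert a noise reduction in dB to bits, at about 6.02 dB per bit."""
    return gain_db / (20.0 * math.log10(2.0))


gain = oversampling_gain_db(64)     # about 18 dB
bits = equivalent_bits(gain)        # about 3 bits
```

So under these assumptions 64 times oversampling alone lowers the in-band noise floor by about as much as three extra bits would, half a bit for each doubling of the rate.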
Bit depth reduction and noise shaping

After oversampling is employed the filter troubles go away. However, the problems with the conversion step itself are made a couple of orders of magnitude worse. This happens because very accurate conversion is even more difficult to achieve at high sampling rates. In fact, at the megahertz rates required for 64 times oversampling architectures, traditional converters of over 12 bits do not really exist. Nonlinearity isn’t going to get any easier to handle, either. This is why we might wish to make do with fewer bits. Digital filtering generates bits, too, so it would be very nice to somehow drop the extraneous ones. But this automatically means quantization noise will get intolerable, right?

In general, yes. But remember that we are talking about an oversampling architecture, now. Here only the lowest 1/64 of the total bandwidth is of real interest. What happens above that is not of real concern since we will not be able to hear it and, more importantly, the analog input/output filters will attenuate those frequencies progressively more in the higher bands. This means as long as we keep the in-band noise in check, we can increase the out-of-band noise level considerably. This is achieved through what is appropriately enough called noise shaping.

The simplest digital noise shaper consists of a quantizer (in the digital domain this just takes a fixed number of the high order bits of a digital word) and a subtraction circuit which subtracts the quantization error introduced into the previous output sample from the current input of the quantizer (in effect subtracting the neglected low order bits of the previous input word from the whole current input word). It is not very easy to see why this circuit does what we described in the previous paragraph. To get a picture of what happens, we must change the configuration described a bit. First of all, we can express the error (which we currently feed back) as a difference between the quantizer input and output values. Then we can separate these two into signals which are subtracted and added, respectively, in the quantizer input. After that it is easy to see that the original configuration corresponds to a very simple IIR filter followed by a quantizer whose output both serves as the final output of the circuit and also feeds back to be subtracted from the input signal. Now, assuming the quantizer is just an additive source of noncorrelated noise (this is a fairly good approximation over a wide range of operating conditions and amounts to linearizing the circuit), it is quite easy to see why the loop behaves the way we expected: the circuit approximates a closed loop linear filter with an embedded source of white noise. The spectrum of the noise is determined by the closed loop response of this filter, and is easily evaluated. In the simple case, the inner IIR filter has a first order lowpass response, so the closed loop response of the outer loop is a highpass one—the noise is inserted primarily in the higher frequencies. Furthermore, the structure of the circuit as a whole guarantees that at low frequencies, the spectral structure of the output will closely match the one of the input with very little quantization noise present.
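
The simple shaper described at the start of the paragraph above is short enough to write out. The sketch works on integer samples and drops a given number of low order bits; the names `truncate` and `noise_shape` and the test signal are made up for the illustration. Here the error is taken as quantizer output minus quantizer input, so each output equals the input plus the difference of the current and the previous error.

```python
def truncate(samples, drop_bits):
    """Plain requantization: keep the high order bits, discard the rest."""
    return [x >> drop_bits for x in samples]


def noise_shape(samples, drop_bits):
    """First order error feedback requantizer."""
    err = 0                          # quantization error of the previous sample
    out = []
    for x in samples:
        v = x - err                  # subtract the previous error from the input
        q = v >> drop_bits           # quantizer: keep only the high order bits
        err = (q << drop_bits) - v   # error just introduced (output minus input)
        out.append(q)
    return out


quiet = [100] * 256                  # constant input well below one coarse step
plain = truncate(quiet, 8)           # all zeros: the signal is lost
shaped = noise_shape(quiet, 8)       # zeros with occasional ones
```

Plain truncation loses the quiet signal completely, whereas the shaped output averages to the right value: the errors cancel between successive samples, so what is left of them is a rapidly alternating, high frequency component.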

A comprehensive analysis without separately linearizing the circuit is exceedingly difficult because the circuit is a nonlinear one with feedback—at worst, circuits like this could go into chaotic oscillation. In my mind, this is one of the prime reasons why bitstream techniques should only be used to implement parts of the signal chain—in the absence of a complete theoretical framework, bit depth reduction and noise shaping should not be incorporated into audio systems at an architectural level.

In general, the inner filter described above can be substituted with a much more complex one—this gives us a way to control the spectrum of the quantization noise. This way we can make sure the resulting noise is adequately low over the audible band and, above that, sufficiently attenuated by the filters employed. Furthermore, we may want the noise remaining in the audible part of the spectrum to be well matched to the threshold of hearing so that it will not be heard as readily.

The last part is especially important when there is no possibility of oversampling. Playback of 16-bit sound on an 8-bit soundcard or mixing multiple 16-bit streams for 16-bit playback are two very common applications.

It is clear that the fewer bits there are to worry about, the easier it is to design a working converter for them. So far so good. But the most striking benefit comes when the process is carried out to its logical conclusion to yield one bit processing. A one bit converter, in addition to being extremely simple to implement (one bit D/A is a switch, one bit A/D is a comparator) will suffer from zero differential nonlinearity—there is only one step so all the steps will obviously be of identical magnitude. Some slew rate distortion and a constant offset (resulting from imperfect supply voltages) are practically the only problems of the single bit converter. The slew rate issue can be handled by some careful design and constant offsets rarely matter in a mostly capacitively coupled audio signal processing environment.

Finally, the above description of noise shaping lends itself to both digital and analog implementation and the method is applicable to A/D conversion as well. These two factlets are all we need to arrive at today’s prevailing audio conversion concept, delta-sigma conversion, the topic of the next chapter.

The name delta-sigma conversion comes from the traditional Greek letters used for denoting differences and sums. In the first order noise shaper we introduced first, the inner feedback loop sums (accumulates) successive input values. In the digital domain this is just a running sum, in the analog this is an integrator. In analysis, the value of such an accumulator is commonly denoted by sigma (Σ), meaning sum. The output of the modulator, on the other hand, is a coarsely quantized difference between successive values of the accumulator: a delta (Δ), precisely in the sense used when we speak e.g. about delta modulation or delta coding. So what we send are deltas of sigmas, whence the name. This view of the modulation process will also prove useful later on.
Delta-sigma conversion

A beloved child has many names. So does this conversion method: delta-sigma, ΔΣ, sigma-delta, MASH and charge balance conversion are but a few. But the basis is the same: we employ a huge oversampling ratio (usually 64 times the target sampling rate) and aggressive noise shaping to bring the converter down to the one bit regime. On the A/D side we implement the noise shaping circuitry in analog form (the subtraction is an opamp based differential amplifier, the A/D converter is a comparator, the filter is a continuous time or switched capacitor analog one and the feedback loop holds a switch to convert back to the analog domain), on the D/A side we mostly employ digital processing (only the final bitstream is converted).
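
A first order, one bit modulator can be simulated in a few lines. This is a textbook-style sketch rather than a model of any particular converter; the function name and the test input are assumptions, and inputs are taken to lie between -1 and +1.

```python
def delta_sigma(samples):
    """First order one bit modulator for inputs between -1 and +1."""
    integrator = 0.0
    feedback = 0.0                   # previous output, converted back by the "switch"
    bits = []
    for x in samples:
        integrator += x - feedback               # difference, then running sum
        bit = 1 if integrator >= 0.0 else -1     # comparator
        feedback = float(bit)
        bits.append(bit)
    return bits


stream = delta_sigma([0.25] * 1000)
average = sum(stream) / len(stream)
```

The output consists of nothing but full scale pulses, yet their average follows the input closely. A lowpass filter, analog or digital, is all it takes to recover the signal from such a stream.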

In addition to the reasons outlined in the previous chapter, delta-sigma conversion has a very persuasive further benefit: it is very cost-effective to implement. This is because the technique does not rely on any precision components (unlike the other methods which require resistor ladders and precision capacitors), is easy to embed into otherwise digital circuits (using CMOS logic and switched capacitor filters, the design nicely straddles the digital/analog boundary) and is repeatable unlike any other (the digital filters are always accurate and the few analog flaws can be ironed out through autocalibration). Further, delta-sigma methods are the only ones to reach reliable 20+ bit performance at audio sampling rates, a noteworthy fact in an age when everybody’s already got 16 bits in CD.

And now on to the downsides. Delta-sigma (bitstream) methods are nice, but they’re not without their problems. We will now delve into those.
Nonlinearity, idle tones and dead bands

Above, when we tried to figure out why the noise shaping circuit did what it was supposed to, we resorted to linearizing the circuit. A hint was given that this wasn’t perhaps the best way to go. And so it isn’t: the linearized circuit does behave nicely and also approximates the actual quantizer performance quite well, but there are occasions when the true nonlinear nature of the circuit crops up. And these circumstances arise in practical converters, as well. The three major problems are idle tones, dead bands and nonlinear instability, and they tend to plague delta-sigma modulators of higher order. This is unfortunate since the higher the order of the modulator, the higher the potential performance at a given oversampling ratio and converter bit depth.

Idle tones, in system theory dubbed limit cycles, are a mode of nonlinear oscillation. They exemplify the exact opposite of one of the basic properties of linear systems. In the absence of input (i.e. given an input of all zeroes from some point of time), the output of a stable linear system always approaches zero. Of course, the convergence can be slow but it nevertheless happens for all linear systems—it is easy to show that if it doesn’t, the system cannot be stable. But not so for nonlinear ones. Idle tones are one of the consequences. They are stable sequences output by a nonlinear network in the presence of prolonged null input. In delta-sigma architectures they most often occur at very low amplitudes, just when the output of the modulator should go to zero. Needless to say, our ears can easily pick these sequences up, even if they are below the noise floor. Idle tones are heard as faint, whining noises in converter output. The exact time domain behavior of the modulator after entering a limit cycle depends on the structure of its state space, and most importantly the amount of modulator state. As the latter grows with modulator order, it is not a surprise that higher order modulators are the first to be affected.
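
Even the simplest modulator shows the basic phenomenon. The sketch below feeds a first order, one bit loop a constant input and measures how long the repeating output pattern is, using exact fractions so that a recurring state can be detected reliably. The function name is hypothetical, and a first order loop is of course a far tamer case than the higher order ones discussed here.

```python
from fractions import Fraction


def limit_cycle_period(dc_input):
    """Length of the repeating output pattern of a first order one bit
    modulator fed a constant input between -1 and +1 (exact arithmetic)."""
    x = Fraction(dc_input)
    state = x                        # integrator value after the first sample
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        state += x - (1 if state >= 0 else -1)   # integrate input minus the fed back bit
        step += 1
    return step - seen[state]


null_period = limit_cycle_period(0)                # all-zero input
dc_period = limit_cycle_period(Fraction(1, 8))     # small constant offset
```

With null input the loop never settles; it alternates between the two output values forever. A small constant offset stretches the pattern into a longer cycle, which puts a periodic component at a correspondingly lower frequency.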

Intuitively, this is because the more state there is, the more things there can be happening inside the feedback. In the context of delta-sigma modulation, one viable perspective is that as the order of the filter grows, so does the maximum achievable phase shift. This means the extra spectral content inserted by the quantizer has more time to evolve before hitting the nonlinearity again. Hence, longer wavelengths, more action within one round through the modulator and eventually maybe even bifurcation. One further way to understand the oscillatory modes arises from the observation that steep roll-off lowpass filters necessarily ring near transients. The higher the order, the more ringing there is. Combined with the high group delays incurred by high order filtering, this ringing easily leads to oscillation when the output is fed back through the quantizer.

Now, the state is usually rather limited and the nonlinearity in this case is (in some intuitive sense) rather regular. That is why the possible modes of oscillation tend to be at least quasi-periodic and mostly have a short period. When the input signal dominates the circuit, the linearized noise source analysis tends to hold so the problems mainly appear at low amplitudes. Summarizing, the resulting tones will have a high frequency, a low amplitude and a definite pitch. Idle tones are considerably more annoying than mere noise or some differential nonlinearity. This is why they must be avoided at all costs. The problem is, controlling this sort of nonlinearity analytically is exceedingly difficult. The most common way to deal with limit cycles, then, is to insert small amounts of noise into the modulator feedback loop to at least disperse the pitches generated and, in the best case, drive the modulator out of the limit cycle into a zero-convergent region of the state space. Notice that this is definitely different from dither, which is used in the input of the converter to decorrelate the total quantization noise from the signal.

Dead bands are a concept closely related to idle tones. They denote parts of a nonlinear system’s state space (other than all zeroes) which capture the system: if the system enters such a band, it will not leave it in the absence of input. The concept of a dead band is more comprehensive than the one of idle tones since it can include nonoscillatory behavior (the output decays to within one unit of zero and stays there) and some gross nonlinear oscillatory modes (like the ones resulting from clipping and, especially, overflow). The concept is really applicable only to circuits which over some fairly broad set of operating conditions closely approximate a linear system, and in such settings is very useful for explaining some of the typical solution strategies which are used to bring the system back on track. For instance, the breaking of idle tones by inserting noise can be thought of as an attempt to nudge the system out of a dead band.

That dead bands can arise at high amplitudes as well is rather troubling. Indeed, at least the analog variants of delta-sigma modulators of fourth order and beyond misbehave when the values output by the loop lowpass filter (integrator) grow large. This is why the whole theoretical operating range of these designs is rarely utilized and some circuitry is often embedded to detect high amplitude instability and consequently reset the modulator. An operation such as this seems quite drastic but is rarely needed in a properly amplitude controlled system. At high inputs the effects on the output usually cease in a single target sample period or less, as well. But still, the necessity of such emergency measures does not exactly serve as a corroboration of the theory behind delta-sigma conversion.

One way to understand the range limitation is to consider how the filter is driven in the time domain. Here we have a high order filter which is driven by full scale digital inputs. When we are operating sufficiently near null sigma values, the possible inputs are placed approximately symmetrically around the sigma and the input bitstream will be well balanced with lots of high end and very little of anything that would resemble time information. Near maximum sigma the equilibrium is attained when most of the input pulses have the nearer digital full scale as their value and only occasionally there is a wildly differing pulse of the opposite polarity. Now, for a filter to be of high order means that it reacts through higher order momenta than just the first one, and possibly rings. Long stretches of constant input will in a way load the momenta and if overshoot then happens, the delayed string of corrective, opposite pulses may perpetuate the effect. A very good guess would then also be that, as the analysis behind noise shaping does not consider phase information and instead relies on a stochastic linearisation of the modulator loop, regular and/or unbalanced time structure in the input of the filter would not be a good thing. Near full scale this is precisely what is produced: inputs with non-zero mean quantization errors in the loop and near deterministic time behavior.
Why go to higher bit depths?

In addition to concerns of economy, conversion and bit depth requirements constitute the major drive behind bitstream methods. It is understandable that people continuously strive for more accuracy. After all, not many people would mistake a recording for the original performance under any realistic conditions. But few people question whether going to ever wider converters and higher sampling rates is really the way. Contrary to common audiophile rhetoric, there is quite a bit of reason to believe we are already quite near the limit beyond which increasing bit rates make no difference to the human observer.

Based on what is known about hearing, people do not truly hear anything beyond 25kHz. And even this is quite a conservative estimate, since it primarily holds for isolated young adolescents. And even if some people do hear frequencies that high, the information extracted from the ultrasonics is very limited—there is some evidence that everything above some 16kHz is sensed purely based on whether it is there, irrespective of the true spectral content. As for dynamic range, research suggests that 22 bit accuracy should cover the softest as well as the loudest of tones over the entire audio bandwidth.

But these limits are not the end of the story. If we are simply aiming for a good audio distribution format, some extra processing can yield significant benefit. This is because pure, linear PCM storage in no way employs the peculiarities of human hearing. The dynamic range and lower amplitude limit of our audition vary considerably over the audio bandwidth. Two known methods of employing this variance are noise shaping and pre/de-emphasis. The first uses the above described noise shaping principles to move the quantization noise generated at a given bit depth from sensitive frequency ranges to less sensitive ones, in effect giving more bits to the ranges which most need them.

This is a nice example of a nonoversampling application of noise shaping techniques and is in use right now: at least Sony markets its records as being super bit-mapped.

Noise shaping has the benefit of only being needed in the production side of the signal chain. It shapes the noise floor of the recording and so alters the dynamic range in the different parts of the spectrum. Pre/de-emphasis, on the other hand, relies on the fact that the spectrum of acoustical signals is in general far from flat while at the same time the threshold of hearing also varies over the audible spectrum. The first invariably rolls off at high frequencies and the second creeps quite high in the acute register. Minimum thresholds are attained in the middle register, around the 1kHz mark. This means that it is advantageous to shift the transmitted dynamic range of separate bands with respect to each other. High frequencies, for instance, can be boosted (since acoustical signals leave some headroom there) and then de-emphasized at playback so that any noise inserted by the signal chain is attenuated. In fact, in this range the perceptual noise floor can often be lowered below the threshold of hearing, in effect making the transmission chain perceptually flawless. The problem is, in the crucial mid and high mid ranges the dynamic range of both music and the hearing are wide and we also observe the minimum threshold of hearing. This is why pre/de-emphasis is not a panacea.

Really what we are doing here is reminiscent of noise reduction: we boost the signal in the production side and then attenuate at playback so that any inserted noise comes down similarly. The main differences are that pre/de-emphasis is static (it relies on a certain general spectral shape of the input) and that noise reduction applications aim at bringing the noise below the masking threshold, not the threshold of hearing, and so are much less tolerant of subsequent processing of the decoded signal. Pre/de-emphasis is also a linear operation, so at least in theory it can be inverted perfectly. Of course, this is only possible when the dynamic range of the system was sufficient to transmit the signal in the first place (i.e. there is no clipping).
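
The claim of perfect invertibility is easy to demonstrate with a first order filter pair. The sketch below is not any standardized emphasis curve; the coefficient 0.5 and the function names are arbitrary choices made for the illustration.

```python
def pre_emphasis(samples, a=0.5):
    """Boost high frequencies relative to low ones: y[n] = x[n] - a * x[n-1]."""
    out, prev = [], 0.0
    for x in samples:
        out.append(x - a * prev)
        prev = x
    return out


def de_emphasis(samples, a=0.5):
    """Exact inverse of pre_emphasis: x[n] = y[n] + a * x[n-1]."""
    out, prev = [], 0.0
    for y in samples:
        prev = y + a * prev
        out.append(prev)
    return out


original = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75]
restored = de_emphasis(pre_emphasis(original))
```

The round trip gives back the original samples. Any noise added between the two stages passes through the de-emphasis filter only, so its high frequency part is attenuated along with the boosted signal.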

Some careful thought enables us to construct a system with both emphasis and noise shaping to yield superior perceptual quality from a given bit depth. At best, 4–6 bits worth of perceptual dynamic range can be added by the procedure. Given that the sample rate is high enough so that all frequencies of interest are transmitted (some suggest 26kHz audio band gives all the headroom we need), optimal use of emphasis and noise shaping can theoretically yield perceptually transparent transmission from a 14 bit system. Since we do not rely on masking, the design yields surprisingly good error resilience in the presence of subsequent signal processing. Proper dithering can then be used to make the signal chain linear for all practical purposes. These results are in stark contrast with the claims of audiophiles and the audio industry of the need for ever higher bit rates in audio transmission.

This section draws heavily on the material available at the Acoustic Renaissance for Audio (ARA) web site. The material presented there is also much more comprehensive and doesn’t utilize proof-by-assertion like this introduction. The ARA activity is one of the greater influences to prompt me to take up writing this text.

From the above it appears that with some minor adjustment, the 16 bits and 44.1kHz of CD are almost enough. In practice we do have to be slightly more cautious. It must be acknowledged that some anecdotal evidence exists in favor of greatly increased bit rates. The most important experiments involve the effect of ultrasonics in the presence of frequencies traditionally considered audio band and the effects of ultrasonics on sound localisation and the definition (whatever that means) of sounds. The argument goes, we might not consciously hear isolated ultrasonics but in the presence of other sound material (especially transients) they might serve as additional localisation cues. There is also the age old debate over ultrasonics permeating to the audible band through distortion products generated in the ear. The latter claim has gained some support from experiments involving timbre perception of periodic sounds with and without ultrasonic components. All in all, the effects seem to be minor and it is not entirely clear whether they really exist, at least to a degree which requires attention from the audio designer.

One further, rather persuasive reason to reconsider the need for higher bit rates is the one of sensible resource allocation. If stereo transmission was really the best that could be done, ever higher accuracy could be justified by arguments of the better-be-on-the-safe-side type. But there are multitudes of unaddressed issues in digital audio transmission which have nothing to do with the numerical accuracy of the channels employed. The most important ones are the number of channels, the accurate definition of what exactly is encoded by the information (to date the Ambisonic framework is the only one to comprehensively address this concern) and application of signal processing to enhance the signal chain (e.g. room equalisation, speaker linearization and restoration of analog recordings). The rapid evolution of DSP has also brought out new possibilities, like simulation of acoustical environments, which seem far more interesting from the consumer standpoint than laboratory grade signal chains. We should consider whether the future development of and investments in digital audio systems should perhaps be along these (in my mind extremely interesting) lines instead of on making marginal improvements to channel accuracy.

All of the above holds primarily for audio distribution formats. But when subsequent processing is to be expected, wider samples are very useful in preventing error accumulation. It is well known that most DSP operations, including simple filtering, add lots of extra bits to existing signals. To guarantee that rounding and dithering products do not accumulate, even 32-bit formats are sometimes used. On the other hand, no such distortion appears in the frequency domain, even in the presence of considerably long signal chains, so this does not affect the sampling rate considerations.
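
The accumulation argument can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the test tone, the chain of 20 gain stages and the plain rounding (no dither) are hypothetical, not taken from any real production chain. It compares rounding to 16 bits after every stage against keeping 32-bit intermediates and rounding to 16 bits only once at the end.

```python
import math

def quantize(x, bits):
    """Round x (nominally within [-1, 1)) to a signed fixed-point grid of the given width."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def run_chain(signal, gains, bits=None):
    """Apply each gain in turn; if bits is set, round to that width after every stage."""
    out = list(signal)
    for g in gains:
        out = [s * g for s in out]
        if bits is not None:
            out = [quantize(s, bits) for s in out]
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical test signal and processing chain (assumptions, not from the article).
signal = [0.5 * math.sin(2 * math.pi * 997 * n / 44100) for n in range(4410)]
gains = [0.9, 1.1] * 10  # 20 gain stages

reference = run_chain(signal, gains)          # floating point, no intermediate rounding
narrow = run_chain(signal, gains, bits=16)    # rounded to 16 bits after every stage
wide = run_chain(signal, gains, bits=32)      # rounded to 32 bits after every stage
wide_to_16 = [quantize(s, 16) for s in wide]  # one final rounding to the delivery width

err_narrow = rms_error(narrow, reference)
err_wide = rms_error(wide_to_16, reference)
print("16-bit intermediates, RMS error:", err_narrow)
print("32-bit intermediates, RMS error:", err_wide)
```

With these assumptions the 16-bit chain collects one rounding error per stage, while the 32-bit chain is left with essentially only the final rounding to the delivery width.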
Bitstream as a transmission format

Now we’ve got to the original reason for this article. So far bitstream methods have only been described from the point of view of analog/digital/analog conversion. But a slight change in our point of view, and some study of the complete signal chain from analog to digital and back again leads us to wonder why we are doing the bitstream to linear PCM conversion and its inverse at all. Couldn’t we leave those stages out and simply pass the bitstream resulting from delta-sigma modulation to the playback side? This is quite a natural thought and one that has recently found a concrete application in Sony’s SACD architecture. The subject of this final chapter is bitstream as a channel encoding and an architectural basis of digital audio transmission.
Rationale

The common wisdom in data transmission is that the shorter the signal chain, the more transparent it will be. This is also the most compelling reason for trying to lose the digital filtering and decimation steps from the PCM signal chain. Bitstream advocates feel that since we can do without these steps, they should go. Any processing being a matter of cost, implementations should become cheaper when the digital part is simplified. The idea of passing a simple bitstream is also seen as having a certain elegance. And certainly it has the delightful buzz of any new technology.

On a more serious note, the technique relies on oversampling and noise shaping on a very basic level. The oversampling ratios must be very large in order to get quality playback, so the underlying bandwidth will be huge compared to current PCM systems. Now although the reasoning that led us to consider delta-sigma conversion and the attendant noise shaping techniques is largely based on putting the shaped quantization noise into the headroom provided by oversampling and then killing it with an analog filter, after removing the digital filters we can also view the system as a full band one with lots of quantization noise, an anti-alias filter with ridiculously bad passband response and funky single-ended de-emphasis to reduce the terrible noise figures. Essentially we have almost a conventional digital transmission line in which only the lowest 1/64 or so of the total bandwidth shows hifi performance. Superficially naïve, this is a powerful observation and has some deep consequences.
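
As a rough illustration of a bitstream carrying the audio band, here is a minimal sketch of a first-order delta-sigma modulator. The first-order loop, the 1kHz test tone and the moving average standing in for the analog reconstruction filter are all simplifying assumptions; practical converters use higher-order loops and proper filters.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: maps inputs in (-1, 1) to a +1/-1 stream."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback  # accumulate the input-minus-output error
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

def moving_average(values, length):
    """Crude low-pass filter; entries before index length - 1 are warm-up values."""
    out = []
    acc = 0.0
    for i, v in enumerate(values):
        acc += v
        if i >= length:
            acc -= values[i - length]
        out.append(acc / length)
    return out

OSR = 64             # oversampling ratio
BASE_RATE = 44_100   # assumed base sample rate in Hz
RATE = OSR * BASE_RATE

# Hypothetical test tone: 1kHz at half of full scale, 10 ms long.
tone = [0.5 * math.sin(2 * math.pi * 1_000 * n / RATE) for n in range(OSR * 441)]

bits = delta_sigma_1bit(tone)
smooth_bits = moving_average(bits, OSR)
smooth_tone = moving_average(tone, OSR)

# The modulator output trails its input by one sample, hence the i + 1.
max_err = max(abs(smooth_bits[i] - smooth_tone[i + 1])
              for i in range(OSR, len(bits) - 1))
print("largest deviation after low-pass filtering:", max_err)
```

Every output value is a single bit, yet after low-pass filtering the stream tracks the input tone to within a few percent of full scale. A first-order loop with a moving average is far from hifi; the point is only that the low frequencies survive in the raw bitstream.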

We have already seen that a full bandwidth PCM format such as CD can be made a lot better by introducing in-band noise shaping and pre/deemphasis. Now how about applying this reasoning to the above? We get a system in which the lowest 1/64 frequency range (the conventional audio band) is hifi and the rest (possibly up to 32 times the sample rate!) displays a progressively degraded noise figure and maximum output. Now, if we assume that ultrasonic frequencies indeed do contribute to localisation and what not, those frequencies can now, for the most part, be transmitted. The only problem is accuracy, but if we cannot consciously hear the stuff anyway, its presence is a lot more important than the accuracy of transmission. Plus, in the vicinity of the audio band S/N ratios actually stay quite respectable. In effect, the signal chain has a sort of fade-to-noise frequency response.

As the above reasoning suggests, a 64 times oversampling delta-sigma architecture (which is pretty much a standard for 16-18 bit delta-sigma converters in PCM applications) already contains some slack compared to the PCM counterpart. This is to be expected since the data stream is a lot fatter (16 bits times the sample rate vs. 1 bit times 64 times the sample rate already shows a four to one expansion). This implies that there is a certain level of flexibility in the system: varying the roll-off of the analog output filter balances the maximum in-band decoding accuracy vs. the level of access to slightly off-band material possibly encoded. At the same time possible future improvements to delta-sigma modulators give the producer side some choice over greater in-band accuracy (and possibly even enable the encoder to match the dynamic range to the threshold of hearing) vs. encoding off-band material. In effect, the boundary between in-band and off-band material is diminished and the limit can be set in a relatively independent fashion in both ends of the signal chain. All this lends some credibility to bitstream methods as a basis for a complete audio architecture.
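
The four to one figure is easy to check. The sketch below assumes the CD base rate of 44.1kHz; the paragraph above only speaks of "the sample rate", so that number is an assumption.

```python
SAMPLE_RATE_HZ = 44_100   # assumed base rate (CD)
PCM_BITS = 16
OVERSAMPLING = 64

pcm_rate = PCM_BITS * SAMPLE_RATE_HZ              # bits per second per channel, PCM
bitstream_rate = 1 * OVERSAMPLING * SAMPLE_RATE_HZ  # bits per second per channel, 1-bit stream
expansion = bitstream_rate / pcm_rate

# Fraction of the bitstream's total bandwidth taken up by the conventional audio band.
audio_band_hz = SAMPLE_RATE_HZ / 2
bitstream_nyquist_hz = bitstream_rate / 2
hifi_fraction = audio_band_hz / bitstream_nyquist_hz

print(pcm_rate, bitstream_rate, expansion, hifi_fraction)
```

Under that assumption the PCM channel carries 705600 bits per second, the bitstream 2822400, and the audio band occupies 1/64 of the bitstream's bandwidth up to its Nyquist limit.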
Sony’s DSD and SACD

The format that has prompted the whole recent bitstream discussion is Sony’s DSD (Direct Stream Digital) which was originally intended for the stereo digital soundtrack of DVD Video. That effort failed so Sony incorporated DSD into a new standalone audio format dubbed SACD (Super Audio CD) which has already hit the streets in Sony’s home market.

DSD is a straightforward application of multichannel 64 times oversampling delta-sigma conversion at some 2.8 megahertz (64 times 44.1kHz), followed by direct transmission/storage of the resulting bitstreams and low order analog lowpass filtering for reconstruction at the reproduction side. SACD places this bitstream on a DVD derived high capacity disc. The most important technological contribution of SACD is the introduction of optional double-layered and hybrid discs. Like other members of the DVD family, double layers simply mean double the capacity. This could be used for extra playing time or, at a later date, to accommodate multichannel capacity (currently only stereo SACDs are defined). The hybrid disc is a more interesting concept.

Hybrid SACDs incorporate one high density layer which stores the DSD bitstream and in addition to this, a Red Book compatible CD layer. The promise goes, hybrid discs will play as CDs in normal CD players in addition to containing the higher fidelity DSD stream. The SACD standard also defines that every SACD player must support the CD format. This is easily achieved because the technology is a direct DVD derivative—DVD players commonly employ dual beam pickups and two layer discs and so have the necessary dual focus capability. The only real obstacle in the way to complete Red Book compatibility is ensuring that the resulting hybrid discs fulfill the Red Book requirements for disc refractive index, thickness, depth of the recording layer and the absorption coefficient of the disc material at the 780nm wavelength used to read a CD. Through some design, Sony has accomplished this goal and created the migration path essential to any new audio format. From the user’s point of view, SACDs currently behave just like conventional CDs. In the future multichannel playback is envisioned and the standard can accommodate up to 6 channels of DSD encoded audio data.

In addition to the above specification, Sony has tried to make the SACD platform more desirable to content providers by embedding both a visible and an invisible watermark into the disc without which the SACD player will refuse to play the disc. This is done to make piracy more difficult. As a further hindrance to copying, no digital outputs are provided in the first generation SACD players.

Sony has tried to position SACD as an audiophile format and holds that SACD is not a direct competitor to DVD-A which it claims is more geared towards the ordinary consumer (read: is somehow in the low end). This is very much reflected in Sony marketing rhetoric surrounding SACD, which invokes the audiophile fondness for analog formats and capitalizes on the benefits of a simplified signal chain.
Foreseeable problems

And now on to the meat. I do not agree at all with Sony hype about SACD being what practically amounts to the Second Coming. I also believe I share this worry with the right people—I’m certainly not the first one to think SACD is not a healthy way to go.

Perhaps the most straightforward reason why SACD is a bad idea is that it may not be needed at all. Blind listening tests suggest that the average consumer has a fair bit of difficulty telling 24 bits at 96kHz from properly implemented 16 bits at 44.1kHz. Considering the numerical differences between these formats, the question of whether we really need accuracy beyond the level of CDs becomes quite acute. Quite a few people with golden ears agree that the difference is subtle. Now, the effective bit depth of DSD is around 20 and 24/96 already has over an octave of ultrasonic bandwidth. Why is it that, by and large, the same golden ears find a great difference between CDs and SACDs?

Since SACD is very clearly a distribution format, we can question how close CDs mastered for delivery only (i.e. utilizing aggressive noise shaping, perhaps even driven by a masking model) can come. And theory suggests they come really close. So it might be DSD isn’t the optimal approach to improving the signal chain after all—perhaps we should instead stretch CD a bit. As for ultrasonics (which are possibly the only thing CDs cannot address at the fixed 44.1kHz sample rate), the evidence is not conclusive. Perhaps some content above the CD limit of 22050Hz should be included, but the limits set by DSD seem excessive.

Some proponents of DSD also claim that DSD offers superior time accuracy because of the bitstream approach and the extremely high sampling rate. The argument goes that we get in between the samples of PCM because the bitstream changes more often. But it is well known that the reconstruction step in PCM achieves similar between-the-samples resolution, although simple-minded analysis doesn’t show that right off. Dither also makes the phase resolution of PCM essentially unlimited when we integrate over all of time.
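
The between-the-samples point can be checked numerically. The sketch below is an illustration under stated assumptions: a hypothetical 1kHz tone delayed by one microsecond (about 1/23 of a 44.1kHz sample period), rounded to 16 bits without dither, with the delay recovered from the phase of the sampled tone.

```python
import math

FS = 44_100       # PCM sample rate in Hz
TONE_HZ = 1_000   # test tone; 441 samples hold exactly 10 periods
N = 441
DELAY_S = 1e-6    # hypothetical delay, much shorter than one sample period

def quantize16(x):
    """Round to a 16-bit signed grid (plain rounding, no dither)."""
    return round(x * 32768) / 32768

# Sample the delayed tone at the ordinary PCM sampling instants.
samples = [quantize16(0.5 * math.sin(2 * math.pi * TONE_HZ * (n / FS - DELAY_S)))
           for n in range(N)]

# Correlate against cosine and sine at the tone frequency to estimate the phase.
w = 2 * math.pi * TONE_HZ / FS
i_sum = sum(s * math.cos(w * n) for n, s in enumerate(samples))
q_sum = sum(s * math.sin(w * n) for n, s in enumerate(samples))
phase = math.atan2(-i_sum, q_sum)
estimated_delay = phase / (2 * math.pi * TONE_HZ)

print("sample period (s):  ", 1 / FS)
print("true delay (s):     ", DELAY_S)
print("estimated delay (s):", estimated_delay)
```

Even though the delay is a small fraction of the sample period, it is recovered from the 16-bit samples to well within a few tens of nanoseconds. The timing information sits in the sample values rather than in the spacing of the sampling instants.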

In addition to purely acoustical arguments, there is a host of technology based reasons to reconsider utilizing SACD. The most serious are related to the fact that bitstreams diverge radically from the more traditional PCM representation. Essentially, a bitstream has no number-like structure, no clearly delimited frames such as the ones defined by PCM samples and the information (contrary to some Sony claims) is not even concentrated in the width of clearly defined pulses (that is to say like in traditional, analog pulse width modulation) but is distributed in a very complicated manner over long bursts of successive bits by the nonlinear noise shaping procedure. Essentially, this puts all current audio processing algorithms in the trash can—these methods require discrete signals which approximate sequences of real numbers. DSD streams do nothing of the sort since every bit in the sequence is inherently bilevel. You cannot even sum DSD streams without running into serious trouble, not to mention the complications with multiplication. And when multiplication and summation go, so does all of today’s signal processing theory.
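
The summation problem is easy to demonstrate. The sketch below uses a first-order modulator and two test tones as stand-ins; both are assumptions for illustration, and real DSD encoders use higher-order loops.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator producing a +1/-1 stream (illustrative only)."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

RATE = 64 * 44_100  # assumed bitstream rate in Hz

def tone(freq_hz, amplitude, count):
    """Hypothetical test tone sampled at the bitstream rate."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / RATE) for n in range(count)]

bits_a = delta_sigma_1bit(tone(1_000, 0.4, 20_000))
bits_b = delta_sigma_1bit(tone(1_500, 0.4, 20_000))

# Adding the streams bit by bit does not give another bitstream:
# the result takes three levels, so it no longer fits in one bit per sample.
mixed = [a + b for a, b in zip(bits_a, bits_b)]
levels = sorted(set(mixed))
print("levels after summing:", levels)

# Getting back to one bit means scaling and running the modulator again,
# which adds a second round of nonlinear noise shaping on top of the first.
remixed = delta_sigma_1bit([0.5 * m for m in mixed])
print("levels after re-modulating:", sorted(set(remixed)))
```

The sum is a multi-level signal, in effect a very short-word PCM stream, and returning to a single bit requires another pass through a modulator. That extra pass is exactly the kind of processing the bitstream approach set out to avoid.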

Now, should we want to compress, convert, mix or edit the stream we have only two possibilities. The first one is, we convert to PCM. The second is, we build new DSP theory to do the operations in the bitstream domain. The first one immediately goes out the window since the first premise of DSD is that the signal chain should not include any of them harmful filters. We also run into complications with delta-sigma itself—it is difficult to guarantee such conversions will be linear. This is less of a problem when we only do the step once or twice and the conversions are viewed as being approximative. But when DSD/PCM/DSD conversions need to be performed multiple times, we run into problems. The format isn’t even specified strictly enough to allow for optimal converters to be built—after all, the best converters marry the digital filters to the ones used in the delta-sigma modulator. In DSD the room left for scalability means the specifications aren’t exact and the architecture inherently separates the modulator and conversion filters from each other.

The second option (new theory) is not very attractive either because it implies creating a theory of nonlinear audio processing from scratch. The complicated time structure of bitstreams (or better yet, the lack thereof) complicates any attempts at direct processing even further. Pulling all this together, DSD is not compatible with anything involving calculation. This means it is not suitable for editing and, subsequently, mixing or post-production of any kind.

Considering delta-sigma modulation from the quantized differences of a running sum viewpoint, we see the inaccuracy in the delta values and the variable architecture of the loop low pass filter imply that the actual output of the modulator is highly approximative. Intuitively it would seem that any exact mathematical analysis of the output or complete processing framework for the resultant bitstreams must rely on the precise time behavior of the loop filter. We might think that since the audio band is present in the bitstream in its intact form, we might somehow neglect the precise behavior of the bitstream and only work in the significant frequencies. But this will not work either because additive processing and filtering cannot be accomplished directly.

Like any distribution format, DSD faces the usual questions of error resilience, space efficiency and so on. DSD does not fare very well in this department. Error correction codes can be used like usual but error concealment