Sound Cards for Broadcast Use

Computer sound cards are the norm at nearly all radio stations. I often wonder, am I using the best-quality sound card?  There are some trade-offs on the quality vs. cost curve.  At the expensive end of the curve, one can spend a lot of money on an excellent sound card.  The question is, is it worth it?  The law of diminishing returns says no.  High-quality audio reproduction can be obtained for a reasonable price.  The one possible exception to that rule would be production studios, especially where music mix-downs occur.

I would establish that the basic requirement for a professional sound card is balanced audio in and out, either analog, digital, or preferably, both.  Almost all sound cards work on the PCI bus architecture; some are available with PCMCIA (laptop) or USB.  For permanent installations, an internal PCI bus card is preferred.

To keep this an apples-to-apples comparison, it is limited to PCI bus, stereo input/output, analog and digital balanced audio units for general use.  Manufacturers of these cards often have other units with a higher number of input/output combinations if that is desired.  There are several cards to choose from:

The first and preferred general all-around sound card that I use is the Digigram VX222HR series.   This is a mid-price range PCI card, running about $525.00 per copy.

Digigram VX222HR professional sound card

These are the cards preferred by BE Audiovault, ENCO, and others. I have found them to be easy to install with copious documentation and driver downloads available online.  The VX series cards are available in 2, 4, 8, or 12 input/output configurations.  The HR suffix stands for “High Resolution,” which indicates a 192 kHz sample rate.  This card is capable of generating baseband composite audio, including RDS and subcarriers, with a program like Breakaway Broadcast.

Quick Specs:

  • 2/2 balanced analog and digital AES/EBU I/Os
  • A comprehensive set of drivers: driver for the Digigram SDK, as well as low-latency WDM DirectSound, ASIO, and Wave drivers
  • 32-bit/66 MHz PCI Master mode, PCI and PCI-X compatible interface
  • 24-bit/192 kHz converters
  • LTC input and inter-board Sync
  • Windows 2003 server, 2008 server, Seven, Eight, Vista, XP (32 and 64 bit), ALSA (Linux)
  • Hardware SRC on AES input and separate AES sync input (available on special request)
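
The 192 kHz sample rate is what makes composite generation possible: a converter can only reproduce frequencies below half its sample rate.  As a rough check, here is a small Python sketch.  The component frequencies are the standard FM baseband assignments; the function name is made up for illustration, and the check ignores sideband width and filter margin, so treat it as a first approximation only.

```python
# Standard FM baseband frequencies in Hz. Only centre or upper-edge
# frequencies are listed; sideband width and filter margin are ignored.
COMPOSITE_COMPONENTS_HZ = {
    "L+R audio (upper edge)": 15_000,
    "stereo pilot": 19_000,
    "L-R subcarrier (upper edge)": 53_000,
    "RDS subcarrier": 57_000,
    "67 kHz SCA": 67_000,
    "92 kHz SCA": 92_000,
}


def fits_under_nyquist(sample_rate_hz):
    """Map each component to True if it lies below half the sample rate."""
    nyquist_hz = sample_rate_hz / 2
    return {name: freq_hz < nyquist_hz
            for name, freq_hz in COMPOSITE_COMPONENTS_HZ.items()}


for rate_hz in (48_000, 96_000, 192_000):
    result = fits_under_nyquist(rate_hz)
    fitting = [name for name, fits in result.items() if fits]
    print(f"{rate_hz / 1000:g} kHz: {len(fitting)} of {len(result)} components fit")
```

Only the 192 kHz rate clears every component in this list, which is why a 96 kHz card cannot carry the full stereo subcarrier or RDS.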

Next is the Lynx L22-PCI.  This card comes with a rudimentary 16-channel mixer program.  I have found them to be durable and slightly more flexible than the Digigram cards.  They run about $670.00 each.  Again, it is capable of a 192 kHz sample rate on the analog inputs/outputs.  Like Digigram, Lynx has several other sound cards with multiple inputs/outputs which are appropriate for broadcast applications.

Lynx L22-PCI professional sound card

Specifications:

  • 200kHz sample rate / 100kHz analog bandwidth (Supported with all drivers)
  • Two 24-bit balanced analog inputs and outputs
  • +4dBu or -10dBV line levels selectable per channel pair
  • 24-bit AES3 or S/PDIF I/O with full status and subcode support
  • Sample rate conversion on digital input
  • Non-audio digital I/O support for Dolby Digital® and HDCD
  • 32-channel / 32-bit digital mixer with 16 sub outputs
  • Multiple dither algorithms per channel
  • Word, 256 Word, 13.5MHz or 27MHz clock sync
  • Extremely low-jitter tunable sample clock generator
  • Dedicated clock frequency diagnostic hardware
  • Multiple-board audio data routing and sync
  • Two LStream™ ports support 8 additional I/O channels each
  • Compatible with LStream modules for ADAT and AES/EBU standards
  • Zero-wait state, 16-channel, scatter-gather DMA engine
  • Windows 2000/XP/XPx64/Seven/Eight/Vista/Vistax64: MME, ASIO 2.0, WDM, DirectSound, Direct Kernel Streaming and GSIF
  • Macintosh OSX: CoreAudio (10.4)
  • Linux, FreeBSD: OSS
  • RoHS Compliant
  • Optional LStream Expansion Module LS-ADAT: provides sixteen-channel 24-bit ADAT optical I/O (Internal)
  • Optional LStream Expansion Module LS-AES: provides eight-channel 24-bit/96kHz AES/EBU or S/PDIF digital I/O (Internal)

Audio Science makes several different sound cards, which are used by BSI and other automation systems.  These cards run about $675 each.

Audio Science ASI 5020 professional sound card

Specifications:

  • 6 stereo streams of playback into 2 stereo outputs
  • 4 stereo streams of record from 2 stereo inputs
  • PCM format with sample rates to 192kHz
  • Balanced stereo analog I/O with levels to +24dBu
  • 24bit ADC and DAC with 110dB DNR and 0.0015% THD+N
  • SoundGuard™ transient voltage suppression on all I/O
  • Short length PCI format (6.6 inches/168mm)
  • Up to 4 cards in one system
  • Windows 2000, XP and Linux software drivers available.

There are several other cards and card manufacturers which do not use balanced audio.  These cards can be used with caution, but they are not recommended in high RF environments like transmitter sites or studios located at transmitter sites.  Appropriate measures for converting audio from balanced to unbalanced must be observed.

Further, there are many ethersound systems coming into the product pipeline which convert audio directly to TCP/IP for routing over an ethernet 802.x based network.  These systems are coming down in price and are being looked at more favorably by broadcast groups.  This is the future of broadcast audio.

Unbalanced to Balanced Audio

A large number of things amaze me on an almost daily basis.  To wit: a local mom-and-pop radio station called me because they couldn’t get their computer program to work right.  I decided that I’d give them an hour or two, in exchange for my hourly labor rate, and see if I could fix their problem.  The issue at hand was a loud hum and other noise on the input source.  I knew before I even looked at it that the likely culprit was a ground loop.

It was worse than I imagined, with several unbalanced and balanced feeds improperly interconnected, line-level audio going to a microphone-level input, and so forth.  I explained to the guy about putting line level into a mic level input, something akin to plugging a 120-volt appliance into a 240-volt outlet.  Improperly terminated balanced audio nullifies all of the common mode noise rejection characteristics of the circuit.
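
To put a number on the line-into-mic mismatch, here is a short Python sketch that converts dBu to volts.  The 0 dBu reference of about 0.775 V RMS and the +4 dBu professional line level are standard; the -50 dBu microphone level is my assumption for a typical order of magnitude, not a measurement from that station.

```python
DBU_REFERENCE_VOLTS = 0.7746  # 0 dBu is defined as about 0.775 V RMS


def dbu_to_volts(level_dbu):
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * 10 ** (level_dbu / 20)


LINE_LEVEL_DBU = 4.0   # professional line level
MIC_LEVEL_DBU = -50.0  # ASSUMPTION: rough typical microphone level

overdrive_db = LINE_LEVEL_DBU - MIC_LEVEL_DBU
voltage_ratio = dbu_to_volts(LINE_LEVEL_DBU) / dbu_to_volts(MIC_LEVEL_DBU)

print(f"line level: {dbu_to_volts(LINE_LEVEL_DBU):.3f} V RMS")
print(f"mic level:  {dbu_to_volts(MIC_LEVEL_DBU) * 1000:.2f} mV RMS")
print(f"the input is overdriven by {overdrive_db:.0f} dB, about {voltage_ratio:.0f}x the voltage")
```

Under that assumption the mic preamp sees roughly 500 times the voltage it was designed for, so distortion is guaranteed no matter where the gain knob sits.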

In any case, there are several ways to go from balanced to unbalanced without too much difficulty.  The first way is to wire the shield and Lo together on the unbalanced connector.  This works well with older, transformer input/output gear, so long as the unbalanced cables are kept relatively short.

simple balanced to unbalanced audio connection

Most modern professional audio equipment has active balanced input/output interfaces, in which case the above circuit will unbalance the audio and decrease the CMRR (Common-Mode Rejection Ratio), increasing the chance of noise, buzz, and so on getting into the audio. In this case, the CMRR is about 30 dB at 60 Hz.  Also, newer equipment with active balanced input/output, particularly some brands of sound cards, will not like to have the Lo side grounded. In a few instances, this can actually damage the equipment.

Of course, one can go out and buy a Henry Match Box or something similar and be done with it.  I have found, however, the active components in such devices can sometimes fail, creating hum, distortion, buzz, or no audio at all.  Well-designed and manufactured passive components (transformers and resistors) will provide excellent performance with little chance of failure.  There are several methods of using transformers to go from balanced to unbalanced or vice versa.

Balanced to unbalanced audio using 1:1 transformer

Using a 600:600 ohm transformer is the most common.  Unbalanced audio impedance of consumer-grade electronics can vary anywhere from 270 to 470 ohms or more.  The 10,000-ohm resistor provides constant loading regardless of the unbalanced impedance.  In this configuration, CMRR (Common-Mode Rejection Ratio) will be 55 dB at 60 Hz, but gradually decreases to about 30 dB for frequencies above 1 kHz.

Balanced to unbalanced audio using a 4:1 transformer

A 600:10,000 ohm transformer will give better performance, as the CMRR will be 120 dB at 60 Hz and 80 dB at 3 kHz, remaining high across the entire audio bandwidth.  The line balancing will be far better for the high-impedance load.  This circuit will have about 12 dB attenuation, so plan accordingly.
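
The figures quoted for these circuits can be checked with a little arithmetic.  The Python sketch below assumes an ideal transformer, where the impedance ratio is the square of the turns ratio, and uses an illustrative 1 V of common-mode hum that I made up for the example.  The function names are mine as well.

```python
import math


def turns_ratio(primary_ohms, secondary_ohms):
    """Impedance ratio is the square of the turns ratio (ideal transformer)."""
    return math.sqrt(primary_ohms / secondary_ohms)


def step_down_db(primary_ohms, secondary_ohms):
    """Voltage loss in dB across an ideal step-down transformer."""
    return 20 * math.log10(turns_ratio(primary_ohms, secondary_ohms))


def residual_volts(common_mode_volts, cmrr_db):
    """Common-mode voltage left over after the stated rejection."""
    return common_mode_volts / 10 ** (cmrr_db / 20)


print(f"turns ratio: {turns_ratio(10_000, 600):.2f}:1")
print(f"attenuation: {step_down_db(10_000, 600):.1f} dB")

HUM_VOLTS = 1.0  # ASSUMPTION: illustrative 1 V of 60 Hz common-mode hum
for cmrr_db in (30, 55, 120):
    print(f"CMRR {cmrr_db:>3} dB leaves {residual_volts(HUM_VOLTS, cmrr_db):.6f} V of hum")
```

The 10K:600 impedance ratio works out to a turns ratio of just over 4:1, which is where both the "4:1" name and the roughly 12 dB loss come from.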

For best results, use high-quality transformers from Jensen or UTC; even a WE 111C (although they are huge) can be used.  I have found several places where these transformers can be “scrounged”: DATS cards on the old 7300 series Scientific Atlanta satellite receivers, old modules from PRE consoles, etc.  A simple audio “balun” can be constructed for little cost or effort and will sound a whole lot better than doing it the wrong way.

A brief list, there are other types/manufacturers that will work also:

Ratio            Jensen          Hammond      UTC
1:1 (600:600)    JT11E series    804, 560G    A20, A21, A43
4:1 (10K:600)    JT10K series    560N         A35

Keep all unbalanced cable runs as short as possible.  In stereo circuits, phasing is critically important, so pay attention to how the transformer windings are connected.

Everything we do is destined for one place.

I give you, The Human Ear:

Anatomy of the Human Ear, courtesy of Wikipedia

All of the programming elements, all of the engineering equipment and practices, all of the creative process, the music, the talk, the commercials, everything that goes out over the air should reach as many ears as possible.  That is the business of radio.  The quality of the sound and the listening experience is often lost in the process.

Unfortunately, a large segment of the population has been conditioned to accept the relatively low quality of .mp3 and other digital files delivered via computers and smartphones.  There is some hope, however: when exposed to good-sounding audio, most people respond favorably, or are, in fact, amazed that music can sound that good.

There are few fundamentals as important as sounding good.  Buying the latest Frank Foti creation and hitting preset #10 is all well and good, but what is it that you are really doing?

There was a time when the FCC required a full audio proof every year.  That meant dragging the audio test equipment out and running a full sweep of tones through the entire transmission system, usually late at night.  It was a great pain; however, it was also a good exercise in basic physics.  Understanding pre-emphasis and de-emphasis curves, how an STL system can add distortion and overshoot, how clean (distortion-wise) the output of the console is, how clean the transmitter modulator is, how to correct for bass frequency tilt and high-frequency ringing: all of those are basic tenets of broadcast engineering.  Mostly today, those things are taken for granted or ignored.

Audio frequency vs. wavelength chart

Every ear is different and responds to sound slightly differently.  The frequencies and SPLs given here are averages.  Some people have hearing that goes far above or below average; however, they are an anomaly.

Understanding audio is a good start.  Audio is also known as sound pressure waves.  A speaker system generates areas or waves of lower and higher pressure in the atmosphere.  The size of these waves depends on the frequency of vibration and the energy behind the vibrations.  Like radio, audio travels in a wave outward from its source, decreasing in density as a function of the area covered.  For a point source in open space, this is the inverse-square law: the level drops about 6 dB for every doubling of distance.
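
Both relationships are easy to sketch in Python: wavelength is the speed of sound divided by frequency, and the level change with distance follows from the spreading of energy over a larger area.  I am assuming 343 m/s for the speed of sound (air at roughly 20 °C) and an idealized point source in free space, which no real studio or living room is.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # ASSUMPTION: air at about 20 degrees C


def wavelength_m(freq_hz):
    """Wavelength in metres for a given frequency in Hz."""
    return SPEED_OF_SOUND_M_S / freq_hz


def level_change_db(near_m, far_m):
    """Level change moving from near_m to far_m from an ideal point source."""
    return -20 * math.log10(far_m / near_m)


for freq_hz in (20, 100, 1_000, 3_000, 20_000):
    print(f"{freq_hz:>6} Hz -> {wavelength_m(freq_hz):8.4f} m")

print(f"1 m to 2 m:  {level_change_db(1, 2):.1f} dB")
print(f"1 m to 10 m: {level_change_db(1, 10):.1f} dB")
```

A 20 Hz wave is about 17 metres long, while a 20 kHz wave is under 2 centimetres, which goes a long way toward explaining why bass and treble behave so differently in a room.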

The human ear is optimized for hearing in the mid-range band around 3 kHz, slightly higher for women and lower for men.  This is because the ear canal acts as a quarter-wavelength resonator at those frequencies.  Mid-range is most associated with the human voice and the perceived loudness of program material.
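
The resonance claim can be checked with the formula for a tube closed at one end, f = c / (4 × length).  The 2.5 cm canal length below is my assumption of a typical adult figure, and the speed of sound is again taken as 343 m/s; a real ear canal is not a perfect tube, so this is only a ballpark.

```python
SPEED_OF_SOUND_M_S = 343.0  # ASSUMPTION: air at about 20 degrees C


def quarter_wave_resonance_hz(tube_length_m):
    """Fundamental resonance of a tube closed at one end."""
    return SPEED_OF_SOUND_M_S / (4 * tube_length_m)


# ASSUMPTION: illustrative canal lengths in metres, not measured data.
for length_m in (0.022, 0.025, 0.028):
    print(f"{length_m * 100:.1f} cm canal -> {quarter_wave_resonance_hz(length_m):.0f} Hz")
```

A shorter canal resonates higher, which is consistent with the peak sitting slightly higher for women than for men.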

Bass frequencies contain a lot of energy due to the longer wavelengths.  This energy is often transmitted into structural members without adding too much to the listening experience due to a sharp roll-off starting around 100 Hz.  Too much bass energy in radio programming can sap loudness by reducing the midrange and high-frequency energy from the modulated product.

High frequencies offer directivity, as in left/right stereo separation.  Too much high-frequency energy sounds shrill and can adversely affect female listeners, as they are more sensitive to high-end audio because of smaller ear canals and tympanic membranes.

Processing programming material is a highly subjective matter.  I am a minimalist; I think that too much processing is self-defeating.  I have listened to a few radio stations that have given me a headache after 10 minutes or so.  Overly processed audio sounds splashy, contrived, and fake with unnatural sounds and separation.  A good idea is to understand each station’s processing goals.  A hip-hop or CHR station obviously is looking for something different than a classical music station.

For the non-engineer, there are three main effects of processing: equalization, compression (AKA gain reduction), and expansion.  Then there are other things like phase rotation, pre-emphasis or de-emphasis, limiting, clipping, and harmonics.

EQ is a matter of taste, although it can be used to overcome some non-uniformity in STL paths.  Compression is a way to bring up quiet passages and increase sound density or loudness.  Multi-band compression is all the rage; it allows each band to react differently to program material, which can really make things sound different than they were recorded.  Misadjusting a multi-band compressor can make audio sound really bad.  Compression is dictated not only by the amount of gain reduction but also by the ratio, attack, and release times.  Limiting is related to compression, but acts only on the highest peaks.  A certain amount of limiting is good as it acts to keep programming levels constant.  Clipping is a last resort method for keeping errant peaks from affecting modulation levels.  Expansion is often used on microphones and is a poor substitute for a well-built, quiet studio.  Expansion often adds swishing effects to microphones.
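
To make threshold and ratio concrete, here is a minimal Python sketch of a compressor's static curve.  It is the textbook hard-knee formula, not the algorithm of any particular processor, and it deliberately leaves out attack and release, which govern how fast the gain moves.  The threshold and ratio values are arbitrary examples.

```python
def compressor_output_db(input_db, threshold_db, ratio):
    """Static hard-knee curve: above threshold, the excess is divided by ratio."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


def gain_reduction_db(input_db, threshold_db, ratio):
    """How many dB the compressor pulls the signal down."""
    return input_db - compressor_output_db(input_db, threshold_db, ratio)


THRESHOLD_DB = -20.0  # example value only
for ratio in (2, 4, 20):  # a very high ratio behaves like a limiter
    out_db = compressor_output_db(-8.0, THRESHOLD_DB, ratio)
    cut_db = gain_reduction_db(-8.0, THRESHOLD_DB, ratio)
    print(f"{ratio:>2}:1  input -8.0 dB -> output {out_db:.1f} dB ({cut_db:.1f} dB gain reduction)")
```

At 4:1, a signal 12 dB over threshold comes out only 3 dB over, which is 9 dB of gain reduction.  Make-up gain then raises everything, and that is how the quiet passages come up.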

I may break down the effects of compression and EQ in a separate post.  The effects of odd and even order audio harmonics could easily fill a book.

Audio over IP, what is it, why should I care?

IP networks are the largest standardized data transfer networks worldwide.  These networks can be found in almost every business and home and are used for file transfer, storage, printing, etc.  The Internet Protocol over Ethernet (802.x) networks is widely understood and supported.  It is robust, inexpensive, well-documented, readily deployed, and nearly universal.  Many equipment manufacturers such as Comrex, Telos, and Wheatstone have developed audio equipment that uses IP networks to transfer and route audio within and between facilities.

IP protocol stack

Audio enters the system via an analog-to-digital converter (A/D converter), often a sound card, at which point a computer program stores it as a file.  These files can be .wav, .mp3, .mp4, apt-X, or some other format.  Once the audio is converted to a digital data format, it is handled much the same way as any other digital data.
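
How much data that is depends on the sample rate, bit depth, and channel count.  For uncompressed PCM, the usual content of a .wav file, the arithmetic is simple enough to sketch in Python.  The function names are mine, and compressed formats such as .mp3 will of course come in far lower.

```python
def pcm_bitrate_bps(sample_rate_hz, bit_depth, channels):
    """Raw bit rate of uncompressed PCM audio, excluding any file or network overhead."""
    return sample_rate_hz * bit_depth * channels


def bytes_per_hour(bitrate_bps):
    """Storage needed for one hour at the given bit rate."""
    return bitrate_bps // 8 * 3600


for label, rate_hz, bits in (("CD quality", 44_100, 16), ("48 kHz / 24-bit", 48_000, 24)):
    bps = pcm_bitrate_bps(rate_hz, bits, channels=2)
    print(f"{label}: {bps / 1000:.1f} kbps, {bytes_per_hour(bps) / 1e6:.0f} MB per hour")
```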

IP stands for “Internet Protocol,” which is a communications protocol for transmitting data between computers connected to networks.  In conjunction with a transport protocol, either TCP (Transmission Control Protocol) or UDP (User Datagram Protocol), IP forms what is known as the Internet Protocol Suite, also known as TCP/IP.  The Internet Protocol Suite contains four layers:

  1. Application layer – This is the protocol that contains the end-use data.  Examples of these would be HTTP, FTP, DHCP, SMTP, POP3, etc.  Telos Systems uses its own application called “Livewire” for its equipment.  Wheatstone uses “WHEATNET.”  Digigram uses “Ethersound.”   This is an important distinction.
  2. Transport layer – This contains the TCP or UDP header information, which includes such things as source and destination ports, a checksum value for error checking, etc.  It is responsible for establishing a pathway through multiple IP networks, flow control, congestion control, error checking, and retransmission.  TCP allows for multiple IP packets to be strung together for transmission, increasing transfer rate and efficiency.
  3. Internet layer – This is responsible for transporting data packets across networks using unique addresses (IP addresses).
  4. Link Layer – This can also be called the physical layer, using Ethernet (802.x), DSL, ISDN, and other methods.  The physical layer also means things like network cards, sound cards, wiring, switches, and routers.
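
Each layer in the list above wraps the one before it in its own header, so a packet on the wire is always larger than the audio it carries.  Here is a rough Python sketch of that overhead using the minimum header sizes for UDP, TCP, IPv4, and Ethernet.  The 288-byte payload is a made-up example, and options, VLAN tags, and any application-layer header are ignored.

```python
TRANSPORT_HEADER_BYTES = {"udp": 8, "tcp": 20}  # minimum sizes, no options
IPV4_HEADER_BYTES = 20                          # minimum size, no options
ETHERNET_OVERHEAD_BYTES = 18                    # 14-byte header + 4-byte frame check


def frame_bytes(payload_bytes, transport="udp"):
    """Size of one Ethernet frame carrying the given application payload."""
    return (payload_bytes
            + TRANSPORT_HEADER_BYTES[transport]
            + IPV4_HEADER_BYTES
            + ETHERNET_OVERHEAD_BYTES)


PAYLOAD_BYTES = 288  # ASSUMPTION: hypothetical audio payload per packet
for transport in ("udp", "tcp"):
    total = frame_bytes(PAYLOAD_BYTES, transport)
    overhead_pct = 100 * (total - PAYLOAD_BYTES) / total
    print(f"{transport.upper()}: {total} bytes on the wire, {overhead_pct:.1f}% overhead")
```

Smaller packets mean lower delay but a larger share of the bandwidth spent on headers, which is one of the trade-offs in any audio over IP design.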

Advantages:

An IP network can be established to transmit data over almost any path length and across multiple link layer protocols.  Audio, converted to data, can thus be transmitted around the world, reassembled, and listened to with no degradation.  Broadband internet connections using cable, DSL, ISDN, or T-1 circuits can be pressed into service as STLs, ICRs, and TSLs.  This translates to fast deployment: no STL coordination or licensing issues, and no antennas to install if on a wired network.  Cost reductions are also realized when considering this technology over dedicated point-to-point TELCO T-1s.  Additionally, license-free spread spectrum radios that have either DS-1 or 10baseT Ethernet ports can be used, provided an interference-free path is available.

IP audio within facilities can also be employed with some brands of consoles and soundcards, thus greatly reducing audio wiring and distribution systems and corresponding expenses.  As network speeds increase, file transfer speeds and capacity also increase.

Disadvantages:

Dissimilar protocols in the application layer mean a facility can’t plug a Barix box into a Telos Xtream IP and make it work.  There are likely hundreds of application layer protocols, most of which do not speak to each other.  At some point in the future, an IP audio standard, like AES/EBU for digital audio, may appear, which will allow equipment cross-connections.

Additionally, the quality of the physical layer can degrade performance over congested networks.  The installations must be carefully completed to realize the full bandwidth capacities of cables, patch panels, patch cords, etc.  Even something as little as stepping on a Category 6 cable during installation can degrade its high-end performance curve.  The cable should be adequately supported, not kinked, and not stretched (excessive pulling force) during installation.

Reliability is another disadvantage compared with formats like ATM.  In an IP network, no central monitoring or performance check system is available.  IP itself is what could be called a “broadcast” protocol.  That is to say, packets are sent out with best-effort delivery and no delivery confirmation.  Therefore, it is referred to as a connectionless protocol and, in network architecture parlance, an unreliable network.  TCP adds acknowledgment and retransmission on top of IP, but UDP, which is commonly used for real-time audio because retransmission adds delay, does not.  Lack of reliability allows any of these faults to occur: data corruption, lost data packets, duplicate arrival, and out-of-order data packets.  That is not to say that it does not work, merely that there is no alarm generated if an IP network begins to lose data.  Of course, the loss of data will affect the reconstruction of the audio.
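
Since the network will not raise an alarm on its own, the receiving end has to notice trouble itself.  One common approach is to number every packet and inspect the numbers on arrival.  The Python sketch below shows the idea; it is not how any particular vendor does it, the function name is hypothetical, and it ignores sequence-number wraparound for simplicity.

```python
def classify_arrivals(sequence_numbers):
    """Sort received sequence numbers into lost, duplicate, and out-of-order."""
    seen = set()
    highest = None
    duplicates = []
    out_of_order = []
    for seq in sequence_numbers:
        if seq in seen:
            duplicates.append(seq)      # same packet arrived twice
            continue
        if highest is not None and seq < highest:
            out_of_order.append(seq)    # arrived after a later packet
        seen.add(seq)
        highest = seq if highest is None else max(highest, seq)
    if seen:
        # Anything between the first and last number that never showed up.
        lost = [s for s in range(min(seen), highest + 1) if s not in seen]
    else:
        lost = []
    return {"lost": lost, "duplicates": duplicates, "out_of_order": out_of_order}


# Made-up arrival order: packet 3 is late and duplicated, packet 5 never arrives.
report = classify_arrivals([1, 2, 4, 3, 3, 6])
print(report)
```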

Analog digital converter symbol

Finally, latency can become an issue over longer paths.  Every A/D converter, Network Interface Card (NIC), cable, patch panel, router, etc., has some latency in its circuitry.  These delays are additive and dependent on the length of the path and the number of devices in it.
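
Because the delays simply add, a latency budget is just a sum.  In the Python sketch below, the only calculated figure is the packetization delay, which is the number of samples per packet divided by the sample rate.  Every other stage value is a placeholder I invented to show the bookkeeping; substitute measured or published figures for real equipment.

```python
def packetization_delay_ms(samples_per_packet, sample_rate_hz):
    """Time spent collecting one packet's worth of samples."""
    return 1000 * samples_per_packet / sample_rate_hz


# ASSUMPTION: all stage values other than packetization are hypothetical.
HYPOTHETICAL_STAGES_MS = {
    "A/D conversion": 1.0,
    "packetization (240 samples at 48 kHz)": packetization_delay_ms(240, 48_000),
    "two Ethernet switches": 0.2,
    "receive buffer": 10.0,
    "D/A conversion": 1.0,
}

total_ms = sum(HYPOTHETICAL_STAGES_MS.values())
for stage, delay_ms in HYPOTHETICAL_STAGES_MS.items():
    print(f"{stage:<40} {delay_ms:6.2f} ms")
print(f"{'total':<40} {total_ms:6.2f} ms")
```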

Provided care is taken during design and installation, AOIP networks can work flawlessly.  Stocking adequate spare parts, things like ethernet switches, NICs, patch cables and a means to test wiring and network components is a requirement for AOIP facilities.