How Do You Ramp Big-Screen Content Down to Cell-Phone Size?

At M Works Mastering in Boston, Jonathan Wyner posted a long-form concert video for harpist Deborah Henson-Conant, shot last November with the Grand Rapids Symphony. Wyner, co-producer on the project, along with Emmy-winning director Bob Commiskey and Grammy-winning audio mixer Tom Bates, had high expectations for the two-hour video, which was shot in 1080i high-definition using nine Panasonic 900 HD cameras with a 48kHz surround soundtrack recorded to a pair of Tascam MX2424 hard disc recorders backed up by RAID arrays. It’s the kind of lush production that was made to sell 60-inch plasma screens and Blu-ray disc players at Best Buy.

But at some point, Wyner and company are going to have to face a new reality: that luscious production is going to end up on a cell phone or iPod screen, reduced to fit one of many new format types, each of which colors the sound and picture in its own way. Of course, when the picture is two inches square and the sound is barely stereo being pumped into a pair of ear buds in glorious MP3, it might reduce the wow factor a notch or two.

If so, Wyner isn’t showing any apprehension. Henson-Conant’s concert video will likely be chopped up for personal video applications. “I can’t see people wanting to spend two hours looking at a tiny screen and listening with earphones,” he says. “But I can see a five-minute clip as being one hell of a marketing tool.”

So whether it’s a clip or a longer piece – Wyner says the era of video glasses is drawing nigh, which will likely encourage longer viewing and increased road fatalities – high-definition projects are preparing to face the music of the really, really small screen.

Crunching The Numbers
Most of what’s seen and heard now on cell phones tends to be simple content. “Music, mainly in the form of ringtones, animation, prank videos, sports – especially crashes – and porn are doing very well on both cell phones and iPods,” says Chris Coyle, vice president for content acquisition and development at Mobile Streams, a London-based company backed by Liberty Media that's negotiating its first HD content acquisitions from Denver-based Sports HD Productions. “You’ll see more of that when the U.S. cell carriers go to 3G and 4G networks,” he says. Currently, the carriers with the most advanced content networks are Sprint’s Sprint TV and Verizon’s V-CAST, both of which top out at 2.5G.

Video can be compressed using three or four major codecs led by MPEG-4 and H.263. Audio can be subjected to over a dozen, including AAC, MP3, LAME and AMR narrow-band. Compression specialists are learning something digital broadcasters already know: bandwidth determines everything.

“If there’s a lot of action or motion in the picture, that’s where the bandwidth is going to be used, and some of that will come from the audio,” explains Devon Maxwell, an engineer and producer at LoudLouderLoudest, a New York company that specializes in transforming entertainment content into downloads. “The other main consideration is, what is the intended device for playback? We can’t always get to pick the [most appropriate] codecs. We have to go with what will work on the device.”

The playback device – an MP3 player, an iPod, etc. – will also impose certain playback limitations. As anyone knows who has ever heard an old song played on a cheap car radio and then on a Bose stereo, there’s lots of information, especially in the low frequencies, that’s in the file but that will never be heard on the intended device. The network’s bit transfer rate also has an impact.

“It’s totally a matter of the network, the device and the distributor,” says Matt Wagner, LoudLouderLoudest’s co-owner and director of operations. “One [provider] may want to send 20 files at one level of resolution and another might want to send 10 at a higher resolution. That’s why knowing the bandwidth capabilities is crucial.”
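The trade-off Wagner describes comes down to simple arithmetic. The Python sketch below is only an illustration: the 30 MB delivery budget and 60-second clip length are hypothetical figures chosen for the example, not numbers from LoudLouderLoudest or any carrier, and the function name is made up here.

```python
def per_file_bitrate_kbps(budget_megabytes, file_count, clip_seconds):
    """Average bitrate (kbit/s) each clip can use if the budget is split evenly."""
    # Decimal units: 1 megabyte = 8,000 kilobits.
    total_kilobits = budget_megabytes * 8 * 1000
    return total_kilobits / (file_count * clip_seconds)

# Hypothetical provider budget: 30 MB of 60-second clips.
twenty_files = per_file_bitrate_kbps(30, 20, 60)  # 200.0 kbit/s per clip
ten_files = per_file_bitrate_kbps(30, 10, 60)     # 400.0 kbit/s per clip
print(twenty_files, ten_files)
```

Halving the number of files doubles the bitrate available to each one, which is why the encoder needs to know the provider's bandwidth plan before choosing a resolution.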

As far as the data crunchers are concerned, everything they have to process is high-definition, in that they generally receive uncompressed or lossless files from the content originator. “1080i or NTSC – it’s all just varying amounts of data to be compressed,” says Maxwell. Most elements come in as uncompressed AVI files on DVD.
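To put rough numbers on Maxwell's point, the sketch below estimates uncompressed video data rates. It rests on assumptions the article does not state: 8-bit 4:2:2 sampling (16 bits per pixel on average), a 720x480 raster for standard definition, a frame rate of 30000/1001, and picture data only, with no audio or blanking.

```python
from fractions import Fraction

NTSC_FRAME_RATE = Fraction(30000, 1001)  # about 29.97 frames per second

def uncompressed_mbps(width, height, bits_per_pixel=16, frame_rate=NTSC_FRAME_RATE):
    """Raw video data rate in megabits per second (decimal units)."""
    return float(width * height * bits_per_pixel * frame_rate / 1_000_000)

hd = uncompressed_mbps(1920, 1080)  # 1080i raster
sd = uncompressed_mbps(720, 480)    # assumed standard-definition raster
print(f"HD: {hd:.0f} Mbit/s, SD: {sd:.0f} Mbit/s, ratio: {hd / sd:.1f}x")
# prints "HD: 994 Mbit/s, SD: 166 Mbit/s, ratio: 6.0x"
```

Under these assumptions the HD source carries about six times the data of the SD one, but both sit orders of magnitude above what a phone network can deliver, so to the encoder the difference is one of degree.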

LoudLouderLoudest made its name in ringtones – now a $5 billion business in the U.S. alone. The company is in the process of transitioning to longer-form A/V content, and Maxwell says many of the same compression tactics apply. “The biggest one of all is to always, always test the result on the intended device,” he says.

For example, notes Vanessa McDonnell, video content producer at LoudLouderLoudest, MPEG-4 does not handle luminosity or contrast well. “MPEG-4 looks for the transitions from frame to frame and compares them, but high levels of brightness and contrast tend to confuse it,” she explains. “If you limit [the intensity of] the contrast and brightness, you’re cutting down on the amount of processing the codec has to do, and you’ll get a better result.”
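One way to picture that kind of pre-processing is the minimal sketch below, which pulls 8-bit luma values toward mid-gray and clamps the extremes before the frames reach the encoder. It is not McDonnell's actual workflow: the 0.8 contrast factor and the 16 to 235 clamp range are assumptions chosen for illustration, and a real pipeline would do this in a video tool rather than on a Python list.

```python
def soften_luma(pixels, contrast=0.8, low=16, high=235):
    """Scale 8-bit luma values toward mid-gray, then clamp to [low, high]."""
    mid = 128
    softened = []
    for value in pixels:
        scaled = mid + (value - mid) * contrast   # reduce contrast around mid-gray
        clamped = min(high, max(low, scaled))     # trim extreme darks and brights
        softened.append(int(round(clamped)))
    return softened

print(soften_luma([0, 64, 128, 192, 255]))
# prints [26, 77, 128, 179, 230]
```

The darkest and brightest values move inward while mid-gray stays put, so the codec sees smaller frame-to-frame swings.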

The AMR narrow-band codec is designed for speech, which is usually limited to between 1 kHz and 2.5 kHz. Downloads such as news broadcasts benefit from that because it frees up significantly more bandwidth in which to crunch the video. (A similar but seldom-used codec is QCP, developed for Qualcomm phones.)

Treating Music and Audio
Every audio codec will have a very different effect on sonics. Music makes up a significant portion of downloaded content because music videos are perfectly bite-sized for the medium. And, because music is far more complex and utilizes far more bandwidth than dialog or sound effects, the variables in the encoding process are nearly as subjective as the music itself.

John Hancock engineers streaming music videos for Rollingstone.com. (The actual streamcaster is RealNetworks, which uses the Rolling Stone brand under license.) Hancock has been streaming music videos since the late 1990s, when he was at SonicNet.com, which later became MTVi. Working from his studio in New York’s Chinatown, Hancock says codecs have become more refined and attuned to music’s needs.

“In the old days, you really had to listen as you encoded,” says Hancock. “You would get rid of much of the low-frequency information and tweak the high end and not use stereo. The encoding software was primitive and couldn’t properly process that much information at once. Current encoding algorithms are now good enough that you can let them go on automatic pilot and feel pretty comfortable about the outcome.”

Hancock says that one of the biggest mistakes people make in approaching streaming music content is to try to use broadcasting strategies to compress the audio. Quite the opposite, he stresses. “Greater dynamics will actually let the algorithms work more efficiently than if you compress it and put too much information through the encoder at one time,” he explains. He opts to handle transient peaks by using look-ahead peak limiting in place of broadcast L2-style compression/normalizing.
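For readers unfamiliar with the technique, here is a bare-bones sketch of look-ahead peak limiting. It is not Hancock's tool chain: the ceiling, window length and release step are illustrative assumptions, and a production limiter would smooth the gain changes far more carefully.

```python
def lookahead_limit(samples, ceiling=0.9, lookahead=4, release=0.05):
    """Keep every output sample within +/-ceiling by reducing gain ahead of peaks."""
    limited = []
    gain = 1.0
    for i in range(len(samples)):
        # Loudest sample between now and `lookahead` samples into the future.
        window_peak = max(abs(s) for s in samples[i:i + lookahead + 1])
        target = 1.0 if window_peak <= ceiling else ceiling / window_peak
        # Drop to the target at once; recover gradually after the peak has passed.
        gain = min(target, gain + release)
        limited.append(samples[i] * gain)
    return limited

# A quiet signal with one overshoot at index 10.
signal = [0.1] * 10 + [1.5] + [0.1] * 10
out = lookahead_limit(signal)
```

Because the limiter sees the overshoot coming, the gain is already reduced when the peak arrives, and material that never approaches the ceiling passes through untouched. That leaves the dynamics largely intact, which is the point Hancock makes about feeding the encoder.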

However, as good as the streaming codecs have gotten, they aren’t sensitive to nuance. “You should be a bit obvious in your mixes and not count on a lot of definition to translate — if a guitarist plays an interesting figure and you want it to cut through [in the encoding process], you may have to boost it more than you might when making a recording for physical media,” Hancock explains. “In general, if you keep the overall mix even and clear, even a bit on the bright-sounding side, you’ll be alright.”

Greg Thompson, who engineers music-video streaming and downloads for AOL’s music division in New York, says he also pays close attention to the dynamic range of the recordings. “It’s not like it’s going to hit a broadcast limiter downstream, as it would if it were a television broadcast,” he says. “I don’t want to make anything too quiet that’s going to get lost in the computer’s noise floor as a result. The good news, though, is that, also unlike TV, the gain structure is not messed with as it is with television. There are very few middlemen in the process of getting the [download] performance to the viewer.”

Hancock and other engineers report that the Sony-developed ATRAC codec has the greatest sensitivity for music encoding, and RealNetworks recently licensed ATRAC encoding for its streamcasts. “I’m happy about that,” Hancock concludes. “In the past I would try to eliminate low frequencies, because computer speakers didn't replicate them, so I would dump them in order to free up CPU power for the encoder to work more efficiently. I’d also keep things as mono as possible, to eliminate ‘flangy’ encoding [a comb-filtering effect], and apply broadcast limiting techniques to my mix. I now try to have the mixes as close to what you would imagine a CD would sound like. The encoding programs are designed to the Red Book CD audio standard, so that’s what you should give them. I don’t count on there being a lot of detail or harmonic content in the final product, so being heavy-handed without overcompressing is a balance that should be sought after.”

When a project completes its final surround and stereo mixes, Jonathan Wyner wants to prepare the mono version himself for the download iteration. “I don’t want to trust an automatic fold-down to do that for me,” he says. He also minimizes extreme low- and high-frequency information – below 80 Hz and above 13 kHz – since many playback devices won’t reproduce it anyway and it saves processing time and power and frees up bandwidth. “Also, for a classical music performance like Deborah’s, I’m doing what I call a 5.0 mix: there’s not enough low-frequency information to warrant putting anything into a separate sub channel. It would be if the playback system had bass management, but portable devices don’t, and I don’t see enough people transferring this type of content to a larger home system. They’ll buy the DVD or the Blu-ray for that.”
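A simplified sketch of those two steps, a mono fold-down followed by trimming below 80 Hz and above 13 kHz, might look like the following. It is an illustration only, not Wyner's mastering chain: the one-pole filters are an assumption made to keep the example short, and their gentle 6 dB-per-octave slopes are far softer than a mastering-grade filter.

```python
import math

def fold_to_mono(left, right):
    """Average the two channels; out-of-phase material cancels."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000):
    """First-order low-pass filter (6 dB per octave)."""
    alpha = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for s in samples:
        state += alpha * (s - state)
        out.append(state)
    return out

def band_limit(samples, low_hz=80, high_hz=13000, sample_rate=48000):
    """Attenuate content below low_hz and above high_hz."""
    # High-pass: subtract the low-passed copy from the input.
    lows = one_pole_lowpass(samples, low_hz, sample_rate)
    highpassed = [s - l for s, l in zip(samples, lows)]
    return one_pole_lowpass(highpassed, high_hz, sample_rate)

mono = fold_to_mono([1.0, 0.5], [1.0, -0.5])
print(mono)  # prints [1.0, 0.0]
```

The second sample shows why an automatic fold-down can be risky: content that is out of phase between left and right disappears entirely when the channels are summed, which is the kind of surprise an engineer would want to catch by ear.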

It’s too soon to say whether high-definition content will enjoy any advantage once it’s been downsized for downloads. But Mobile Streams’ Chris Coyle says that as the cost of HD production comes down, it opens up the door to many more content sources. “Right now, 90 percent of the content is SD repackaged from suppliers like Sony and Disney,” he says. “But consumers between 14 and 24 – the big download group – are looking for the next JibJab. It’s hard to say if high-def will make that any funnier to them.”