Downsizing Storage Requirements for Post-Production

Ocarina Networks recently announced that Rainmaker and Zoic Studios had become users of its ECOsystem product, which reduces storage requirements through data de-duplication and compression. Other companies have de-dupe offerings, but Ocarina says the depth of research it has done into the specific needs of post-production users makes its approach especially effective. Studio Bytes asked Carter George, VP of products at Ocarina, to fill us in.
Studio Bytes: Let’s talk first about what Ocarina does. Can you give us a crash course in the technology?
Carter George: We’re in the business of shrinking things – in particular, shrinking file sizes. We can shrink things that other people can’t. Media and Entertainment is one of our three top focus areas. A lot of files get created when you’re doing a production, whether it’s a full-length 3D animated movie or a 30-second commercial spot, and a lot of them are big files. We can shrink them and save you a bunch of space on storage. One thing you can do is keep files online longer to make your current production go faster. The other is to shrink the stuff you’re going to store anyway, so you spend less money buying disks.

Do you compress video? It’s often highly compressed to begin with.
We did an analysis of the actual file types and their content in post. What drives storage growth? It’s not really video. Video comes out at the end, but the stuff that goes into making a video is motion-capture files, files from Autodesk products like Maya and Inferno, stuff that feeds render. There are EXR files and a file type called RLA that shows up a lot. We found probably 20 or 25 file types that were really driving storage growth in the post environment. About half of those we were able to get good [compression] results on out of the box, and for the other half we wrote [custom] algorithms. But we don’t really shrink video. We shrink uncompressed video, but not the final product.

What makes your system attractive compared to your competitors in the de-duplication market?
Everybody’s tried de-dupe and generic compression. The de-dupe stuff doesn’t really work for media files. Traditional de-dupe works by looking for the same blocks on disk in two different files. And if it finds a block that’s the same in file A and file B, it will get rid of one of them and replace it with a pointer. That’s useful for data that has repetitive chunks on disk. But in the media world, if you take a file and open it and use some tool to change the shading of a couple of pixels and then save that file, that file is typically recompressed by the tool. Every single block on disk may have changed a little bit. So if you want to be able to find the redundant info you have to go about it a different way.

Running a generic compressor on already compressed data gives you maybe 15 percent benefit. That’s not enough to move the needle on your storage costs. The way we work is different. We’re able to find all that redundant info and shrink files that other people can’t. We recognize the output of Maya, and our system knows what to do with Maya files. Maybe it’s a motion-capture file, or an OpenEXR file for render. We’re not taking the one big hammer we have and trying to hit everything with it and hoping for the best. Everything we do is content-aware.
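To make that contrast concrete, here is a minimal sketch of the block-level de-dupe idea described above, assuming fixed-size 4 KB blocks and SHA-256 fingerprints (a simplification of ours, not Ocarina’s implementation):

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks, purely for illustration

def block_fingerprints(path):
    """Hash every fixed-size block of a file, as a traditional block de-dupe engine might."""
    with open(path, "rb") as f:
        return [hashlib.sha256(block).hexdigest()
                for block in iter(lambda: f.read(BLOCK_SIZE), b"")]

def shared_block_ratio(path_a, path_b):
    """Fraction of file A's blocks that also appear somewhere in file B."""
    a = block_fingerprints(path_a)
    b = set(block_fingerprints(path_b))
    return sum(1 for h in a if h in b) / len(a) if a else 0.0
```

For backup-style data with repeated chunks, that ratio is high and replacing duplicate blocks with pointers pays off. For a media file that a tool has re-saved and recompressed, nearly every block hash changes, the ratio collapses toward zero, and there is nothing left for block de-dupe to point at – which is why a content-aware approach has to understand the format before it can find the redundancy.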

So you’ve done a lot of research specifically targeting post-production.
And there are other industries where we don’t have solutions, because we haven’t done the work of analyzing those file types and developing strategies to deal with each one. But we have real researchers. They’re mathematicians, not app developers, and they’re working on fundamental mathematical algorithms for finding and compressing redundant information in various file types. A couple of them are image-algorithm specialists. You take a video camera and shoot a live scene, and that’s natural. The chrominance and luminance and motion vectors are something that’s happening in the real world. If you use an animation tool, it tends to produce different, artificial patterns. We start by asking, for a given file type, is this in the natural family or the artificial family? And we break it down from there.

So you’re not just doing de-duplication. You’re looking for compression strategies that are specific to a file type.
Technically, there are two categories of things we do. One is object de-dupe, and that’s different from block de-dupe. We decompress things into some raw format, and then we look for duplicate objects in that raw form. The easiest example is if you take a picture of somebody that has red-eye, open it in Photoshop, fix the red-eye, and save the file under a different file name. It’s almost entirely identical info, but on disk everything is different because it’s a JPEG. We’ll decompress both of those files to something called the discrete cosine matrix, a table of chrominance and luminance values for each pixel. We’ll be able to shrink the first photo 40 percent, or however much we can do with our photo compressor. And we may be able to save the second photo in just a few bytes [representing only the few pixels that have changed]. We also have content-aware compressors. Instead of having one compressor like Gzip that you try on everything, we recognize what a file is. We have a neural network that routes either the whole file or parts of a file to different compressors. Each compressor consists of formulas and algorithms and math that’s specific to that file type. So the product combines object de-dupe and content-aware compressors into one process.
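As a rough illustration of that routing idea – recognize what a file is, then pick a strategy for it – here is a sketch in which the extension table, the handler names, and the use of gzip are all hypothetical stand-ins rather than Ocarina’s actual compressors:

```python
import gzip
from pathlib import Path

def compress_image_style(data: bytes) -> bytes:
    """Stand-in for a type-specific compressor; a real one would decode the format first."""
    return gzip.compress(data, compresslevel=9)

def compress_generic(data: bytes) -> bytes:
    """Fallback for file types with no dedicated handler."""
    return gzip.compress(data, compresslevel=6)

# Hypothetical routing table mapping recognized file types to strategies.
ROUTES = {
    ".exr": compress_image_style,
    ".rla": compress_image_style,
}

def content_aware_compress(path: str) -> bytes:
    """Route a whole file to a type-specific compressor instead of one generic hammer."""
    data = Path(path).read_bytes()
    handler = ROUTES.get(Path(path).suffix.lower(), compress_generic)
    return handler(data)
```

In Ocarina’s case the routing is done by a neural network that can also split a file and send different parts to different compressors; a lookup table like the one above only captures the simplest version of that idea.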

What is a post facility buying from you? Is it just hardware that sits on the storage network?
We have a 2RU server with a green bezel on the front and flashing lights, and that’s our default product. You buy that appliance and point it at your storage, and we read files and write them back where they started from. We have also done partnerships with storage vendors, including some who integrate a special version of our technology. With HP, we did it as a software-only product, where we became a module that you can put inside your storage. We did another one with BlueArc. They’re mostly known for super-high performance, so you see them a lot in render, not in the storage where people keep a lot of stuff for a long time. We did a tight integration where we’re implemented as a co-processor that’s completely invisible to users, and that’s what Rainmaker is using. Several customers, including Zoic, have a version of our appliance that’s tuned for Isilon. The default, though, is the appliance. Depending on how big and fast they are, the appliances range from $25,000 to $75,000 list. Our biggest studio right now has 22 of those in one cluster.

What’s a typical configuration like, in terms of what gets compressed and when? People aren’t compressing everything, are they?
People will usually pick a policy that compresses and de-dupes everything that hasn’t been modified for X days. You can also set virtual tiers. Usually people won’t use us for the hottest files. But you might process files as soon as a render is finished: just de-dupe them, don’t compress them. That will get you a certain amount of savings and it’s still fast on readback. After 30 days, go back and add compression tuned for reasonable readback time, and get another X percent. Then there’s a logical tier four, where you have the system do everything it knows how to do to shrink the hell out of a file after maybe 90 days. If you just de-dupe, it’s fast enough for everything but the hottest render. But decompression is CPU-intensive. A user probably wouldn’t notice the difference if it takes 30 ms extra to open a file. But if you have a couple hundred blades doing a render and they’re all reading at the CPU’s highest performance, they will notice. It depends on the file type and who’s accessing it. That’s why we give the ability to set policies not just by date range but also by file type. Typically the evaluation period lasts four to six weeks, so customers get comfortable with that.
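A hypothetical policy table along the lines George describes, with tiers keyed to how long a file has sat untouched and to its file type (the field names and thresholds are illustrative, not Ocarina’s configuration syntax):

```python
from dataclasses import dataclass

@dataclass
class Tier:
    min_age_days: int   # only touch files untouched for at least this long
    dedupe: bool
    compression: str    # "none", "fast-readback", or "maximum"
    file_types: tuple   # extensions this tier applies to; "*" means all

# The hottest files are left alone entirely; these tiers kick in afterward.
TIERS = [
    Tier(min_age_days=0,  dedupe=True, compression="none",          file_types=("*",)),  # right after render
    Tier(min_age_days=30, dedupe=True, compression="fast-readback", file_types=("*",)),
    Tier(min_age_days=90, dedupe=True, compression="maximum",       file_types=("*",)),
]

def tier_for(age_days: int, extension: str) -> Tier:
    """Pick the most aggressive tier a file qualifies for, by age and file type."""
    eligible = [t for t in TIERS
                if age_days >= t.min_age_days
                and ("*" in t.file_types or extension in t.file_types)]
    return max(eligible, key=lambda t: t.min_age_days)
```

Splitting policy on file type as well as age is what lets a shop keep render inputs on a fast, de-dupe-only tier while squeezing everything else harder as it cools off.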

For more information: Ocarina Networks