Technicolor and Fraunhofer Talk Details of Three-Camera 3D Process at SMPTE Confab
One of the decisions that gets baked into your footage during production is interaxial distance – the spacing between your two camera lenses, the parameter that probably has the most profound impact on the appearance of depth in your scene. At last year’s SMPTE International Conference on stereo 3D, Sony suggested that the interaxial distance could actually be reconfigured in future post-production tools based on the depth cues present in a good two-camera stereo-3D image. That information would be used to generate a new, digitally rendered view of the scene.
At this year’s SMPTE 3D conference, researchers from Technicolor and Fraunhofer went one better, describing a “trifocal” system for “depth-based” image acquisition. The camera system would consist of a single, high-quality main camera flanked by two smaller cameras. The main camera view becomes one of the eyes in the 3D image. Disparity-estimation techniques based on the three captured images should allow a second-eye view to be rendered at a virtual interaxial distance that is defined in post.
By using three cameras, rather than just two, the estimated depth information can be cross-checked among different pairs of views to increase the reliability of the results. The presenters suggested that a real-time preview could be generated on set at about six frames per second. (High-quality depth maps would be generated in post.)
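The Technicolor/Fraunhofer algorithm itself wasn’t published in detail, but the cross-checking idea the presenters described resembles a standard left-right consistency test: a disparity estimate is trusted only if the reverse estimate, computed from the other camera in the pair, points back to (roughly) the same place. Here’s a minimal sketch under that assumption, using simplified 1D pixel positions; the function name and data layout are illustrative, not from the talk.

```python
# Hedged sketch of disparity cross-checking between two views.
# disp_main_to_left maps a pixel position in the main camera to its
# estimated disparity (shift) into the left satellite view;
# disp_left_to_main is the reverse estimate.

def cross_check(disp_main_to_left, disp_left_to_main, tolerance=1.0):
    """Keep a disparity estimate only if the reverse mapping agrees."""
    validated = {}
    for x, d in disp_main_to_left.items():
        x_in_left = x - d                      # where pixel x lands in the left view
        d_back = disp_left_to_main.get(x_in_left)
        if d_back is not None and abs(d - d_back) <= tolerance:
            validated[x] = d                   # both directions agree: trust this depth
    return validated

forward = {10: 2, 11: 2, 12: 5}   # main -> left estimates
backward = {8: 2, 9: 2, 7: 1}     # left -> main estimates
print(cross_check(forward, backward))
```

In a trifocal rig the same test can be run over the main/left and main/right pairs, and a depth value kept only where the independent pairs agree – which is the reliability gain the presenters were pointing at.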
To see a prototype trifocal 3D rig, check out this video from The Stereo 3D Channel, shot at NAB 2010.
OK – assuming you accept the notion that one half of your stereo pair can be a computer-generated image, how would you handle all that data on set? Technicolor’s Thomas Brune detailed a 10-gigabit Ethernet infrastructure that would support all three cameras. The smaller cameras could be based on industrial-style cameras that already use the GigE Vision interface standard, while the main camera would be connected via 10 GigE to an Ethernet switch that would send the signals out for monitoring, device control, recording, and on-set processing and depth preview. The AV streaming in such a system would be hardware-accelerated to guarantee real-time performance.
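The 10 GigE figure is easy to sanity-check with back-of-envelope math. Assuming the main camera delivers uncompressed 2K DPX – 10-bit RGB packed into 4 bytes per pixel, at 30 fps (my assumptions, not figures from the talk) – the stream fits comfortably inside a single 10 Gb/sec link:

```python
# Back-of-envelope bandwidth check (assumed parameters, not from the talk):
# can one 10 GigE link carry an uncompressed 2K DPX main-camera stream?

BITS_PER_BYTE = 8

def stream_gbps(width, height, bytes_per_pixel, fps):
    """Raw video bandwidth in gigabits per second (1 Gb = 1e9 bits)."""
    return width * height * bytes_per_pixel * fps * BITS_PER_BYTE / 1e9

# 2K DPX: 2048x1556, 10-bit RGB packed into 4 bytes/pixel, 30 fps
main_cam = stream_gbps(2048, 1556, 4, 30)
print(f"Main camera: {main_cam:.2f} Gb/s of a 10 Gb/s link")  # ~3.06 Gb/s
```

That leaves ample headroom on the same switch fabric for the two GigE Vision satellite streams, control traffic, and metadata.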
A prototype FlashPakII field recorder from Technicolor Research & Innovation would boast sustained read and write speeds of 1.1 GB/sec. Yes, that’s pretty speedy – the device would bypass SSD entirely, using instead a matrix of Flash memory devices with parallel buses enabling consecutive write access for maximized performance. Up to seven resolution- and format-independent streams of mixed formats could be recorded to the device in parallel: DPX from the main camera, two GigE Vision streams from the satellite cameras, a stream of metadata, etc. A single FlashPakII would have up to 384 GB of capacity.
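Those two figures together put a ceiling on record time. Dividing the stated capacity by the stated sustained rate – simple arithmetic on the numbers from the talk, not a spec Technicolor quoted – suggests a full FlashPakII holds a bit under six minutes of flat-out recording:

```python
# Simple arithmetic on the figures from the talk: how long can a
# 384 GB FlashPakII record at its 1.1 GB/sec sustained write rate?

capacity_gb = 384      # stated capacity, gigabytes
write_rate = 1.1       # stated sustained write speed, GB/sec

seconds = capacity_gb / write_rate
print(f"~{seconds:.0f} s (~{seconds / 60:.1f} min) of sustained recording")  # ~349 s
```

In practice the mix of seven lower-rate streams wouldn’t saturate the device continuously, so real-world record times would run longer.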
The system is designed to be futureproof, with plenty of overhead to support different resolutions and frame rates – Brune noted that network standards have been specified for future 40 Gb/sec and 100 Gb/sec Ethernet applications.