temporal-lightfield-reconstruction

Open source Git repository for the SIGGRAPH 2011 paper "Temporal Light Field Reconstruction for Rendering Distribution Effects"


An implementation of

Lehtinen, J., Aila, T., Chen, J., Laine, S., and Durand, F. 2011,
Temporal Light Field Reconstruction for Rendering Distribution Effects,
ACM Transactions on Graphics 30(4) (Proc. ACM SIGGRAPH 2011), article 55.

http://research.nvidia.com/publication/temporal-light-field-reconstruction-rendering-distribution-effects
http://groups.csail.mit.edu/graphics/tlfr
http://dx.doi.org/10.1145/1964921.1964950


System requirements

  • Microsoft Windows XP, Vista, or 7. Developed and tested only on Windows 7 x64.

  • For filtering large sample sets, several gigabytes of memory and a 64-bit operating system.

  • For GPU reconstruction:

    • NVIDIA CUDA-compatible GPU with compute capability 2.0 and at least 512 megabytes of RAM. GeForce GTX 480 is recommended.

    • NVIDIA CUDA 4.0 or later (see http://developer.nvidia.com/cuda-toolkit-archive)

    • Microsoft Visual Studio 2008. Required even if you do not plan to build the source code, as CUDA compilation, which happens at runtime, requires it.

  • This software runs and compiles only on Windows and Visual Studio 2008. We welcome contributions of ports to other versions of Visual Studio and other OSs.

Instructions

General use

Launching reconstruction_app.exe will start the viewer application, which by default loads a sample buffer from the data/ directory. The default view (F1) shows the input samples with box filtering, which results in a noisy image.

Pressing F3 (or clicking the corresponding button in the GUI) will run our reconstruction algorithm and display the result.

Pressing space toggles between CPU and GPU reconstruction. This works only if you have a CUDA-capable GPU with compute capability 2.0 or higher.

Pressing F5 will run the reconstruction algorithm to produce a non-blurry pinhole image at the end of the time interval (t=1). This is for debugging purposes. Shading is in general not identical to the ground truth pinhole image (F4): If, for example, the scene has motion, the pinhole reconstruction is done from samples that include motion blurred shadows, whereas the ground truth pinhole image is rendered from a static setup.

If available, the app loads a ground truth image from the same directory as the sample buffer; press F2 to view it. The file name must follow the pattern of the provided example exactly. The two single digits in the filename specify the gamma the images were rendered with, so that the app can apply the same gamma to its own output (this just sets the gamma slider on the right to the specified value). The gamma is read only from the non-pinhole reference; the pinhole reference must have been rendered with the same value.

Sample Buffer Format

Sample buffers are stored in two files: the main file, which contains the sample data itself, and a header file whose name is the main file's name with ".header" appended. For example,

samplebuffer.txt
samplebuffer.txt.header

form a valid pair.

The header describes the image dimensions, the sampling rate, the lens coefficients, and the encoding of the main file. It looks like this:

Version 1.3
Width 962
Height 543
Samples per pixel 16
Motion model: perspective
CoC coefficients (coc radius = C0/w+C1): -15.242497,19.053122
Encoding = text
x,y,z/w,w,u,v,t,r,g,b,a,mv_x,mv_y,mv_w,dwdx,dwdy

Width and Height specify the dimensions of the image, in pixels. Samples per pixel says how many samples to expect for each pixel. Motion model is there for historical reasons; the only value currently supported is "perspective". The CoC coefficients are the constants of the formula that gives the slopes dx/du and dy/dv for a sample from its camera-space depth (w) (see below). Encoding states whether the main file is text or binary. The last row serves as a reminder of how to interpret the numbers in the actual sample file.

The actual sample data file can be either text or binary. We recommend generating your sample buffers in text format and using the functionality in UVTSampleBuffer to convert it to binary; this can be done by loading in the text version and serializing back to disk with the binary flag turned on.

The "CoC coefficients" C0,C1 are constants that are used for computing the circle of confusion for a given depth, assuming a thin lens model. The CoC corresponds directly to the slopes dx/du and dy/dv for a given depth w. This is how to compute C0 and C1, illustrated using PBRT's perspective camera API:

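// Thin-lens focal length for an image plane at unit distance from the lens.
float f = 1.f / ( 1 + 1.f / pCamera->getFocalDistance() );
// Sensor extent at unit distance; used below to convert the CoC to pixels.
float sensorSize = 2 * tan( 0.5f * pCamera->getFoVRadians() );
float cocCoeff1 = pCamera->getLensRadius() * f / ( pCamera->getFocalDistance() - f );
cocCoeff1 *= min( camera->film->xResolution, camera->film->yResolution ) / sensorSize;
// C0 is chosen so that the CoC radius C0/w + C1 is zero at the focal distance.
float cocCoeff0 = -cocCoeff1 * pCamera->getFocalDistance();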

If you use another model, you must derive the constants C0 and C1 yourself.

In the main sample file, each line describes one sample specified by 16 floating point numbers.

x and y are the sample's pixel coordinates, including
fractional subpixel offsets.

z/w is projected z, which is currently unused.

w is the camera-space depth.

u and v are the lens coordinates at which the sample was
taken, in the range [-1,1].

t is the time coordinate in the range [0,1] denoting the
instant the sample was taken.

r,g,b,a is the sample's radiance, in linear RGB. alpha is
currently unused.

mv_x, mv_y and mv_w are the sample's motion vector. They
encode the difference of the camera space (homogeneous)
position of the sample at the end of the shutter interval
(t=1) and the beginning of the shutter interval (t=0). In
other words, it must satisfy

(xy*w)(T=0) = xy(T=t)*w(T=t) - t*V
(xy*w)(T=1) = xy(T=t)*w(T=t) + (1-t)*V

Here V is the motion vector (mv_x, mv_y, mv_w), whose mv_w
component applies to w in the same way, and xy(T=t) and w(T=t)
are the sample's original screen position and depth at the time
it was taken. Notice that the screen position is converted to
homogeneous coordinates by multiplication by w, but there is no
scale and bias to [-1,1] clip space coordinates; dividing by w
yields pixel coordinates directly.

NOTE that pbrt's default motion model is not world-affine as
it performs interpolation of rotation matrices. As mentioned
in the paper, we changed this in our version. A pbrt patch
that outputs sample buffers with world-affine motion will be
released separately.

See Sec. 3.1 of the paper for a more thorough explanation.

CUDA

The pre-built 64-bit binary is built with CUDA support, but CUDA is disabled by default in the source code so that the project can be built without the CUDA SDK. To enable CUDA support, find the line

#define FW_USE_CUDA 0

in src/framework/base/DLLImports.hpp and change it to

#define FW_USE_CUDA 1

Provided you have a working install of CUDA Toolkit 4.0, this will enable the GPU code path (toggled by Spacebar in the application). The first time it is run, the application will compile and cache the .cu file containing the reconstruction kernels. This may take a few seconds.

If you get an error during initialization, the most probable explanation is that the application is unable to launch nvcc.exe contained in the CUDA Toolkit. In this case, you should:

  • Set CUDA_BIN_PATH to point to the CUDA Toolkit "bin" directory, e.g. "set CUDA_BIN_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin".

  • Set CUDA_INC_PATH to point to the CUDA Toolkit "include" directory, e.g. "set CUDA_INC_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include".

  • Run vcvars32.bat to set up Visual Studio paths, e.g. "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin\vcvars32.bat".

Release notes

  • The soft shadow filtering code is included for reference, but not directly callable from the example application. This is because it requires a significant amount of support code for generating the light samples from camera samples, etc. We apologize for the inconvenience.

  • DEVIATION FROM THE PAPER: The hierarchy consumes less memory than reported, due to code cleanup.

Known issues

Fast motion with large depth differences:

Objects that undergo fast motion during the shutter interval, such that they move significantly towards or away from the camera, cause the samples' apparent screen trajectories x(t) and y(t) to deviate from straight lines in XYT. Because we use linear XYUVT hyperplanes as the bounds, this curvature makes the BVH nodes' bounds looser. This in turn may lead to somewhat reduced efficiency, particularly in the CUDA implementation, which has a strict limit on the number of tree nodes it can handle during filtering. Exceeding this limit causes it to revert to the CPU implementation. We have not observed this behavior in anything but test cases where the motion in the Z direction is large.

Using bounds that better adapt to the perspective-induced curvature in the trajectories should fix this problem if it becomes a practical issue. One such formulation can be found in our paper "Clipless Dual-Space Bounds for Faster Stochastic Rasterization", ACM TOG 30(4) (Proc. SIGGRAPH 2011).