Modeling and Information Processing
Efficient Cubic B-spline Image Interpolation on a GPU
1-Introduction
Accurate interpolation is required in many image processing low level tasks such as image zooming and warping, registration and optical flow estimation. Cubic B-spline (CBS) interpolation is known to provide results with superior accuracy compared to linear interpolation or even cubic convolution interpolation.
When applied to image rotation, the presently most efficient GPU implementation [1] for the cubic spline image interpolation still cost about 10 times as much as linear interpolation. This implementation involves two steps: a prefilter step performs a two-pass forward-backward recursive filter, then a cubic polynomial interpolation step thanks to a cascade of linear interpolations [1]. We propose a simpler and faster implementation for the prefilter -- which is the most time consuming-- in terms of a direct convolution.The overall cost for our cubic B-spline interpolation algorithm then reduces to only twice the cost of linear interpolation.
Our technique is very simple: we exhibit the closed-form of the infinite coefficient series equivalent to the usual recursive form of the prefilter, then we show that truncation to the first 15 coefficients yields as accurate results than the recursive implementation of [1]. This enable the use of a more classical finite coefficient convolution (based on the one pixel per thread paradigm) instead of the recursive implementation of [1] (based on a one line/column per pixel paradigm).
The overall method is described in [2]. CUDA Source code for the described algorithm is available
here.
2- Accuracy of the finite coefficient prefilter
The following figure is a graph of the prefilter coefficients, it shows a fast exponential decay: whereas the theoretical filter has an infinite scale influence, the practical filter only
has influence on nearby pixels.
We propose to replace the recursive filter [1] by a truncated version.
We check the accuracy of this procedure using the following experiment borrowed from Unser
etal. [3] : perform 36 progressive 10 degree rotations on an image
and then compare to the original image. This test is rather selective because each rotation involves an interpolation step and interpolation errors tend to accumulate [1].
This experiment is conducted on the same Lena image as [1]. After 36 rotations, the result is compared to the original Lena image by the means of a difference image.
The following Figure visualizes the residual images resulting from this procedure using different implementations. The recursive CPU implementation of the prefilter by Thévenaz
etal. [4] is denoted CPU IIR, whereas the recursive GPU implementation of the prefilter by Ruijters & Thévenaz [1] is denoted GPU IIR. The proposed method is denoted GPU FIR# where # denote the number of coefficients in the FIR approximation of the IIR filter.
Residual between true image and progressively rotated images. Top left: CPU IIR implementation [4]. Top middle: Ruijters & Thévenaz GPU IIR implementation [1]. Top right: GPU FIR7. Bottom right: GPU FIR9. Bottom middle: GPU FIR11. Down right: GPU FIR15.
3-Assesment of the acceleration gain
Our implementation of cubic B-spline interpolation differs from that of [1] only in the prefilter step, so our assesment is focussed on the prefilter.
The original purpose of this work is GPGPU so the first target is a Tesla C2050 Nvidia card. But B-spline interpolation concerns more generally visualization so we also performed additional tests on the less powerfull GTX 260M Nvidia Card driving our graphic display.
The following Table 1 gives the running times in ms for both target cards and two image sizes. The acceleration factor ranges from x4.4 for the GTX 260M with a 1 Mpix image to x10 for Tesla C2050 with a 4Mpix image. So the GPU FIR implementation tends to be more efficient for larger image sizes and more powerful GPUs.
Another distinctive feature of the GPU FIR wrt GPU IIR is the row/column behaviour. For GPU IIR, the row filter costs 5 times as much as the column filter, see Table 2. Conversely, row and column filters are well balanced for GPU FIR15.
4-Bibliography
[1] Daniel Ruijters and Philippe Thévenaz. "GPU prefilter for accurate cubic B-spline interpolation".
The Computer Journal, 2010.
[2] Frédéric Champagnat and Yves Le Sant. "Efficient Cubic B-spline Image interpolation on a GPU". Submitted to Journal of Graphics, GPU, and Game Tools.
PDF
[3] Michael Unser. Splines: A perfect fit for signal and image processing.
IEEE Signal Processing Magazine, 16(6):22--38, November 1999.
[4] P. Thévenaz, T. Blu, and M. Unser. Interpolation revisited.
IEEE Transactions on Medical Imaging, 19(7):739--758, July 2000.