The approximate discrete Radon transform on graphics processing units: A case study in auto-tuning of OpenCL implementations.



The Open Computing Language (OpenCL) is designed to provide a platform-independent specification for programming heterogeneous computing systems. The performance of an OpenCL program, however, is not easily transferable from one platform to another. Auto-tuning is among the techniques that address this situation: it automates the performance optimization of OpenCL programs by systematically applying program transformations. We introduce a novel auto-tuning framework to generate OpenCL programs and report on a case study computing an approximate discrete Radon transform. Experiments on four different graphics processing units indicate that, for a wide range of problem sizes and input parameters, the execution times of the auto-tuned OpenCL programs are shorter than those of three hand-tuned CUDA implementations.