The non-symmetric s-step Lanczos algorithm: Derivation of efficient recurrences and synchronization-reducing variants of BiCG and QMR.




The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse nonsymmetric matrix. At the same time, it serves as a building block within the Biconjugate Gradient (BiCG) and the QuasiMinimal Residual (QMR) methods for solving large sparse nonsymmetric systems of linear equations. It is well known that, when implemented on distributedmemory computers with a huge number of processes, the synchronization time spent in computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronizationreducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These socalled sstep algorithms are based on grouping dot products for joint execution and replacing timeconsuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the sstep Lanczos algorithm, introduce sstep BiCG and QMR variants, and compare the parallel performance of these new sstep variants with previous algorithms.