block Jacobi preconditioner

We have seen that the conjugate gradient method can be used to solve a sparse linear system $A \mathbf{x} = \mathbf{b}$ where the matrix $A$ is symmetric and positive-definite. However, the number of iterations performed by the conjugate gradient method depends on the condition number of $A$: its convergence can deteriorate significantly when $A$ is ill-conditioned.

preconditioned conjugate gradient

In order to address this lack of robustness, we can resort to preconditioning. The idea of preconditioned iterative methods is to transform the initial problem $A \mathbf{x} = \mathbf{b}$, for example by left-multiplying the equation by an invertible matrix $P$, where $P$ is chosen so that $\text{cond} (P A)$ is closer to $1$ than $\text{cond} (A)$. The matrix $P$ is called the preconditioner, and the idea is to choose $P$ as a “good” approximation of $A^{-1}$ that remains “cheap” to compute and apply.

Here we recall the preconditioned conjugate gradient algorithm:

$\textbf{function}$ precondCG($A, P, \mathbf{b}$, $\epsilon_{\text{TOL}}$)
$\quad\epsilon = \epsilon_{\text{TOL}} | \mathbf{b} |_2$
$\quad\mathbf{x} = 0$
$\quad\mathbf{r} = \mathbf{b}$
$\quad\mathbf{z} = P \mathbf{b}$
$\quad\mathbf{p} = \mathbf{z}$
$\quad\textbf{while} \; |\mathbf{r}|_2 \geq \epsilon \; \textbf{do}$
$\qquad\alpha = \mathbf{r}^T\mathbf{z} / |\mathbf{p}|_A^2$
$\qquad\mathbf{x} = \mathbf{x} + \alpha \mathbf{p}$
$\qquad\mathbf{r} = \mathbf{r} - \alpha A \mathbf{p}$
$\qquad\mathbf{z} = P \mathbf{r}$
$\qquad\beta = \mathbf{r}^T\mathbf{z} / (\alpha |\mathbf{p}|_A^2)$
$\qquad\mathbf{p} = \mathbf{z} + \beta \mathbf{p}$
$\quad\textbf{end while}$
$\quad\textbf{return} \; (\mathbf{x})$
$\textbf{end function}$

the block Jacobi preconditioner

In this exercise and as a follow-up to the parallelization of the conjugate gradient algorithm on $p$ processes, we will choose $P$ as the block Jacobi preconditioner with $p$ blocks. Once again, we suppose that each MPI process holds a contiguous subset of $n := N/p$ rows of the matrix (assuming $p$ divides $N$) and that vectors are distributed accordingly on the $p$ processes.
The block Jacobi preconditioner reads:

$$P = \sum_{i=1}^p R_i^T (R_i A R_i^T)^{-1} R_i$$

where $R_i \in \mathbb{R}^{n \times N}$ is the restriction matrix from the global numbering of unknowns to the local numbering of process $i$, and $R_i A R_i^T \in \mathbb{R}^{n \times n}$ is the diagonal block of $A$ corresponding to the unknowns held by process $i$.

For illustration purposes, let us consider the following example on 3 processes:

$$ \begin{equation} \begin{array}{lr} \textcolor{red}{proc. 0} \quad \begin{cases} \quad \\ \quad \end{cases}\\ \textcolor{blue}{proc. 1} \quad \begin{cases} \quad \\ \quad \end{cases}\\ \textcolor{limegreen}{proc. 2} \quad \begin{cases} \quad \\ \quad \end{cases} \end{array} \begin{pmatrix} \color{red} 2 & \color{red} -1 & \color{red} 0 & \color{red} 0 & \color{red} 0 & \color{red} 0 \\\ \color{red} -1 & \color{red} 3 & \color{red} -1 & \color{red} 0 & \color{red} 0 & \color{red} 0 \\\ \color{blue} 0 & \color{blue} -1 & \color{blue} 4 & \color{blue} -1 & \color{blue} 0 & \color{blue} 0 \\\ \color{blue} 0 & \color{blue} 0 & \color{blue} -1 & \color{blue} 5 & \color{blue} -1 & \color{blue} 0 \\\ \color{limegreen} 0 & \color{limegreen} 0 & \color{limegreen} 0 & \color{limegreen} -1 & \color{limegreen} 6 & \color{limegreen} -1 \\\ \color{limegreen} 0 & \color{limegreen} 0 & \color{limegreen} 0 & \color{limegreen} 0 & \color{limegreen} -1 & \color{limegreen} 7 \\ \end{pmatrix} = A \end{equation} $$

The block Jacobi preconditioner $P$ is then defined as:

$$ \begin{equation} \begin{array}{lr} \textcolor{red}{proc. 0} \quad \begin{cases} \quad \\ \quad \end{cases}\\ \textcolor{blue}{proc. 1} \quad \begin{cases} \quad \\ \quad \end{cases}\\ \textcolor{limegreen}{proc. 2} \quad \begin{cases} \quad \\ \quad \end{cases} \end{array} \begin{pmatrix} \begin{pmatrix} \color{red} 2 & \color{red} -1 \\\ \color{red} -1 & \color{red} 3 \end{pmatrix}^{-1} \phantom{ \begin{pmatrix} 0 & 0 \\\ -1 & 0 \end{pmatrix} \\\ } \phantom{ \begin{pmatrix} 0 & 0 \\\ -1 & 0 \end{pmatrix} \\\ }\\\ \phantom{ \begin{pmatrix} 0 & 0 \\\ -1 & 0 \end{pmatrix} \\\ } \begin{pmatrix} \color{blue} 4 & \color{blue} -1 \\\ \color{blue} -1 & \color{blue} 5 \end{pmatrix}^{-1} \phantom{ \begin{pmatrix} 0 & 0 \\\ -1 & 0 \end{pmatrix} \\\ }\\\ \phantom{ \begin{pmatrix} 0 & 0 \\\ -1 & 0 \end{pmatrix} \\\ } \phantom{ \begin{pmatrix} 0 & 0 \\\ -1 & 0 \end{pmatrix} \\\ } \begin{pmatrix} \color{limegreen} 6 & \color{limegreen} -1 \\\ \color{limegreen} -1 & \color{limegreen} 7 \end{pmatrix}^{-1} \end{pmatrix} = P \end{equation} $$

As we can see, the block Jacobi preconditioner fits very naturally with our previous parallel implementation of the matrix-vector product and of the conjugate gradient algorithm, since each diagonal block of $A$ is owned by the MPI process holding the corresponding set of rows of $A$. Each process can therefore compute a Cholesky factorization of its diagonal block independently of the others. Then, in order to apply the preconditioner $P$ in the parallel preconditioned conjugate gradient algorithm, each process only needs to perform a forward and a backward substitution, using the computed factorization of its local block, on the part of the input vector corresponding to its $n$ row indices.

the Eigen library

In order to compute the Cholesky factorization of the diagonal blocks, we can use the C++ library Eigen. Eigen is a header-only library; you can download the source code here.

Here is a simple Eigen example which solves a sparse linear system using the Cholesky factorization:

#include <Eigen/Sparse>
#include <vector>
#include <iostream>

int main()
{
  // assemble a small symmetric positive-definite sparse matrix from triplets
  std::vector<Eigen::Triplet<double>> coefficients;

  coefficients.push_back(Eigen::Triplet<double>(0, 0, 2));
  coefficients.push_back(Eigen::Triplet<double>(0, 1, -1));
  coefficients.push_back(Eigen::Triplet<double>(1, 0, -1));
  coefficients.push_back(Eigen::Triplet<double>(1, 1, 3));
  coefficients.push_back(Eigen::Triplet<double>(2, 2, 5));

  Eigen::SparseMatrix<double> A(3, 3);
  A.setFromTriplets(coefficients.begin(), coefficients.end());

  // perform the Cholesky factorization of A:
  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> chol(A);

  Eigen::VectorXd b(3);
  b << 1, 2, 3;

  // use the factorization to solve for the given right-hand side:
  Eigen::VectorXd x = chol.solve(b);

  std::cout << x << std::endl;

  return 0;
}

You can find more details about the definition and handling of a sparse matrix in Eigen here.
