Abstract:
The conjugate gradient method is one of the improved gradient algorithms for optimization.
It is used to solve linear systems that must meet some strict requirements.
Although it is not a general solution for optimization problems, it is quite efficient and memory-saving for the problems it fits.
Content:
Theory:
The conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix $A$ is symmetric and positive-definite. Symmetric means that $A$ satisfies Equation \eqref{symmetric}.
\begin{equation}
\label{symmetric}
A = A^T
\end{equation}
Positive-definite means that for any nonzero real vector $x$, Equation \eqref{positive-def} holds.
\begin{equation}
\label{positive-def}
x^T A x > 0
\end{equation}
We can see that the conjugate gradient method has many preconditions, but it also has advantages. It requires less memory than computing the inverse $A^{-1}$ by, e.g., the Jacobi method. And, ignoring rounding error, it is guaranteed to converge in at most $n$ iterations, where $n$ is the number of rows (or columns) of $A$, which is far faster and more reliable than plain gradient descent.
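Whether a given matrix satisfies these two preconditions can be checked cheaply. Here is an illustrative sketch in Python with NumPy (the function name and tolerance are my own choices): symmetry is a direct comparison with the transpose, and positive-definiteness is equivalent to the matrix admitting a Cholesky factorization.

```python
import numpy as np

def is_symmetric_positive_definite(A, tol=1e-10):
    """Check the two preconditions of the conjugate gradient method."""
    # Symmetric: A must equal its transpose (up to rounding).
    if not np.allclose(A, A.T, atol=tol):
        return False
    # Positive-definite: x^T A x > 0 for all nonzero x, which holds
    # exactly when A admits a Cholesky factorization.
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False
```

Using Cholesky rather than computing all eigenvalues keeps the check at a single $O(n^3)$ factorization with early failure on indefinite matrices.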
A problem that fits the conjugate gradient method can be expressed as Equation \eqref{cg-problem},
\begin{equation}
\label{cg-problem}
A x = b
\end{equation}
where $A$ is a symmetric, positive-definite matrix, $x$ is the vector to be solved for, and $b$ is a constant vector. The computational steps for obtaining $x$ follow. Since it is a system of linear equations, the initial guess $x_0$ can be any value you like.
First, initialize the variables as in Equations \eqref{algo-init1}\eqref{algo-init2}\eqref{algo-init3}.
\begin{equation}
\label{algo-init1}
r_0 = b - A x_0
\end{equation}
\begin{equation}
\label{algo-init2}
p_0 = r_0
\end{equation}
\begin{equation}
\label{algo-init3}
k = 0
\end{equation}
Then repeat Equations \eqref{algo-update1}\eqref{algo-update2}\eqref{algo-update3} to update $x$ and $r$.
\begin{equation}
\label{algo-update1}
\alpha_k = \frac{r_k^T r_k}{p_k^T A p_k}
\end{equation}
\begin{equation}
\label{algo-update2}
x_{k+1} = x_k + \alpha_k p_k
\end{equation}
\begin{equation}
\label{algo-update3}
r_{k+1} = r_k - \alpha_k A p_k
\end{equation}
If $r_{k+1}$ is sufficiently small, say $r_{k+1}^T r_{k+1} < e$, where $e$ is a given small tolerance, then exit the loop; otherwise update $p$ by Equations \eqref{algo-update4}\eqref{algo-update5}, and increase $k$ as in Equation \eqref{algo-update6}.
\begin{equation}
\label{algo-update4}
\beta_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}
\end{equation}
\begin{equation}
\label{algo-update5}
p_{k+1} = r_{k+1} + \beta_k p_k
\end{equation}
\begin{equation}
\label{algo-update6}
k = k + 1
\end{equation}
Go back to the loop over Equations \eqref{algo-update1}\eqref{algo-update2}\eqref{algo-update3}.
The final result is $x_{k+1}$. The theory above is derived from Wikipedia.
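The steps above can be sketched in Python with NumPy (an illustrative translation, separate from the Fortran version below; the function name and the default tolerance are my own choices):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, e=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A.

    `e` is the tolerance on the squared residual r^T r, matching the
    stopping rule in the text; `max_iter` defaults to n because exact
    arithmetic converges within n iterations.
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                  # initial residual r0 = b - A x0
    p = r.copy()                   # initial search direction p0 = r0
    k = 0
    if max_iter is None:
        max_iter = n
    rs = r @ r                     # r^T r, reused across iterations
    while rs >= e and k < max_iter:
        Ap = A @ p
        alpha = rs / (p @ Ap)      # step length alpha_k
        x = x + alpha * p          # update solution
        r = r - alpha * Ap         # update residual
        rs_new = r @ r
        p = r + (rs_new / rs) * p  # beta_k = rs_new / rs, new direction
        rs = rs_new
        k += 1
    return x
```

For example, with $A = [[4, 1], [1, 3]]$ and $b = (1, 2)^T$ it returns approximately $(0.0909, 0.6364)$, i.e. $(1/11, 7/11)$.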
Numerical implementation in Fortran:
Fortran subroutine for conjugate gradient:
module conjugate_gradient_method
Conjugate gradient application:
program conjugate_gradient_example
Questions:
Is it adapted to deep learning problems?
No. As addressed above, it is adapted to linear systems, with extra requirements (symmetric and positive-definite) on the coefficient matrix $A$.
Since deep learning models are non-linear, let alone the extra requirements, the original conjugate gradient method is not suitable for deep learning.
There are improvements that try to apply conjugate gradient to non-linear problems, but they may need more adjustments and tests to check whether they could help deep learning training.
Is the error tolerance that decides convergence sensitive?
Yes. The error variable $e$ in the code, which decides whether the loop has converged, is quite sensitive. My own test case shows that changing $e = 1.0\times10^{-3}$ to $e = 1.0\times10^{-6}$ improves the solution precision considerably.
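This effect can be reproduced without the Fortran setup. Below is a small self-contained Python experiment (the test matrix, seed, and tolerances are arbitrary choices of mine, not the post's original test case): a looser tolerance stops the loop earlier and leaves a larger residual.

```python
import numpy as np

def cg(A, b, e, max_iter=1000):
    # Minimal conjugate-gradient loop (same updates as in the text);
    # stops as soon as the squared residual r^T r drops below e.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if rs < e:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M.T @ M + 50 * np.eye(50)   # symmetric positive-definite test matrix
b = rng.standard_normal(50)
# True residual norms ||A x - b|| for a loose and a tight tolerance.
loose = np.linalg.norm(A @ cg(A, b, 1e-3) - b)
tight = np.linalg.norm(A @ cg(A, b, 1e-6) - b)
```

Here `tight` comes out no larger than `loose`, since the tighter tolerance keeps iterating after the loose run has already stopped.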
History:
- 2018-04-10: create post for demonstration of conjugate gradient method
- 2018-04-26: reformat this page with LaTeX-style equations and references.