The Nonlinear Conjugate Gradient (CG) method generalizes its linear namesake, which minimizes convex quadratic functions, to the minimization of general smooth nonlinear functions.


Definition and Syntax

bool cg(arma::vec& init_out_vals, std::function<double (const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)> opt_objfn, void* opt_data);
bool cg(arma::vec& init_out_vals, std::function<double (const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)> opt_objfn, void* opt_data, algo_settings_t& settings);

Function arguments:

  • init_out_vals a column vector of initial values; will be replaced by the solution values.
  • opt_objfn the function to be minimized, taking three arguments:
    • vals_inp a vector of inputs;
    • grad_out a vector to store the gradient; and
    • opt_data additional parameters passed to the function.
  • opt_data additional data passed to the user-provided objective function; a sketch of how this pointer can be used follows this list.
  • settings parameters controlling the optimization routine; see below.
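
The examples at the end of this page pass nullptr for opt_data; when the objective depends on extra parameters, they can be passed through this pointer instead. A minimal sketch, where the struct and function names are illustrative rather than part of OptimLib:

#include "optim.hpp"

// hypothetical parameter bundle for a shifted sphere: f(x) = sum_i (x_i - c_i)^2
struct shifted_sphere_data_t {
    arma::vec center;
};

double shifted_sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    const shifted_sphere_data_t* data = static_cast<shifted_sphere_data_t*>(opt_data);

    arma::vec diff = vals_inp - data->center;

    if (grad_out) {
        *grad_out = 2.0*diff; // analytic gradient of the shifted sphere
    }

    return arma::dot(diff,diff);
}

// usage:
//   shifted_sphere_data_t data;
//   data.center = arma::ones(5,1);
//   arma::vec x = arma::zeros(5,1);
//   bool success = optim::cg(x, shifted_sphere_fn, &data);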

Optimization control parameters:

  • double err_tol the tolerance value controlling how small $\| \nabla f \|$ should be before 'convergence' is declared.
  • int iter_max the maximum number of iterations/updates before the algorithm exits.
  • bool vals_bound whether the search space is bounded. If true, then
    • arma::vec lower_bounds this defines the lower bounds.
    • arma::vec upper_bounds this defines the upper bounds.

  • int cg_method which updating formula to use; see the details section below.
  • double cg_restart_threshold the value $\nu$ in the details section below (a configuration sketch follows this list).
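
For illustration, a minimal sketch of setting these fields before calling cg, assuming the algo_settings_t member names match the parameter names listed above (the values themselves are arbitrary):

optim::algo_settings_t settings;

settings.err_tol  = 1e-8;  // declare convergence once || grad f || < 1e-8
settings.iter_max = 2000;  // allow at most 2000 iterations

settings.cg_method = 2;               // Polak-Ribiere '+' updating formula
settings.cg_restart_threshold = 0.1;  // the restart value nu

settings.vals_bound   = true;                     // bound the search space ...
settings.lower_bounds = arma::zeros(2,1) - 10.0;  // ... to [-10,10]^2,
settings.upper_bounds = arma::zeros(2,1) + 10.0;  // as in the Booth example below

// bool success = optim::cg(x, booth_fn, nullptr, settings);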

Details

Let $x^{(i)}$ denote the function input values at stage $i$ of the algorithm. The Nonlinear Conjugate Gradient updating rule is as follows.

  • The descent direction is computed using $$d_i = - \nabla f_{i} + \beta_i \times d_{i-1}$$ where $\beta_i$ is calculated using one of several updating formulas, listed below, depending on the value of cg_method.
  • The Nonlinear CG method is known to perform better with periodic restarting (setting $\beta = 0$). In OptimLib, a restart occurs when $$\dfrac{|\nabla f_i \cdot \nabla f_{i-1}|}{\nabla f_{i} \cdot \nabla f_{i}} > \nu$$
  • The values are then updated: $$x^{(i+1)} = x^{(i)} + \alpha^{(i)} d^{(i)}$$ where $\alpha^{(i)}$ is found via line search: $$\alpha^{(i)} = \arg \min_\alpha f(x^{(i)} + \alpha d^{(i)})$$ The line search method used in OptimLib is that of Moré and Thuente (1994).

The algorithm stops when $\| \nabla f \|$ is less than err_tol, or the total number of iterations exceeds iter_max.
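
To make the structure of the iteration concrete, below is a simplified, self-contained sketch in Armadillo. It uses the Fletcher–Reeves formula (cg_method = 1, listed under 'Updating Formulas' below) and a basic backtracking line search in place of the Moré–Thuente routine, so it illustrates the shape of the algorithm rather than OptimLib's actual implementation; the function name cg_sketch is hypothetical.

#include <armadillo>
#include <functional>
#include <cmath>

// simplified nonlinear CG sketch: Fletcher-Reeves beta, a restart rule, and a
// backtracking line search standing in for More-Thuente
arma::vec cg_sketch(arma::vec x,
                    const std::function<double (const arma::vec&, arma::vec&)>& fn,
                    double err_tol = 1e-8, int iter_max = 1000, double nu = 0.1)
{
    arma::vec grad(x.n_elem);
    double f_cur = fn(x,grad);
    arma::vec d = -grad;                              // initial direction: steepest descent

    for (int iter = 0; iter < iter_max && arma::norm(grad) > err_tol; ++iter) {
        // backtracking line search on alpha -> f(x + alpha*d)
        double alpha = 1.0;
        arma::vec grad_new(x.n_elem);
        double f_new = fn(x + alpha*d, grad_new);

        int ls_count = 0;
        while (f_new > f_cur + 1E-04*alpha*arma::dot(grad,d) && ls_count++ < 50) {
            alpha *= 0.5;
            f_new = fn(x + alpha*d, grad_new);
        }

        x += alpha*d;

        // Fletcher-Reeves beta, reset to zero when the restart condition triggers
        double beta = arma::dot(grad_new,grad_new) / arma::dot(grad,grad);

        if (std::abs(arma::dot(grad_new,grad)) / arma::dot(grad_new,grad_new) > nu) {
            beta = 0.0;                               // restart: fall back to steepest descent
        }

        d = -grad_new + beta*d;

        grad  = grad_new;
        f_cur = f_new;
    }

    return x;
}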


Updating Formulas:

  • cg_method = 1 Fletcher–Reeves (FR):
  • $$\beta_{\text{FR}} = \dfrac{\nabla f_i \cdot \nabla f_i}{\nabla f_{i-1} \cdot \nabla f_{i-1}}$$
  • cg_method = 2 Polak-Ribière (PR) '+':
  • $$\beta_{\text{PR+}} = \max \left\{ 0 , \dfrac{\nabla f_i \cdot (\nabla f_i - \nabla f_{i-1})}{\nabla f_{i-1} \cdot \nabla f_{i-1}} \right\}$$
  • cg_method = 3 FR-PR Hybrid:
  • $$\beta = \begin{cases} - \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} < - \beta_{\text{FR}} \\ \beta_{\text{PR}} & \text{ if } |\beta_{\text{PR}}| \leq \beta_{\text{FR}} \\ \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} > \beta_{\text{FR}} \end{cases}$$
  • cg_method = 4 Hestenes-Stiefel:
  • $$\beta_{\text{HS}} = \dfrac{\nabla f_i \cdot (\nabla f_i - \nabla f_{i-1})}{(\nabla f_{i} - \nabla f_{i-1}) \cdot d_{i-1}}$$
  • cg_method = 5 Dai-Yuan:
  • $$\beta_{\text{DY}} = \dfrac{\nabla f_i \cdot \nabla f_i}{(\nabla f_{i} - \nabla f_{i-1}) \cdot d_{i-1}}$$
  • cg_method = 6 Hager-Zhang:
  • $$\beta_{\text{HZ}} = \left( y - 2 \times \dfrac{y \cdot y}{y \cdot d_{i-1}} \times d_{i-1} \right) \cdot \dfrac{\nabla f_i}{y \cdot d_{i-1}}, \ \ \ y := \nabla f_i - \nabla f_{i-1}$$
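
As a sketch of how these formulas translate into code, the helper below computes $\beta$ from the current gradient, the previous gradient, and the previous direction for each value of cg_method. It follows the expressions above directly; the function name cg_beta is illustrative and this is not OptimLib's internal implementation.

#include <armadillo>
#include <algorithm>

// beta for the nonlinear CG direction update d_i = -grad_i + beta*d_{i-1}
double cg_beta(int cg_method, const arma::vec& grad, const arma::vec& grad_prev, const arma::vec& d_prev)
{
    const arma::vec y = grad - grad_prev;

    switch (cg_method) {
        case 1:   // Fletcher-Reeves
            return arma::dot(grad,grad) / arma::dot(grad_prev,grad_prev);
        case 2:   // Polak-Ribiere '+'
            return std::max(0.0, arma::dot(grad,y) / arma::dot(grad_prev,grad_prev));
        case 3: { // FR-PR hybrid
            double beta_fr = arma::dot(grad,grad) / arma::dot(grad_prev,grad_prev);
            double beta_pr = arma::dot(grad,y) / arma::dot(grad_prev,grad_prev);
            if (beta_pr < -beta_fr) { return -beta_fr; }
            if (beta_pr >  beta_fr) { return  beta_fr; }
            return beta_pr;
        }
        case 4:   // Hestenes-Stiefel
            return arma::dot(grad,y) / arma::dot(y,d_prev);
        case 5:   // Dai-Yuan
            return arma::dot(grad,grad) / arma::dot(y,d_prev);
        case 6: { // Hager-Zhang
            double yd = arma::dot(y,d_prev);
            return arma::dot(y - 2.0*(arma::dot(y,y)/yd)*d_prev, grad) / yd;
        }
        default:
            return 0.0;
    }
}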

Examples

As examples, consider minimization of:

  • the Sphere function $$\min_{x \in \mathbb{R}^n} \sum_{i=1}^n x_i^2$$ The solution is the zero vector.
  • Booth's function $$\min_{x \in [-10,10]^2} \left\{ (x_1 + 2x_2 - 7)^2 + (2 x_1 + x_2 - 5)^2 \right\}$$ The global minimum is located at $(1,3)$; its gradient is written out after this list.
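
For reference, the gradient of Booth's function, which the code below evaluates, is $$\nabla f(x) = \begin{pmatrix} 2(x_1 + 2x_2 - 7) + 4(2x_1 + x_2 - 5) \\ 4(x_1 + 2x_2 - 7) + 2(2x_1 + x_2 - 5) \end{pmatrix}$$ and it vanishes at $(1,3)$.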

Below are plots of each example, restricted to the two-dimensional problem where relevant.

[Plot: the Sphere function]

[Plot: Booth's function]


Code solving each example using OptimLib is given below.

#include "optim.hpp"

double sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    double obj_val = arma::dot(vals_inp,vals_inp);
    //
    if (grad_out) {
        *grad_out = 2.0*vals_inp;
    }
    //
    return obj_val;
}

double booth_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    double x_1 = vals_inp(0);
    double x_2 = vals_inp(1);

    double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);
    //
    if (grad_out) {
        (*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
        (*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
    }
    //
    return obj_val;
}

int main()
{
    //
    // sphere function

    const int test_dim = 5;

    arma::vec x = arma::ones(test_dim,1); // initial values (1,1,...,1)

    bool success = optim::cg(x,sphere_fn,nullptr);

    if (success) {
        std::cout << "cg: sphere test completed successfully." << std::endl;
    } else {
        std::cout << "cg: sphere test completed unsuccessfully." << std::endl;
    }

    arma::cout << "cg: solution to sphere test:\n" << x << arma::endl;

    //
    // Booth function

    arma::vec x_2 = arma::zeros(2,1) + 2; // initial values (2,2)

    bool success_2 = optim::cg(x_2,booth_fn,nullptr);

    if (success_2) {
        std::cout << "cg: Booth test completed successfully." << std::endl;
    } else {
        std::cout << "cg: Booth test completed unsuccessfully." << std::endl;
    }

    arma::cout << "cg: solution to Booth test:\n" << x_2 << arma::endl;

    return 0;
}
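
The exact compilation flags depend on how OptimLib and Armadillo were installed. Assuming OptimLib was built as a shared library, with the paths below as placeholders and the code above saved as cg_example.cpp, a typical invocation looks something like:

g++ -Wall -std=c++11 -O3 -march=native -I/path/to/armadillo -I/path/to/optim/include cg_example.cpp -o cg_example.out -L/path/to/optim/lib -loptim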