The limited memeory BFGS (L-BFGS) algorithm is a quasi-Newton method for convex optimization.


Definition and Syntax

bool lbfgs(arma::vec& init_out_vals, std::function<double (const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)> opt_objfn, void* opt_data);
bool lbfgs(arma::vec& init_out_vals, std::function<double (const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)> opt_objfn, void* opt_data, algo_settings_t& settings);

Function arguments:

  • init_out_vals a column vector of initial values; will be replaced by the solution values.
  • opt_objfn the function to be minimized, taking three arguments:
    • vals_inp a vector of inputs;
    • grad_out a vector to store the gradient; and
    • opt_data additional parameters passed to the function.
  • opt_data additional parameters passed to the function.
  • value_out the value of opt_objfn when evaulated at the output values.
  • settings parameters controlling the optimization routine; see below.

Optimization control parameters:

  • double err_tol the tolerance value controlling how small $\| \nabla f \|$ should be before 'convergence' is declared.
  • int iter_max the maximum number of iterations/updates before the algorithm exits.
  • bool vals_bound whether the search space is bounded. If true, then
    • arma::vec lower_bounds this defines the lower bounds.
    • arma::vec upper_bounds this defines the upper bounds.

  • int lbfgs_par_M storage parameter.

Details

Let $x^{(i)}$ denote the function input values at stage $i$ of the algorithm. The L-BFGS updating rule is as follows.

  • The descent direction $d^{(i)}$ is found via a two-loop recursion. Let \begin{align*} s^{(i)} &:= x^{(i+1)} - x^{(i)} \\ y^{(i)} &:= \nabla f^{(i+1)} - \nabla f^{(i)} \end{align*} where $\nabla f^{(i)}$ is the gradient of $f$ evaluated at $x^{(i)}$.
    Then:
    • Set $$q = \nabla f(x^{(i)})$$
    • for $k = i - 1, \ldots, i - M$ do \begin{align*} \rho_k &= 1 / ( y^{(k)}{'} s^{(k)}) \\ \gamma_k &= \rho_k \times s^{(k)}{'} q \\ q &= q - \gamma_k \times y^{(k)} \end{align*} where the history length $M$ is set via lbfgs_par_M.
    • Set $r = q \times (s^{(i)}{'}y^{(i)} / y^{(i)}{'}y^{(i)})$
    • for $k = i - M, \ldots, i - 1$ do
    • $$\beta = \rho_k y^{(k)}{'} r$$ $$r = r + s^{(k)}(\gamma_k - \beta)$$
    • Finally, set $d^{(i)} = - r$.
  • The input values are then updated: $$x^{(i+1)} = x^{(i)} + \alpha^{(i)} d^{(i)}$$ where $\alpha^{(i)}$ is found via line search: $$\alpha^{(i)} = \arg \min_\alpha f(x^{(i)} + \alpha d^{(i)})$$ The line search method used in OptimLib is that of More and Thuente (1994).

The algorithm stops when $\| \nabla f \|$ is less than err_tol, or the total number of iterations exceeds iter_max.


Examples

As examples, consider minimization of:

  • the Sphere function $$\min_{x \in \mathbb{R}^n} \sum_{i=1}^n x_i^2$$ The solution is the zero vector
  • Booth's function $$\min_{x \in [-10,10]^2} \left\{ (x_1 + 2x_2 - 7)^2 + (2 x_1 + x_2 - 5)^2 \right\}$$ The global minimum is located at $(1,3)$.

Below are plots of each example, restricted to the two-dimensional problem where relevant.

Sphere

Booth


Code solving each example using OptimLib is given below.

#include "optim.hpp"

double sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    double obj_val = arma::dot(vals_inp,vals_inp);
    //
    if (grad_out) {
        *grad_out = 2.0*vals_inp;
    }
    //
    return obj_val;
}

double booth_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    double x_1 = vals_inp(0);
    double x_2 = vals_inp(1);

    double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);
    //
    if (grad_out) {
        (*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
        (*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
    }
    //
    return obj_val;
}

int main()
{
    //
    // sphere function

    const int test_dim = 5;

    arma::vec x = arma::ones(test_dim,1); // initial values (1,1,...,1)

    bool success = optim::lbfgs(x,sphere_fn,nullptr);

    if (success) {
        std::cout << "lbfgs: sphere test completed successfully." << std::endl;
    } else {
        std::cout << "lbfgs: sphere test completed unsuccessfully." << std::endl;
    }

    arma::cout << "lbfgs: solution to sphere test:\n" << x << arma::endl;

    //
    // Booth function

    arma::vec x_2 = arma::zeros(2,1); // initial values (0,0)

    bool success_2 = optim::lbfgs(x,booth_fn,nullptr);

    if (success_2) {
        std::cout << "lbfgs: Booth test completed successfully." << std::endl;
    } else {
        std::cout << "lbfgs: Booth test completed unsuccessfully." << std::endl;
    }

    arma::cout << "lbfgs: solution to Booth test:\n" << x_2 << arma::endl;

    return 0;
}