The optimization landscape

There are two kinds of optimization problems, unconstrained and constrained. Each has its own conditions for optimality; these conditions may be first order or second order, and they may be necessary or sufficient. The problems are further subdivided by whether the objective is differentiable or nondifferentiable, and further still by whether the equality/inequality constraints are linear, affine, quadratic, conic, and so on.

Differentiable (convex or otherwise)

  • Unconstrained
    • First order - the gradient is zero, \( \nabla f(x^\star) = 0 \) (necessary)
    • Second order - the Hessian is positive semidefinite (necessary); a zero gradient plus a positive definite Hessian is sufficient for a strict local minimum
  • Constrained
    • First order - a constraint qualification such as LICQ (the gradients of the active constraints are linearly independent) + the KKT conditions: stationarity (the Lagrangian's gradient in \( x \) is zero), primal and dual feasibility, and complementary slackness (see the worked example after this list)
    • Second order - the Hessian of the Lagrangian is positive semidefinite (necessary) / positive definite (sufficient) on the critical cone, i.e. along directions that remain feasible to first order for the active constraints
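
A worked toy instance of these constrained conditions (the problem itself is just an illustrative choice): minimize \( x_1^2 + x_2^2 \) subject to \( x_1 + x_2 = 1 \).

\[
L(x, \lambda) = x_1^2 + x_2^2 + \lambda (1 - x_1 - x_2), \qquad
\nabla_x L = \begin{pmatrix} 2x_1 - \lambda \\ 2x_2 - \lambda \end{pmatrix} = 0
\;\Rightarrow\; x_1 = x_2 = \tfrac{\lambda}{2}.
\]

Feasibility then forces \( x_1 = x_2 = \tfrac{1}{2} \) and \( \lambda = 1 \). The constraint gradient \( (1, 1) \) is nonzero, so LICQ holds; there are no inequalities, so complementarity is vacuous; and \( \nabla^2_{xx} L = 2I \) is positive definite on the whole space, hence on the critical cone, so the second-order sufficient condition certifies a strict (in fact global) minimum at \( (\tfrac{1}{2}, \tfrac{1}{2}) \).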

Nondifferentiable / Convex

  •  Unconstrained
    • First order - the subdifferential contains 0, \( 0 \in \partial f(x^\star) \); by convexity this is necessary and sufficient for a global minimum
    • Second order - not needed: convexity already makes the first-order condition sufficient
  • Constrained
    • First order - \( 0 \in \partial f(x^\star) + N_C(x^\star) \), i.e. some subgradient is cancelled by a vector in the normal cone of the feasible set \( C \); necessary and sufficient in the convex case (see the example below)
    • Second order - again not needed in the convex case
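
A minimal example of these nondifferentiable conditions, using \( f(x) = |x| \) (the feasible set below is an illustrative choice):

\[
\partial f(x) = \begin{cases} \{-1\} & x < 0, \\ [-1, 1] & x = 0, \\ \{+1\} & x > 0, \end{cases}
\]

so \( 0 \in \partial f(0) \) and \( x^\star = 0 \) is the global minimizer even though \( f \) is not differentiable there. For the constrained version, minimize \( |x| \) over \( C = [1, 2] \): at \( x^\star = 1 \) we have \( \partial f(1) = \{1\} \) and \( N_C(1) = (-\infty, 0] \), so \( 0 = 1 + (-1) \in \partial f(1) + N_C(1) \), certifying optimality.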

Now we can list a number of convex optimization algorithms. These either make guaranteed progress at every iteration (descent methods), or only guarantee that the best iterate found so far converges, in which case we have to track the best point seen rather than trust the last one.

  1. Steepest descent - Keep picking the minimum-norm subgradient, which we can find by projecting 0 onto the subdifferential, take a step along its negative, and line search. This has the nice effect that we make guaranteed progress (descent) at every step.
  2. Follow any subgradient - Unfortunately this won't be a descent algorithm, but with a suitably small step size we provably move closer to the true solution in parameter space! (See the projected-subgradient sketch after this list.)
  3. Projected subgradient method for constrained optimization - Take a subgradient step, then project the result back onto the feasible set after every step.
  4. Cutting plane methods - Keep building a better and better piecewise-linear model of the true problem out of the subgradient inequalities (cuts). The key observation is that minimizing the maximum of finitely many affine functions is itself an LP; this LP is called the master problem. (See the cutting-plane sketch after this list.)
    1. Bundle method - Instead of solving the plain master problem, solve a regularized master problem that adds a proximal term to keep the new iterate close to the current one.
    2. Trust region method
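
As a minimal sketch of items 2 and 3, here is a projected subgradient method for the illustrative problem of minimizing \( f(x) = \max_i (a_i^T x + b_i) \) over a box. The problem data, the \( 1/k \) step-size rule, and the box projection are assumptions made for this example, not anything fixed by the notes above.

```python
import numpy as np

def f_and_subgrad(x, A, b):
    """Value and one subgradient of f(x) = max_i (a_i^T x + b_i).

    The gradient a_i of any piece that attains the max is a valid
    subgradient at x (ties broken arbitrarily by argmax)."""
    vals = A @ x + b
    i = int(np.argmax(vals))
    return vals[i], A[i]

def projected_subgradient(A, b, lo, hi, x0, iters=500):
    """Subgradient steps with a diminishing 1/k step size, projected
    onto the box [lo, hi]^n after every step.  Not a descent method,
    so we track and return the best iterate seen so far."""
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)
    best_f, best_x = np.inf, x.copy()
    for k in range(1, iters + 1):
        fx, g = f_and_subgrad(x, A, b)
        if fx < best_f:
            best_f, best_x = fx, x.copy()
        x = np.clip(x - (1.0 / k) * g, lo, hi)   # step, then project back
    return best_x, best_f

# Tiny illustrative instance: f(x) = max(x1 + x2 - 1, -x1, -x2) on the box [-2, 2]^2.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([-1.0, 0.0, 0.0])
x_best, f_best = projected_subgradient(A, b, -2.0, 2.0, np.zeros(2))
print(x_best, f_best)   # best value should approach the optimum -1/3 at (1/3, 1/3)
```

Because this is not a descent method, the sketch returns the best iterate seen rather than the last one, matching the remark in item 2.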
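
And a sketch of item 4, Kelley's cutting-plane method for the same kind of piecewise-linear objective over a box: each oracle call adds the cut \( t \ge f(x_k) + g_k^T (x - x_k) \), and the master problem (minimize \( t \) over all cuts plus the box) is an LP, solved here with scipy.optimize.linprog. The oracle interface, the box constraint, and the iteration budget are again assumptions for the example.

```python
import numpy as np
from scipy.optimize import linprog

def kelley_cutting_plane(oracle, n, lo, hi, x0, iters=30):
    """Kelley's method: keep the cuts  t >= f(x_k) + g_k^T (x - x_k)
    and let the next iterate be the minimizer of t over all cuts and
    the box [lo, hi]^n -- a small LP in the variables (x, t).
    `oracle(x)` must return (f(x), one subgradient at x)."""
    cuts_A, cuts_b = [], []                        # rows of  g^T x - t <= g^T x_k - f(x_k)
    x = np.asarray(x0, dtype=float)
    best_f, best_x = np.inf, x.copy()
    for _ in range(iters):
        fx, g = oracle(x)
        if fx < best_f:
            best_f, best_x = fx, x.copy()
        cuts_A.append(np.append(g, -1.0))          # coefficient of t is -1
        cuts_b.append(g @ x - fx)
        c = np.append(np.zeros(n), 1.0)            # master problem: minimize t
        bounds = [(lo, hi)] * n + [(None, None)]   # box on x, free t
        res = linprog(c, A_ub=np.array(cuts_A), b_ub=np.array(cuts_b),
                      bounds=bounds, method="highs")
        x = res.x[:n]                              # next query point
    return best_x, best_f
```

It can be run on the same data as the previous sketch by passing lambda x: f_and_subgrad(x, A, b) as the oracle; the bundle method would replace the plain LP above with a regularized (proximal) master problem.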

Basic properties of the subgradient, subdifferential, and directional derivative

  1. Directional derivatives are one-sided limits (not two-sided). Directional derivatives always exist for convex functions.
  2. A subgradient \( g \in \partial f(x) \) gives an affine underestimator of a convex function: \( f(y) \ge f(x) + g^T (y - x) \) for all \( y \). Note that both the direction and the norm of the subgradient matter. 
  3. Subdifferentials are closed convex sets. We get a new subdifferential at each point, and if a point is in the interior of the domain of a convex function then the corresponding subdifferential is compact, which means it cannot be a (nontrivial) cone! Also, shrinking a subgradient by multiplying it with a small scalar and then taking a step may not help, because the shrunken vector may not lie in the subdifferential, since 0 may not be in the subdifferential.
  4. The directional derivative in direction \( d \) is \( f'(x; d) = \sup_{g \in \partial f(x)} g^T d \), the support function of the subdifferential. The steepest descent direction is the unit direction that minimizes the directional derivative; it is the negative of the minimum-norm element of the subdifferential, normalized, and the minimal directional derivative equals minus that element's norm (worked example below).
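
A small worked instance of point 4, using \( f(x) = |x_1| + |x_2| \) at the (arbitrarily chosen) point \( x = (0, 1) \):

\[
\partial f(0, 1) = [-1, 1] \times \{1\}, \qquad
f'\big((0, 1); d\big) = \sup_{g \in \partial f(0, 1)} g^T d = |d_1| + d_2 .
\]

The minimum-norm element of the subdifferential is \( g^\star = (0, 1) \), so the steepest descent direction is \( -g^\star / \|g^\star\| = (0, -1) \), and indeed \( \min_{\|d\| \le 1} f'((0, 1); d) = -\|g^\star\| = -1 \), attained at \( d = (0, -1) \).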