Based on link and link

Hamiltonian dynamics operates on position and momentum vectors \( q \in \mathbb{R}^d \) and \( p \in \mathbb{R}^d \). The system is described by a function \( H(p, q) \) called the Hamiltonian function. Its time evolution obeys certain differential equations that arise from symmetries and conservation laws. These differential equations are

\[ \begin{align}\frac{dq}{dt} &= \frac{\partial H}{\partial p}&= \text{velocity} \\ \frac{dp}{dt} &= - \frac{\partial H}{\partial q} &\approx \text{force} \end{align} \]

Note that these differential equations automatically imply that the Hamiltonian function is conserved, i.e. constant w.r.t. time, because

\[ \frac{dH}{dt} = \frac{\partial H}{\partial p} \frac{dp}{dt} + \frac{\partial H}{\partial q}\frac{dq}{dt} = \frac{dq}{dt} \frac{dp}{dt} - \frac{dp}{dt}\frac{dq}{dt} = 0 \]
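To make this concrete, here is a minimal sketch in Python (the quartic Hamiltonian is my own arbitrary choice, not anything from the text) that reads Hamilton's equations off the partial derivatives and checks that \( dH/dt = 0 \) along the flow:

```python
# Hypothetical example Hamiltonian H(q, p) = q^4/4 + p^2/2 (my own choice).
def dH_dq(q, p): return q**3   # partial H / partial q
def dH_dp(q, p): return p      # partial H / partial p

# Hamilton's equations: dq/dt = dH/dp (velocity), dp/dt = -dH/dq (force).
def vector_field(q, p):
    return dH_dp(q, p), -dH_dq(q, p)

# Conservation check: dH/dt = (dH/dq) * dq/dt + (dH/dp) * dp/dt should be 0.
q, p = 0.7, -0.3
dq_dt, dp_dt = vector_field(q, p)
print(dH_dq(q, p) * dq_dt + dH_dp(q, p) * dp_dt)   # 0.0 exactly
```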

Hamiltonian dynamics is also reversible in time, simply by negating the time derivatives. Often we can write the Hamiltonian as the sum of the potential plus the kinetic energy, i.e.

\[ H(q, p) = \underbrace{U(q)}_{\text{potential}} + \underbrace{K(p)}_{\text{kinetic}} \]

Note that "position" and "momentum" can be replaced by made-up quantities; they are "formal variables" of the system. E.g. when doing Hamiltonian Monte Carlo the system is "formalized" so that \(q\) (the position) is the random variable that we want to sample, with \(U(q)\) its negative log density, and \(p\) is a mean-zero Gaussian with some chosen (not necessarily standard) variance, which we can sample directly.
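As a concrete sketch of that correspondence (the target density and the mass matrix value here are my own illustrative choices):

```python
import numpy as np

# Hypothetical sketch of the HMC correspondence described above.
# q: the variable we want to sample, with target density pi(q) ∝ exp(-U(q)).
# p: auxiliary Gaussian momentum, mean zero, covariance M (the "mass matrix").
def U(q):                                 # potential = -log target density
    return 0.5 * np.dot(q, q)             # e.g. a standard normal target

M = np.array([[2.0]])                     # a non-identity variance for p
def K(p):                                 # kinetic = -log N(p; 0, M) + const
    return 0.5 * p @ np.linalg.solve(M, p)

rng = np.random.default_rng(0)
q = np.array([0.3])
p = rng.multivariate_normal(np.zeros(1), M)   # directly sampleable
print(U(q) + K(p))   # the Hamiltonian H(q, p) that the dynamics conserve
```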

Liouville's theorem of volume preservation for Hamiltonian systems.

If we forward-execute the differential equations for a set of points in some region R of (q,p) space that has volume V, then the volume of the image of R will also be V. This can be proven either by showing that the divergence of this vector field is zero (see the theorem below), or by directly showing that the determinant of the Jacobian matrix of the instantaneous mapping is 1 at arbitrary time t.

\(\displaystyle \text{det}\Big(\text{jac}\big((q(t), p(t)) \to (q(t+\delta), p(t+\delta))\big)\Big) = 1\)

The instantaneous mapping can be approximated as 

\(T_{\delta}(q,p) = \begin{bmatrix}q \\ p \end{bmatrix} + \delta \begin{bmatrix}dq/dt \\ dp/dt\end{bmatrix} = \begin{bmatrix}q \\ p \end{bmatrix} + \delta \begin{bmatrix}\partial H/\partial p \\ -\,\partial H/\partial q\end{bmatrix}\)

and therefore its Jacobian can be written as an identity matrix (from the first [q,p] term) plus \(\delta\) times a matrix of second-order partial derivatives, i.e.

\(B_\delta = \text{jac}(T_{\delta}) = \begin{bmatrix} 1 + \delta \frac{\partial^2H}{\partial q_j \partial p_i} & \delta \frac{\partial^2H}{\partial p_j \partial p_i} \\ -\delta \frac{\partial^2H}{\partial q_j \partial q_i} & 1 -\delta \frac{\partial^2H}{\partial p_j \partial q_i} \end{bmatrix}\)

Its determinant is \(1 + \delta\, \text{trace}\big(\frac{\partial^2 H}{\partial q \partial p} - \frac{\partial^2 H}{\partial p \partial q}\big) + \mathscr{O}(\delta^2) = 1 + \mathscr{O}(\delta^2)\), since the two mixed-partials matrices are transposes of each other and hence have equal trace. Composing \(t/\delta\) such steps and letting \(\delta \to 0\) makes the \(\mathscr{O}(\delta^2)\) contribution vanish, giving determinant exactly 1 for the true flow.
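For a quick sanity check, here is a sketch using the same hypothetical Hamiltonian \(H = q^4/4 + p^2/2\) as before (so the mixed partials are zero and \(B_\delta\) is easy to write down); the determinant approaches 1 quadratically as \(\delta\) shrinks:

```python
import numpy as np

# B_delta for H = q^4/4 + p^2/2: H_qp = 0, H_pp = 1, H_qq = 3 q^2, H_pq = 0.
q = 0.7
for delta in [1e-1, 1e-2, 1e-3]:
    B = np.array([[1.0,                 delta],   # rows: (q', p') wrt (q, p)
                  [-3.0 * delta * q**2, 1.0  ]])
    print(delta, np.linalg.det(B) - 1.0)          # shrinks like delta**2
```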

 

Theorem: All divergence-free systems preserve volume.

Let \(D\) be an open and bounded subset of \(\mathbb{R}^d\) and let \(\dot{y} = f(y)\) be a system of differential equations. The function that maps D to its image at time t is called the flow. Let \(\varphi_t\) be the flow, i.e. \(D(t) = \varphi_t(D)\) and \(\text{vol}(D(t)) = \int_{D(t)} dV\). The divergence of the vector field \(f\) is defined as \(\displaystyle \sum_i \frac{\partial \dot{y}_i}{\partial y_i}\), and it is a scalar field. If this divergence is 0 everywhere then \(\text{vol}(D(t)) = \text{vol}(D)\). The proof primarily relies on three facts.

  1. The divergence of a vector field is the trace of the Jacobian of that (time-derivative) vector field.
  2. \(\text{det}(I + \epsilon A) = 1 + \epsilon\ \text{trace}(A) + \mathscr{O}(\epsilon^2)\) (see the numerical check right after this list)
  3. \(\text{vol}(D(t)) \triangleq \displaystyle \int_D \text{det}\Big(\frac{\partial \varphi_t}{\partial y}\Big)\, dV\) and \(\varphi_t(y) \triangleq y + t f(y) + \mathscr{O}(t^2)\)
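Fact 2 is easy to verify numerically; a small sketch with a random matrix:

```python
import numpy as np

# Numerical check of fact 2: det(I + eps*A) = 1 + eps*trace(A) + O(eps^2).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
for eps in [1e-1, 1e-2, 1e-3]:
    residual = np.linalg.det(np.eye(4) + eps * A) - (1.0 + eps * np.trace(A))
    print(eps, residual)   # shrinks like eps**2
```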

In terms of physical intuition, the divergence measures the instantaneous rate at which the flow expands or contracts volume at each point of phase space.


Symplecticness of the Jacobian of the Hamiltonian flow

Weyl coined the word "symplectic" as a substitute for the word "complex". When a system of differential equations is symplectic, that means the Jacobian matrix of its flow is a symplectic matrix, and a symplectic matrix has a type of "weird" skew-symmetric "similarity" structure. A symplectic matrix is a \(2n \times 2n\) matrix with real entries that satisfies the condition \( M^T \Omega M = \Omega \), where \(\Omega\) is a fixed nonsingular, skew-symmetric matrix. Symplectic matrices are nonsingular, they have determinant +1, and their inverse can be found cheaply as \(M^{-1} = \Omega^{-1} M^T \Omega\). They form a subgroup of the general linear group under matrix multiplication. Volume preservation under a linear operator follows naturally from symplecticness, since the determinant is +1.
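These properties are cheap to check numerically. A sketch, using the determinant-1 update matrix that appears at the end of this section (for \(2 \times 2\) matrices, determinant 1 and symplecticness coincide):

```python
import numpy as np

eps = 0.1
Omega = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])            # fixed nonsingular skew-symmetric form
M = np.array([[1.0, -eps],
              [eps, 1.0 - eps**2]])        # symplectic-Euler update on (p, q)

print(np.allclose(M.T @ Omega @ M, Omega))      # defining condition: True
print(np.linalg.det(M))                         # +1
M_inv = np.linalg.solve(Omega, M.T @ Omega)     # Omega^{-1} M^T Omega
print(np.allclose(M_inv, np.linalg.inv(M)))     # inverse formula holds: True
```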

The time evolution of Hamilton's equations is a "symplectomorphism" because it conserves the symplectic 2-form. A 2-form is a function that measures oriented areas. A numerical scheme for the discretization of a differential equation is a symplectic integrator if it also conserves this 2-form. Most of the usual numerical methods, like the Euler scheme and the Runge-Kutta schemes, are not symplectic, so other methods like the leapfrog method are needed. Remember that in numerical integration the whole game is coming up with the right sequence of discrete updates that will not produce compounding errors.

For example, consider the case where the position and momentum are both 1-dimensional, the kinetic energy is half the momentum squared, and the potential energy is half the position squared, i.e. \(K(p) = p^2/2\) and \(U(q) = q^2/2\). Then

\[ \frac{dq}{dt} = p, \text{ and } \frac{dp}{dt} = -q \implies \frac{d^2p}{dt^2} = -\frac{dq}{dt} = -p \]

which means that p is a sinusoidal function. Now suppose we try to numerically integrate this system of differential equations with the following scheme, called the Euler scheme:

\[ \begin{align} p(t+\epsilon) &:= p(t) + \epsilon \dot{p}(t) &&= p(t) - \epsilon q(t)\\ q(t+\epsilon) &:= q(t) + \epsilon \dot{q}(t) &&= q(t) + \epsilon p(t) \end{align} \]

Then we can see that this discretization is non-symplectic, because the update matrix (acting on \((p, q)\)) is \( \begin{bmatrix}1 & -\epsilon \\ \epsilon & 1 \end{bmatrix} \), which has determinant \( 1 + \epsilon^2 \).
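Iterating this update shows the problem directly: every step multiplies \(q^2 + p^2\) (and hence the energy) by \(1 + \epsilon^2\), so the trajectory spirals outward instead of staying on a circle. A sketch:

```python
# Euler updates for dq/dt = p, dp/dt = -q; energy grows by (1 + eps^2) per step.
eps, q, p = 0.1, 1.0, 0.0
for _ in range(1000):
    p, q = p - eps * q, q + eps * p     # simultaneous update (old values on RHS)
print(0.5 * (q**2 + p**2))   # ~0.5 * 1.01**1000 ≈ 1e4, instead of staying at 0.5
```

On the other hand, if we do the discretization in this manner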

\[ \begin{align} p(t+\epsilon) &:= p(t) + \epsilon \dot{p}(t) &&= p(t) - \epsilon q(t)\\ q(t+\epsilon) &:= q(t) + \epsilon \dot{q}(t+\epsilon) &&= q(t) + \epsilon p(t+\epsilon) \end{align} \]

then the transformation is \( \begin{bmatrix}1 & -\epsilon \\ \epsilon & 1-\epsilon^2\end{bmatrix} \), which has determinant exactly 1! So this simple "Gibbs-sampling" style sequential update does wonders. The leapfrog method is even better for integrating symplectic systems; it works as follows: first do a half-update to the momentum, then a full update to the position, and then the remaining half-update to the momentum.
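A sketch of leapfrog on the same oscillator; unlike the Euler scheme above, the energy error stays bounded instead of compounding:

```python
def leapfrog(q, p, eps, n_steps):
    for _ in range(n_steps):
        p -= 0.5 * eps * q    # half update to momentum: dp/dt = -q
        q += eps * p          # full update to position: dq/dt = p
        p -= 0.5 * eps * q    # remaining half update to momentum
    return q, p

q, p = leapfrog(1.0, 0.0, eps=0.1, n_steps=1000)
print(0.5 * (q**2 + p**2))   # stays ≈ 0.5; the error oscillates, never compounds
```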