Poses, quaternions, and the SE(3) Manifold

Most of this material comes from A tutorial on SE(3) transformation parameterizations (linked at the end of these notes).

Basic terminology

The set of all invertible 3x3 real matrices forms the general linear group GL(3,R). Among these we have the subgroup of orthogonal matrices O(3). Within O(3) we have the subgroup of matrices with determinant +1 (not -1), called SO(3). As a manifold, SO(3) x R³ is called SE(3); its group law is the composition of rigid motions (a semidirect product, not the plain direct product). Elements of SE(3) are called poses. Each pose represents a rotation and a translation defined with respect to the new (rotated) coordinate axes. The fact that we rotate first and translate afterwards is just a convention.

Quaternions are numbers of the form \(q = a + b \rm{i} + c\rm{j} + d \rm{k}\) and they multiply according to \(\displaystyle \rm{i}\rm{i} = \rm{j}\rm{j} = \rm{k}\rm{k} = -1 \text{ and } ij = k, jk = i, ki = j, ji = -k , kj = -i, ik = -j\). If q is a pure quaternion (a = 0) then \(q^2 = -(b^2 + c^2 + d^2)\), otherwise \(q^2 = a^2 -(b^2 + c^2 + d^2) + 2ab\ {\rm i} + 2ac\ {\rm j} + 2ad\ {\rm k} \). The conjugate of a quaternion is \(\text{adj}(q) = a - b\ {\rm i} - c\ {\rm j} - d\ {\rm k}\), and multiplying q by its conjugate gives the squared absolute value, \(q \, \text{adj}(q) = a^2 + b^2 + c^2 + d^2\). The inverse of a quaternion is therefore \(q^{-1} = \frac{a - b\ {\rm i} - c\ {\rm j} - d\ {\rm k}}{a^2 + b ^2 + c^2 + d^2}\). A unit quaternion q can be used to represent a rotation in 3d space using a special representation and an associated special operation. Before we go into how quaternions represent rotations, let's first set up the different ways of representing rotations.
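The multiplication rules above can be checked numerically. What follows is a minimal sketch, assuming quaternions are stored as NumPy arrays in (a, b, c, d) order. The helper names `qmul`, `qconj` and `qinv` are made up for illustration and do not come from any library.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ])

def qconj(q):
    """Conjugate: flip the sign of the imaginary parts."""
    a, b, c, d = q
    return np.array([a, -b, -c, -d])

def qinv(q):
    """Inverse: conjugate divided by the squared norm."""
    return qconj(q) / np.dot(q, q)

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])
q = np.array([1.0, 2.0, 3.0, 4.0])

ij = qmul(i, j)              # equals k
ji = qmul(j, i)              # equals -k, so multiplication is not commutative
q_squared = qmul(q, q)       # (a^2 - b^2 - c^2 - d^2, 2ab, 2ac, 2ad)
norm_sq = qmul(q, qconj(q))  # (a^2 + b^2 + c^2 + d^2, 0, 0, 0)
one = qmul(q, qinv(q))       # (1, 0, 0, 0)
```

Storing the real part first is only a choice made here; some libraries store it last.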

Euler angles : Three rotations are sufficient to reach any target frame (in 3d). The rotations may be extrinsic (about a motionless frame), or intrinsic (sequential rotations about a coordinate system that moves together with the body!). Intrinsic directions are easy for an agent to act upon. Extrinsic vs intrinsic is the difference between north-south-east-west and left-right in a 2d plane.

If I tell you to go west for 1 km, and then north for 1 km, then I am giving extrinsic directions,
but if I tell you to take a left for 1 km, and then take a right, I am giving you intrinsic directions!

For defining intrinsic rotations in 3D, there exist 12 possible sequences of rotation axes, divided into two groups

Proper/Classical Euler Angles - stuff like zyz where one of the axes of rotation repeats.
Tait-Bryan/Cardan Angles - called (Yaw,Pitch,Roll) - Here the axis of rotation does not repeat!
Sometimes the YPR angles are also called the Euler angles but that's just sloppy notation.
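The intrinsic/extrinsic distinction can be made concrete with SciPy. This is a small sketch, assuming SciPy's `Rotation.from_euler` convention where upper-case axis letters mean intrinsic rotations and lower-case letters mean extrinsic ones. The angle values are arbitrary.

```python
import numpy as np
from scipy.spatial.transform import Rotation

yaw, pitch, roll = 0.3, -0.5, 1.1  # arbitrary angles in radians

# Tait-Bryan, intrinsic: about z, then the new y, then the newest x.
intrinsic = Rotation.from_euler("ZYX", [yaw, pitch, roll]).as_matrix()

# Extrinsic rotations about the fixed axes in the reverse order
# give the same overall rotation.
extrinsic = Rotation.from_euler("xyz", [roll, pitch, yaw]).as_matrix()

# A proper Euler sequence repeats an axis (here z-y-z).
proper = Rotation.from_euler("ZYZ", [0.3, 0.5, 0.7]).as_matrix()

# The order of the elementary rotations matters.
zy = Rotation.from_euler("ZY", [yaw, pitch]).as_matrix()
yz = Rotation.from_euler("YZ", [pitch, yaw]).as_matrix()

print(np.allclose(intrinsic, extrinsic))
print(np.allclose(zy, yz))
```

The first comparison shows that an intrinsic sequence equals the extrinsic sequence with the axis order reversed. The second shows that swapping two rotations changes the result.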

Axis-Angle / RPY representation
The RPY representation (Roll : x0, Pitch: y0, Yaw: z0) is conventionally defined with extrinsic rotations. When we do this we are using two angles to define a point on the sphere (and hence a line through the origin), and then a roll about that line. This is closely related to the axis-angle representation, which in turn is closely aligned with the quaternions. In Axis-Angle we use 3 coordinates that represent a point on the unit sphere instead of angles, and another number that represents a roll around that axis. One can also multiply the angle with the direction vector to get back to only three parameters (called the exponential coordinate parameterization). Note that Axis-Angle is completely different from Cardan angles and Euler angles, but it's closely related to the RPY representation as described here.

Finally we come back to the quaternions

Consider the RPY representation, which is closest to the axis-angle. The quaternion is constructed as \(\Big(\cos(\text{roll}/2) , \sin(\text{roll}/2) \big(\cos(\text{yaw})\cos(\text{pitch}), \sin(\text{yaw})\cos(\text{pitch}), \sin(\text{pitch})\big)\Big)\), i.e. the cosine of half the angle, followed by the unit axis scaled by the sine of half the angle. The usefulness of the quaternion representation is twofold:

q can rotate a 3d point via an operation called conjugation. So, let's say that we have a point \(p\) in 3d space, first we represent it in quaternion space as \(\tilde{p} = p_x {\rm i} + p_y {\rm j} + p_z {\rm k}\) then we conjugate it with q as follows \(\text{conj}_q(\tilde{p}) = q \tilde{p} q^{-1}\). Then we can read the imaginary parts of the resulting quaternion and get the output.
Secondly, we can represent composition of rotations as just quaternion multiplication! So we can compose quaternions, and we can apply them.
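Both properties can be sketched in a few lines. This assumes quaternions stored as (a, b, c, d) NumPy arrays and built from an axis and an angle. The helpers `qmul`, `axis_angle_quat` and `rotate` are illustrative names, not a library API.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def axis_angle_quat(axis, angle):
    """Unit quaternion (cos(angle/2), sin(angle/2) * unit_axis)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def rotate(q, p):
    """Rotate the 3d point p by conjugation: q * p_tilde * q^-1."""
    p_tilde = np.concatenate(([0.0], p))  # embed p as a pure quaternion
    q_inv = np.array([q[0], -q[1], -q[2], -q[3]]) / np.dot(q, q)
    return qmul(qmul(q, p_tilde), q_inv)[1:]  # read off the imaginary parts

q_z = axis_angle_quat([0, 0, 1], np.pi / 2)  # 90 degrees about z
q_x = axis_angle_quat([1, 0, 0], np.pi / 2)  # 90 degrees about x
p = np.array([1.0, 0.0, 0.0])

after_z = rotate(q_z, p)              # x axis goes to the y axis
after_both = rotate(q_x, after_z)     # then the y axis goes to the z axis
composed = rotate(qmul(q_x, q_z), p)  # one product applies q_z first, then q_x
```

Note the order in the product: the rotation applied first sits on the right.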

The conversion between Euler angles and Quaternions is best explained on this page along with lots of converters. Now that we know how to easily compose rotations, we can think about how to chain poses. For example, if we compose joint-lever assemblies as follows then we can get the world-pose of the final object by composing the corresponding poses for each lever.
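Chaining poses can be sketched with 4x4 homogeneous matrices. The example below is hypothetical: it assumes each joint rotates about its local z axis and each lever extends along the rotated local x axis. The helper `lever_pose` is invented for illustration.

```python
import numpy as np

def lever_pose(theta, length):
    """Pose of one joint-lever unit: rotate about z, then move along the new x."""
    c, s = np.cos(theta), np.sin(theta)
    joint = np.array([
        [c, -s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    link = np.eye(4)
    link[0, 3] = length  # translation expressed in the rotated frame
    return joint @ link

T1 = lever_pose(np.pi / 2, 1.0)   # first lever points along world +y
T2 = lever_pose(-np.pi / 2, 1.0)  # second lever bends back by 90 degrees

T_world = T1 @ T2  # world pose of the tip of the second lever
tip = T_world @ np.array([0.0, 0.0, 0.0, 1.0])  # origin of the tip frame

# Inverse of a pose [R t; 0 1] is [R^T, -R^T t; 0 1].
R, t = T_world[:3, :3], T_world[:3, 3]
T_inv = np.eye(4)
T_inv[:3, :3] = R.T
T_inv[:3, 3] = -R.T @ t
```

With these angles the tip ends up at (1, 1, 0) and the two joint rotations cancel, so the tip frame is parallel to the world frame.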

Basic algebraic terminology before we move on any further

Semi-group - 1 operation, closure, associativity. A semi-group with an identity element is called a monoid.
Group - monoid in which every element has an inverse.
Abelian group - group with a commutative operation.
Ring - 2 operations, an abelian group under +, and a semi-group under the other (a monoid if the ring has a unit). The second operation distributes over +.
Field - An abelian group under +, and the set minus {0} is an abelian group under x. A field is the basic structure over which division can be defined. If multiplication is not required to be commutative we get a division ring (skew field) instead. Real and complex numbers are fields; additionally real numbers have a total ordering. Quaternions are not commutative, so they form a division ring but not a field.
Vector Space over a field - A set of vectors together with a field of scalars. The vectors are an abelian group amongst themselves, closed under multiplication with scalars, and scalar multiplication is distributive.

The Inner-Product bilinear form

The Cross-Product bilinear product

An inner-product on a vector-space is a bilinear function that takes in two vectors and outputs a scalar; it measures lengths and angles. When such a space is complete we get a Hilbert space.

Algebras are vector-spaces equipped with a bilinear-product function that takes in two vectors and outputs another vector! For example, the algebra of 3D vectors under the cross-product (non-associative, non-commutative, satisfies the Jacobi identity), the algebra of polynomials with polynomial multiplication (associative and commutative), or the algebra of quaternions under normal multiplication (associative but non-commutative).

Manifold - See other notes.
Smooth Manifold embedded in R^N - Every point p in this manifold is surrounded by an open subset of the manifold that can be represented as the image of an open subset of R^D (with D <= N). Note that every smooth manifold can be embedded into some Euclidean space (Whitney embedding theorem), so the special part is the smoothness. The map is smooth in the sense that it and its inverse are continuous, it is infinitely differentiable, and its derivative is injective. The notion of differentiability exists on such a manifold.
Tangent space of a manifold - A D-dimensional manifold embedded in R^N (with N >= D) has a D-dimensional tangent-space associated with every point in the manifold. This tangent-space is the vector-space of all possible velocities of a point moving on the manifold as it passes through that point, e.g. \({\rm T}_xM\) contains all the possible "velocity" vectors of a particle at point x constrained to move in the manifold M. Note that the Klein bottle is a 2-dimensional manifold that cannot be smoothly embedded in R³ but can be embedded in R⁴, which means that at each point a particle can move only in two dimensions, even though the particle is living in a 4d-space.
Lie group - A group that is also a differentiable manifold. Both the group product and the inversion map are smooth functions.
Linear Lie Group - Any submanifold of \({\rm M}(N, \mathbb{R})\) (NxN real matrices) that is also a lie-group under matrix-multiplication is called a Linear-Lie group. Von Neumann and Cartan proved that any closed subgroup of \({\rm GL}(N, \mathbb{R})\) is a linear-lie group.
Lie Algebra - A Lie algebra is a vector-space with a cross-product-like bilinear product called the Lie bracket. The Lie bracket additionally satisfies the Jacobi identity and it is alternating, just like the normal cross-product. But note that a Lie algebra is distinct from the Lie group itself.

The Lie Group and the Lie Algebra are two sides of a coin. The Lie algebra associated to a Lie group M is the tangent space at the identity element. You go from a point on the manifold/Lie group to the algebra/velocity-space using the logarithm map, and you go from the algebra to the manifold using the exponential map.

Deeper dive into Lie Brackets and Relation b/w Lie-Group and Lie-Algebra.

The Lie bracket is in general neither associative nor commutative.
The Lie bracket satisfies the Jacobi identity, i.e. \(x \times (y \times z) + y \times (z \times x) + z \times (x \times y) = 0\)
The Lie bracket is alternating, which given bilinearity implies anti-commutativity (over the reals the two properties are equivalent).
Anti-commutativity \([a, b] = -[b, a]\) and alternativity \([a, a] = 0\)

E.g. the Lie bracket on the Lie algebra of a linear Lie group is defined as the matrix commutator \([A,B] = AB - BA\).
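The bracket properties listed above are easy to check numerically for the matrix commutator. A minimal sketch, using arbitrary random 3x3 matrices; the helper name `bracket` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 3, 3))  # three arbitrary 3x3 matrices

def bracket(X, Y):
    """Matrix commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

# Anti-commutativity: [A, B] + [B, A] = 0
anti = bracket(A, B) + bracket(B, A)

# Jacobi identity: [A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0
jacobi = (bracket(A, bracket(B, C))
          + bracket(B, bracket(C, A))
          + bracket(C, bracket(A, B)))

# Non-associativity: [A, [B, C]] differs from [[A, B], C] in general
assoc_gap = bracket(A, bracket(B, C)) - bracket(bracket(A, B), C)

print(np.abs(anti).max(), np.abs(jacobi).max(), np.abs(assoc_gap).max())
```

The first two printed values should be at rounding-error level, while the third is generically far from zero.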

There is a natural Lie algebra associated with a Lie group. The tangent-space of a Lie group \(\rm M\) at its identity is a Lie algebra \(\mathfrak{m}\) that is naturally associated with \(\rm M\). The exponential map maps elements from the algebra to the manifold/group, and the logarithm maps the manifold to the algebra.
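For rotations, the round trip between algebra and group can be sketched as follows. The exponential uses `scipy.linalg.expm` on a skew-symmetric matrix. The logarithm uses the standard closed form for SO(3), which is not spelled out in these notes and is valid for rotation angles strictly between 0 and pi. The helper names `hat` and `log_so3` are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Skew-symmetric matrix of a 3-vector (an element of the algebra)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def log_so3(R):
    """Closed-form logarithm of a rotation matrix, angle in (0, pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    W = theta / (2.0 * np.sin(theta)) * (R - R.T)  # skew-symmetric matrix
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

w = np.array([0.2, -0.4, 0.1])  # tangent vector at the identity
R = expm(hat(w))                # algebra -> group
w_back = log_so3(R)             # group -> algebra

print(np.allclose(w_back, w))
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

The second line of output confirms that the exponential of a skew-symmetric matrix lands in SO(3): it is orthogonal with determinant 1.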

The hat and the wedge operator. The hat/wedge operator maps a vector to its corresponding skew-symmetric matrix. The vee operator is the inverse of the hat map, but the generalized vee operator removes all symmetric contributions to a matrix

\(\displaystyle \omega^{\wedge} = \begin{pmatrix} 0& -z& y\\ z& 0& -x\\ -y& x& 0 \end{pmatrix} \text{ where } \omega = \begin{bmatrix}x \\ y \\ z \end{bmatrix} \text{ , } f^\vee = \frac{1}{2} \begin{bmatrix} f_{32} - f_{23}\\ f_{13} - f_{31} \\ f_{21} - f_{12}\end{bmatrix} \text { where } f \text { is a matrix}\)
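The two operators above translate directly into code. A minimal sketch with illustrative helper names `hat` and `vee`; the test vectors are arbitrary.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def vee(f):
    """Generalized vee: extracts the skew-symmetric part of f as a 3-vector."""
    return 0.5 * np.array([f[2, 1] - f[1, 2],
                           f[0, 2] - f[2, 0],
                           f[1, 0] - f[0, 1]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-0.5, 0.4, 2.0])
S = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]])  # symmetric

cross_via_hat = hat(a) @ b        # same as np.cross(a, b)
roundtrip = vee(hat(a))           # vee undoes hat
ignores_sym = vee(hat(a) + S)     # symmetric contributions drop out
bracket = hat(a) @ hat(b) - hat(b) @ hat(a)  # commutator of two hats
```

The last line illustrates why the cross-product is the Lie bracket in disguise: the commutator of two hat matrices is the hat of the cross-product.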

SE(3) as a Lie Group and \(\displaystyle \mathfrak{se}(3)\) as a Lie Algebra

There are a few things we want when we define a parameterization and a calculus over that parameterization: we want to keep track of how the parameterized entities compose with each other, and we want to know how the parameterized entities act on the real world. In the case of poses in SE(3) that means we need to know how to apply a pose (and its inverse) to a point, how to compose poses, and how to invert poses. See chapters 3-7 of the tutorial for these details for SE(3).

The group \(\mathbf{SO}(3)\) and its algebra \(\mathfrak{so}(3)\) - Since SE(3) has a manifold structure of SO(3) x R³ therefore it makes sense to first study the relation between the group \(\mathbf{SO}(3)\) and its algebra \(\mathfrak{so}(3)\).

\(\mathfrak{so}(3)\) is a vector-space with the following three basis "elements". I am calling them basis-elements instead of vectors because they are actually matrices. An arbitrary element of the tangent space is as follows.

\(\alpha \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \\ \end{pmatrix} + \beta \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \\ \end{pmatrix} + \gamma \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ \end{pmatrix}\)

so any element of \(\mathfrak{so}(3)\) can be written as a weighted sum of these 3 matrices. Each basis-element represents an infinitesimal rotation about the corresponding axis. Now recall that the elements of the manifold itself represent rotations about axes. The rotations about the coordinate axes, written in matrix form, are the following.

\(\begin{pmatrix} \cos(\text{angle-Z}) & -\sin(\text{angle-Z}) & 0 \\ \sin(\text{angle-Z}) & \cos(\text{angle-Z}) & 0 \\ 0 & 0 & 1 \\ \end{pmatrix} \text{ and } \begin{pmatrix} \cos(\text{angle-Y}) & 0 & \sin(\text{angle-Y}) \\ 0 & 1 & 0 \\ -\sin(\text{angle-Y}) & 0 & \cos(\text{angle-Y}) \\ \end{pmatrix} \text{ and } \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\text{angle-X}) & -\sin(\text{angle-X}) \\ 0 & \sin(\text{angle-X}) & \cos(\text{angle-X}) \\ \end{pmatrix} \)
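The link between the basis elements and these rotation matrices can be checked directly: exponentiating a multiple of one generator should give the rotation about the corresponding axis. A small sketch using `scipy.linalg.expm`; the names `G_x`, `G_y`, `G_z` and the angle value are chosen here for illustration.

```python
import numpy as np
from scipy.linalg import expm

# The three basis elements of so(3)
G_x = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
G_y = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
G_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

angle = 0.7  # arbitrary angle in radians
c, s = np.cos(angle), np.sin(angle)

R_x = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
R_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
R_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Exponentiating angle * generator reproduces the axis rotation.
print(np.allclose(expm(angle * G_x), R_x))
print(np.allclose(expm(angle * G_y), R_y))
print(np.allclose(expm(angle * G_z), R_z))
```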

The logarithm map takes us from the manifold (e.g. rotations described by angle-X, angle-Y, angle-Z) to the tangent space \((\alpha, \beta, \gamma)\), and the exponential map takes us from the tangent space back to the manifold. According to Rodrigues (1840) the exponential map for SO(3), going from \(\omega = (\alpha, \beta, \gamma)\) to the rotation matrix, is as follows

\(\exp(\omega) \equiv \text{matexp}(\omega^\wedge) = I_3 + \frac{\sin(|\omega|)}{|\omega|} \omega^\wedge + \frac{1 - \cos(|\omega|)}{|\omega|^2} (\omega^\wedge)^2\)

This can be verified using the code below. Rodrigues gives us an identity that seems kind of unbelievable: the infinite series of the matrix exponential collapses into just three terms! And it turns out that the formula for the exponential map is even simpler if we are mapping to a quaternion.

\(\displaystyle \exp(\omega) \equiv \begin{cases}(1,0,0,0)^T &\text{ if } \omega = (0,0,0)\\ \big( \cos(|\omega|/2), \sin(|\omega|/2) \,\omega / |\omega|\big) &\text{ otherwise} \end{cases}\)

import scipy
import scipy.linalg
import numpy as np
w = [1, 2, 3]
x,y,z = w
w_wedge = [[0, -z, y], [z, 0, -x], [-y, x, 0]]
w_wedge = np.array(w_wedge)
theta = np.linalg.norm(w)
# lhs requires 1.86e-4s to compute
lhs = scipy.linalg.expm(w_wedge)
# rhs requires 1.61e-5s to compute!
rhs = (np.eye(3)
  + np.sin(theta) / theta * w_wedge
  + (1 - np.cos(theta)) / theta / theta * w_wedge @ w_wedge)
np.testing.assert_allclose(lhs, rhs)
print(lhs)

# [[-0.69492056,  0.71352099,  0.08929286],
#  [-0.19200697, -0.30378504,  0.93319235],
#  [ 0.69297817,  0.6313497 ,  0.34810748]]
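The quaternion form of the exponential map can be cross-checked against the Rodrigues matrix form. This is a sketch under two assumptions: the quaternion is stored scalar-first as in these notes, and SciPy's `Rotation.from_quat` expects scalar-last (x, y, z, w) order by default, hence the reordering. The helper names `exp_quat` and `exp_matrix` are illustrative.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def exp_quat(w):
    """Exponential map from a rotation vector to a unit quaternion (a, b, c, d)."""
    theta = np.linalg.norm(w)
    if theta == 0.0:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * w / theta))

def exp_matrix(w):
    """Rodrigues formula from a rotation vector to a rotation matrix (theta > 0)."""
    theta = np.linalg.norm(w)
    x, y, z = w
    W = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return (np.eye(3)
            + np.sin(theta) / theta * W
            + (1.0 - np.cos(theta)) / theta**2 * W @ W)

w = np.array([1.0, 2.0, 3.0])
q = exp_quat(w)

# Move the scalar part to the end before handing the quaternion to SciPy.
R_from_q = Rotation.from_quat(np.roll(q, -1)).as_matrix()
R_rodrigues = exp_matrix(w)

print(np.allclose(R_from_q, R_rodrigues))
```

Both routes describe the same rotation, so the two matrices should agree up to rounding.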

The group SE(3) and its algebra \(\mathfrak{se}(3)\)

Now we can also compute the exponential \((\mathfrak{se}(3) \to \mathbf{SE}(3))\) and logarithm map \((\mathbf{SE}(3) \to \mathfrak{se}(3))\). Let

\(v = \begin{pmatrix} t \\ \omega \end{pmatrix}\) then the matrix exponential is \(\exp(v) \equiv \text{matexp}\begin{pmatrix} \omega^\wedge & t \\ 0& 0\end{pmatrix} = \begin{pmatrix} \text{matexp}(\omega^\wedge) & V t \\ 0 & 1\end{pmatrix} \) where \(V t\) is an ordinary matrix-vector product, \(\theta = |\omega|\), and

\(V = I_3 + \frac{1 - \cos(\theta) }{\theta^2} \omega^\wedge + \frac{\theta - \sin(\theta)}{\theta^3} (\omega^\wedge)^2\).
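As with the rotation-only case, the closed form can be compared against a generic matrix exponential. A minimal sketch, assuming a nonzero rotation part; the helper names `hat` and `exp_se3` and the numeric values are chosen for illustration.

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_se3(t, w):
    """Closed-form exponential of the twist (t, w); assumes |w| > 0."""
    theta = np.linalg.norm(w)
    W = hat(w)
    R = (np.eye(3)
         + np.sin(theta) / theta * W
         + (1.0 - np.cos(theta)) / theta**2 * W @ W)
    V = (np.eye(3)
         + (1.0 - np.cos(theta)) / theta**2 * W
         + (theta - np.sin(theta)) / theta**3 * W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ t  # matrix-vector product, not a cross product
    return T

t = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 2.0, 3.0])

# Reference: generic matrix exponential of the 4x4 twist matrix
twist = np.zeros((4, 4))
twist[:3, :3] = hat(w)
twist[:3, 3] = t

T_closed = exp_se3(t, w)
T_series = expm(twist)
print(np.allclose(T_closed, T_series))
```

For rotation angles close to zero the divisions by theta become ill-conditioned, so a practical implementation would switch to a series expansion there.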

Bonus material : Optimization on SE(3)

The basic idea is simply that the derivatives live in the tangent space. Therefore, before applying an update to a point on the manifold, we must map it back to the manifold using the exponential map, and then apply it to the current point through the group operation. See chapter 10 and the appendices of the following tutorial.
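To make the idea concrete, here is a reduced sketch on SO(3) only (the rotation part of a pose), not the tutorial's own algorithm. It assumes a noise-free point-alignment problem with made-up data and uses Gauss-Newton steps. Each step solves for an increment `delta` in the tangent space and applies it as `R <- exp(delta) R`. The Jacobian comes from the first-order expansion exp(delta) y ≈ y + delta x y, so each block is -hat(y).

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

# Synthetic, noise-free data: targets are the points rotated by R_true.
rng = np.random.default_rng(0)
points = rng.standard_normal((50, 3))
R_true = expm(hat(np.array([0.3, -0.2, 0.5])))
targets = points @ R_true.T

R = np.eye(3)  # initial guess on the manifold
for _ in range(20):
    moved = points @ R.T                        # rows are R p_i
    residuals = (moved - targets).reshape(-1)   # stacked residuals r_i
    # d/d(delta) of exp(delta) (R p_i) at delta = 0 is -hat(R p_i)
    J = np.concatenate([-hat(y) for y in moved])
    # Gauss-Newton step in the tangent space: least-squares solve of J delta = -r
    delta, *_ = np.linalg.lstsq(J, -residuals, rcond=None)
    # Map the increment back to the group and apply it from the left
    R = expm(hat(delta)) @ R

print(np.allclose(R, R_true, atol=1e-8))
```

Because every update multiplies by an exponential, the estimate stays on SO(3) throughout and never needs re-orthogonalization. The same pattern carries over to SE(3) with a 6-vector increment.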

https://ingmec.ual.es/~jlblanco/papers/jlblanco2010geometry3D_techrep.pdf