3D Transforms

Published

September 29, 2024

There is one topic in robotics and 3D computer vision that constantly confuses everyone, regardless of experience or degree: 3D poses and 3D transforms. There is a great article, Announcing posetree.py: Wrangling Timestamps and Transforms for Robots, which partly inspired this one, but I want to go a little deeper into the math. First, let's define the terms, following that article:

There is a pose. A pose is always defined relative to something else. Everything is relative, as Einstein is popularly (if apocryphally) credited with saying. So we need a frame, a coordinate frame to be precise. For simplicity, we'll define a pose as a 6-DOF quantity (3 rotational and 3 translational degrees of freedom), which can equivalently be written as a 4x4 homogeneous matrix with rotation and translation components.
And there is a transform. A transform is essentially a change of pose: it's an action.
There is also a frame transform: finding the pose of the same object in a new coordinate frame.
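To make the "6-DOF vector vs. 4x4 matrix" equivalence concrete, here is a minimal sketch. It assumes the rotation part of the 6-DOF vector is a rotation vector (axis times angle, in radians); Euler angles or quaternions are equally common encodings, and the function names `rotvec_to_matrix` and `pose_from_6dof` are hypothetical. Variable names like `T_A_B` stand for $T^A_B$.

```python
import numpy as np


def rotvec_to_matrix(rotvec):
    """Rodrigues' formula: rotation vector (axis * angle, radians) -> 3x3 rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)  # (near-)zero rotation
    kx, ky, kz = rotvec / theta  # unit rotation axis
    K = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])  # skew-symmetric cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def pose_from_6dof(tx, ty, tz, rx, ry, rz):
    """Pack a translation (tx, ty, tz) and a rotation vector (rx, ry, rz) into a 4x4 pose."""
    T = np.eye(4)
    T[:3, :3] = rotvec_to_matrix([rx, ry, rz])  # rotation block
    T[:3, 3] = [tx, ty, tz]                     # translation column
    return T


# Example: frame B sits 1 m along the x axis of frame A, rotated 90 degrees about A's z axis.
T_A_B = pose_from_6dof(1.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2)
```

The bottom row of the matrix stays `[0, 0, 0, 1]`, which is what lets poses be chained by plain matrix multiplication.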

So let's consider each of these in more detail, starting with some notation that is widely used here and that simplifies reasoning about, and working with, 3D transforms.

We denote the pose of frame B in the reference frame A as $T^A_B$ .
We denote the transform from pose B to pose C, expressed in the reference frame A, as $T^A_{B\rightarrow C}$ .
We denote the change of the reference frame from A to C for the pose B as $T^{A\rightarrow C}_B$ .

The notation is meant to be as straightforward and descriptive as possible: the superscript is always the reference frame, and the subscript is what is being described in it.
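One way to make the notation enforce itself is to carry the frame names along with the matrix. The sketch below is hypothetical (the `Pose` class, its `ref`/`child` fields, and the `translation` helper are illustrations, not the posetree.py API): composition only succeeds when the inner frames match, and inversion swaps the superscript and subscript.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Pose:
    """T^ref_child: the pose of frame `child` expressed in reference frame `ref`."""
    ref: str
    child: str
    matrix: np.ndarray  # 4x4 homogeneous transform

    def __matmul__(self, other: "Pose") -> "Pose":
        # T^A_B T^B_C = T^A_C: the inner frames must match, like cancelling fractions.
        if self.child != other.ref:
            raise ValueError(
                f"cannot chain T^{self.ref}_{self.child} with T^{other.ref}_{other.child}")
        return Pose(self.ref, other.child, self.matrix @ other.matrix)

    def inv(self) -> "Pose":
        # (T^A_B)^-1 = T^B_A: inverting swaps the reference and the described frame.
        return Pose(self.child, self.ref, np.linalg.inv(self.matrix))


def translation(x, y, z):
    """Hypothetical helper: a pure-translation 4x4 pose matrix."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


T_C1_R = Pose("C1", "R", translation(0.0, 0.0, 1.5))  # robot base in camera C1 (made-up values)
T_R_G1 = Pose("R", "G1", translation(0.4, 0.0, 0.3))  # gripper in robot base (made-up values)
T_C1_G1 = T_C1_R @ T_R_G1                             # frames check out: T^C1_G1
T_G1_C1 = T_C1_G1.inv()                               # T^G1_C1
```

Writing `T_R_G1 @ T_C1_R` instead raises a `ValueError`, because G1 and C1 do not cancel.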

Now, let's consider a simple problem. A camera $C_1$ is looking at a robot arm, and a plate with a part on it is attached to that arm. The arm moves from pose $G_1$ to pose $G_2$ . We know the pose of the part in the coordinate frame of camera $C_1$ in state 1: $B_1$ . How can we derive its pose in state 2, $B_2$ ? We also have a second camera located at pose $C_2$ : what is the pose of the part in each of these states in the reference frame of camera $C_2$ ?

First, let's write down the notation, the known parameters, and the unknowns; then we'll derive a couple of intuitive rules that will help us never confuse transforms again.

What we know:

$T^{C_1}_R$ - the pose of the robot base wrt camera $C_1$ (usually obtained through hand-eye calibration)

$T^{C_1}_{C_2}$ - the pose of the camera $C_2$ wrt camera $C_1$

$T^{R}_{G_1}$ - the pose of the robot gripper in state 1 wrt the robot base.

$T^{R}_{G_2}$ - the pose of the robot gripper in state 2 wrt the robot base.

$T^{C_1}_{B_1}$ - the pose of the part in state 1 wrt camera $C_1$ .

What we want to find is the following:

$T^{C_1}_{B_2}$ - the pose of the part in state 2 wrt camera $C_1$ .

$T^{C_2}_{B_1}$ and $T_{B_2}^{C_2}$ - the pose of the part in states 1 and 2 wrt camera $C_2$ .

We also know that the part didn’t move on the attached plate, so the transform from the gripper to the part didn’t change.

Let's start by writing down two simple rules for dealing with poses and transforms, and let's try to understand the intuition behind them first:

$(T_{A}^{B})^{-1} = T_{B}^{A}$
$T^A_B T^B_C = T^A_C$

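These two rules are enough to derive everything that follows and to solve any problem of this kind, but we need to understand the intuition behind them. The first says that inverting the pose of B in A gives the pose of A in B: inversion simply swaps the reference frame and the described frame. The second says that adjacent indices cancel, much like fractions: the pose of C in B, composed with the pose of B in A, gives the pose of C in A.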
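Both rules can be checked numerically. In this sketch (example values are made up; `make_pose`, `rot_z`, `rot_x` and `invert_pose` are hypothetical helpers, and `T_A_B` stands for $T^A_B$), a pose $T^A_B$ maps the homogeneous coordinates of a point given in B into coordinates in A. The inverse uses the closed form for rigid transforms, $[R\,|\,t]^{-1} = [R^\top\,|\,-R^\top t]$, which avoids a general matrix inversion.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 pose from a 3x3 rotation R and a translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def invert_pose(T):
    """Closed-form inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


T_A_B = make_pose(rot_z(0.3), [1.0, 2.0, 0.5])
T_B_C = make_pose(rot_x(-1.1), [0.0, -0.4, 2.0])

# Rule 1: (T^A_B)^-1 = T^B_A; the closed form agrees with a general matrix inverse.
T_B_A = invert_pose(T_A_B)
assert np.allclose(T_B_A, np.linalg.inv(T_A_B))

# Rule 2: T^A_B T^B_C = T^A_C. A point given in C, mapped into B and then into A,
# lands at the same coordinates as mapping it into A directly with T^A_C.
p_C = np.array([0.2, 0.1, -0.3, 1.0])  # homogeneous coordinates
T_A_C = T_A_B @ T_B_C
p_B = T_B_C @ p_C
assert np.allclose(T_A_C @ p_C, T_A_B @ p_B)
```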

Now let's write down the equations for the pose of the part wrt camera $C_1$ :

$T^{C_1}_{B_1}=T^{C_1}_R T^R_{G_1} T^{G_1}_{B_1}$

$T^{C_1}_{B_2}=T^{C_1}_R T^R_{G_2} T^{G_2}_{B_2}$

However, it's easy to notice that the part didn't move wrt the gripper, so $T^{G_1}_{B_1} = T^{G_2}_{B_2}$ . Therefore, we can recover this fixed offset from state 1 and reuse it in state 2:

$T_{B_1}^{G_1}=(T^{C_1}_R T^R_{G_1})^{-1}T_{B_1}^{C_1} = T^{G_1}_R T^R_{C_1}T^{C_1}_{B_1}$

$T^{C_1}_{B_2}=T^{C_1}_R T^R_{G_2} T^{G_2}_{B_2} =(T^{C_1}_R T^R_{G_2} T^{G_1}_R T^R_{C_1})T^{C_1}_{B_1}$

The equation may look complicated, but that's only because we substituted our non-trivial knowledge that $T^{G_1}_{B_1} = T^{G_2}_{B_2}$ . In fact, it just says

$T^{C_1}_{B_2}=T^{C_1}_{G_2}T^{G_1}_{B_1}$

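This makes physical sense: the pose of the part in $C_1$ is the new pose of the gripper in $C_1$ , composed with the unchanged gripper-to-part offset.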
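A quick numerical check that the long and short forms agree. All input poses below are made-up example values, the helpers are hypothetical, and names like `T_C1_B2` stand for $T^{C_1}_{B_2}$.

```python
import numpy as np


def make_pose(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def invert_pose(T):
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


# Known inputs (made-up example values).
T_C1_R = make_pose(rot_z(np.pi), [0.5, 0.0, 1.2])    # robot base in camera C1
T_R_G1 = make_pose(rot_z(0.2), [0.4, 0.1, 0.3])      # gripper, state 1, in base
T_R_G2 = make_pose(rot_z(-0.5), [0.3, -0.2, 0.35])   # gripper, state 2, in base
T_C1_B1 = make_pose(rot_z(0.25), [0.1, 0.15, 0.9])   # part, state 1, in C1

# Fixed gripper-to-part offset: T^{G1}_{B1} = T^{G1}_R T^R_{C1} T^{C1}_{B1}
T_G1_B1 = invert_pose(T_R_G1) @ invert_pose(T_C1_R) @ T_C1_B1

# Long form: (T^{C1}_R T^R_{G2} T^{G1}_R T^R_{C1}) T^{C1}_{B1}
T_C1_B2_long = (T_C1_R @ T_R_G2 @ invert_pose(T_R_G1) @ invert_pose(T_C1_R)) @ T_C1_B1

# Short form: T^{C1}_{G2} T^{G1}_{B1}
T_C1_G2 = T_C1_R @ T_R_G2
T_C1_B2_short = T_C1_G2 @ T_G1_B1

assert np.allclose(T_C1_B2_long, T_C1_B2_short)
```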

Now let's find the poses of the part in states $B_1$ and $B_2$ wrt another camera, $C_2$ :

$T^{C_2}_{B_1} = T^{C_2}_{C_1} T^{C_1}_{B_1} = (T^{C_1}_{C_2})^{-1}T^{C_1}_{B_1}$

$T^{C_2}_{B_2} = T^{C_2}_{C_1} T^{C_1}_{B_2} = (T^{C_1}_{C_2})^{-1}T^{C_1}_{B_2}$

Notice that to change the coordinate system from $C_1$ to $C_2$ , we left-multiply the original poses of the part by the inverse of the pose of $C_2$ wrt $C_1$ . So, in some sense, the coordinate system transform is the inverse of the pose.
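A small example of the change of reference frame, with made-up values: camera $C_2$ sits 1 m along the x axis of $C_1$ and is rotated 90 degrees about its z axis. The helpers are hypothetical, and `T_C2_B1` stands for $T^{C_2}_{B_1}$.

```python
import numpy as np


def make_pose(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def invert_pose(T):
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


T_C1_C2 = make_pose(rot_z(np.pi / 2), [1.0, 0.0, 0.0])  # camera C2 in C1
T_C1_B1 = make_pose(rot_z(0.25), [0.1, 0.15, 0.9])      # part, state 1, in C1

# Change of reference frame: T^{C2}_{B1} = (T^{C1}_{C2})^{-1} T^{C1}_{B1}
T_C2_B1 = invert_pose(T_C1_C2) @ T_C1_B1
# Seen from C2, the part's origin is at approximately (0.15, 0.9, 0.9),
# and its orientation is rot_z(0.25 - pi/2).
```

Mapping back with `T_C1_C2 @ T_C2_B1` recovers `T_C1_B1`, which is just the chaining rule again.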

Now let's think about a couple of other cases:

What if the robot is jogging, and we know that its pose has changed by a transform $T^A_{B\rightarrow C}$ ? What does that tell us about the new pose, and how would the same transform look from a different camera?
What is the derivative of the transform, i.e. the velocity? How does the choice of coordinate system matter there?

Let's think about problem 1 first. This time we can't get away with pure matrix operations and will need to treat translation and rotation as two separate operations.
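A sketch of why rotation and translation need separate handling, under an assumed jog convention (not necessarily the author's exact definition): the jog rotates the pose about its own origin, with the rotation axes aligned to frame A, and then translates it by `dt` expressed in A. Simply left-multiplying by a fixed 4x4 matrix built from `dR` and `dt` instead rotates the origin about A's origin as well. The helpers and the `jog` function are hypothetical; `T_A_B` stands for $T^A_B$.

```python
import numpy as np


def make_pose(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def jog(T, dR, dt):
    """Assumed jog convention: rotate about the pose's own origin (axes of frame A),
    then translate by dt expressed in frame A."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(dR @ R, t + dt)


T_A_B = make_pose(np.eye(3), [1.0, 0.0, 0.0])      # B sits 1 m along A's x axis
dR, dt = rot_z(np.pi / 2), np.array([0.0, 0.0, 0.1])

T_A_C_jog = jog(T_A_B, dR, dt)              # origin stays at x = 1, lifted by 0.1
T_A_C_naive = make_pose(dR, dt) @ T_A_B     # origin also swings around A's z axis

# The jog can still be written as one 4x4 matrix, but its translation part
# depends on the current position t, so it is not a fixed matrix.
t = T_A_B[:3, 3]
T_delta = make_pose(dR, t + dt - dR @ t)
assert np.allclose(T_delta @ T_A_B, T_A_C_jog)
```

Both results share the same orientation; they differ only in where the origin ends up: (1, 0, 0.1) for the jog versus roughly (0, 1, 0.1) for the naive product.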

What if we now want to describe the same transform from a different coordinate frame? Suppose the transform is known in the frame of camera $C_1$ , as $T^{C_1}_{B\rightarrow C}$ , and we want to see it from the perspective of camera $C_2$ . Intuitively, we could change our basis to $C_1$ , apply the transform there, and change back. That's exactly how the transform is defined:

$T^{C_2}_{B \rightarrow C} = T^{C_2}_{C_1} T^{C_1}_{B \rightarrow C} T^{C_1}_{C_2}$

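Doesn't this look similar to the bracketed term in the expression for $T^{C_1}_{B_2}$ above? There, $T^{C_1}_R (T^R_{G_2} T^{G_1}_R) T^R_{C_1}$ takes the gripper motion $T^R_{G_2} T^{G_1}_R$ , expressed in the robot base frame $R$ , and converts it into the frame of $C_1$ in exactly the same way.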
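A numerical sketch of the conjugation rule. It assumes a transform acts by left-multiplication in the frame it is expressed in, i.e. $T^{C_1}_C = T^{C_1}_{B\rightarrow C}\,T^{C_1}_B$, which is the convention the equation above implies; the values are made up, the helpers are hypothetical, and `D_C1` stands for $T^{C_1}_{B\rightarrow C}$.

```python
import numpy as np


def make_pose(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def invert_pose(T):
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


T_C1_C2 = make_pose(rot_x(0.7), [0.8, -0.3, 0.2])   # camera C2 in C1
T_C1_B = make_pose(rot_z(0.4), [0.1, 0.2, 1.0])     # pose B in C1
D_C1 = make_pose(rot_z(-0.3), [0.05, 0.0, 0.02])    # transform B -> C, expressed in C1

T_C2_C1 = invert_pose(T_C1_C2)
D_C2 = T_C2_C1 @ D_C1 @ T_C1_C2                     # the same transform, seen from C2

# Apply the transform in C1, then express the result in C2 ...
T_C2_C_via_C1 = T_C2_C1 @ (D_C1 @ T_C1_B)
# ... or express B in C2 first and apply the conjugated transform there.
T_C2_C_direct = D_C2 @ (T_C2_C1 @ T_C1_B)
assert np.allclose(T_C2_C_via_C1, T_C2_C_direct)
```

Note that `D_C2` and `D_C1` are different matrices even though they describe the same physical motion: the rotation axis and the translation are simply written in different coordinates.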