3D Transforms

Tags
3D
Published
September 29, 2024

There is one topic in robotics and 3D computer vision that constantly confuses people regardless of experience and degree: 3D poses and 3D transforms. There is a great article on the subject, Announcing posetree.py: Wrangling Timestamps and Transforms for Robots. This article was partly inspired by it, but I want to go a little deeper into the math. First, let's define the terms, following that article:

  1. There is a pose. A pose is always defined relative to something else. Everything is relative, as Mr. Einstein once declared. So we need a frame, a coordinate frame to be precise. For the sake of simplicity we'll define the pose as a 6DOF vector that can also be written as a 4x4 matrix with a rotation and a translation component.
  2. And there is a transform. The transform is essentially a change of pose; it's an action.
  3. There is also a frame transform: finding the pose of the original object in a new coordinate frame.

So, let's consider each of these in more detail, starting with some notation that is widely used in these cases and that simplifies understanding and working with 3D transforms.

  1. We call a pose of frame B in the reference frame A: $T^A_B$.
  2. We call a transform from pose B to pose C in the reference frame A: $T^A_{B\rightarrow C}$.
  3. We call the change of the reference frame from A to C for the pose B: $T^{A\rightarrow C}_B$.

So the notation we are going to use is going to be as straightforward and descriptive as possible.
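To make the "4x4 matrix with a rotation and a translation component" concrete, here is a minimal sketch in plain Python. The helper names `make_pose` and `apply` are hypothetical, and for brevity the rotation is restricted to a rotation about the z axis; a general 6DOF pose would use a full 3x3 rotation matrix in the same upper-left block.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """Hypothetical helper: 4x4 pose from a rotation about z (radians) plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(T, p):
    """Re-express the 3D point p through the 4x4 matrix T (homogeneous coordinates)."""
    x, y, z = p
    return [T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3)]

# Pose of frame B in frame A: B is rotated 90 degrees about z
# and its origin sits 1 unit along A's x axis.
T_A_B = make_pose(math.pi / 2, 1.0, 0.0, 0.0)

# A point one unit along B's x axis, re-expressed in frame A.
p_in_A = apply(T_A_B, [1.0, 0.0, 0.0])
print([round(v, 6) for v in p_in_A])
```

Read this way, $T^A_B$ takes coordinates written in frame B and returns the same physical point written in frame A: B's x axis points along A's y axis, so the point ends up at roughly (1, 1, 0) in A.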

Now, let's consider a simple problem: we have a camera $C_1$ looking at the robot arm, and we have a plate with an object attached to that robot arm. The robot arm moves from pose $G_1$ to pose $G_2$. We know the pose of the object in the coordinate frame of the camera in state 1: $B_1$. How can we derive it in state 2: $B_2$? We also have a second camera located at pose $C_2$. What would be the pose of the part in each of these states in the reference frame of camera $C_2$?

image

First, let's write down the notation, the known parameters and the unknowns, and then we'll derive a couple of intuitive rules that will help us never confuse the transforms again.

What we know:

$T^{C_1}_R$ - the pose of the robot base wrt camera $C_1$ (usually obtained through hand-eye calibration),

$T^{C_1}_{C_2}$ - the pose of the camera $C_2$ wrt camera $C_1$,

$T^{R}_{G_1}$ - the pose of the robot gripper in state 1 wrt robot base,

$T^{R}_{G_2}$ - the pose of the robot gripper in state 2 wrt robot base,

$T^{C_1}_{B_1}$ - the pose of the part in state 1 wrt camera $C_1$.

What we want to find is the following:

$T^{C_1}_{B_2}$ - the pose of the part in state 2 wrt camera $C_1$.

$T^{C_2}_{B_1}$ and $T^{C_2}_{B_2}$ - the pose of the part in states 1 and 2 wrt camera $C_2$.

We also know that the part didn’t move on the attached plate, so the transform from the gripper to the part didn’t change.

Let's start solving this problem by writing down two simple rules for dealing with poses and transforms, and let's try to understand the intuition behind them first:

  1. $(T^{B}_{A})^{-1} = T^{A}_{B}$
  2. $T^A_B T^B_C = T^A_C$
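The two rules above can be checked numerically. The following is a sketch in plain Python with arbitrary made-up poses; the helpers `make_pose`, `matmul`, `inverse` and `close` are hypothetical names, and the rotations are limited to the z axis to keep the code short. With NumPy the products would simply be `A @ B`.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """Hypothetical helper: 4x4 pose from a rotation about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def inverse(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def close(A, B, tol=1e-9):
    """True if two 4x4 matrices agree element-wise within tol."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

IDENTITY = make_pose(0.0, 0.0, 0.0, 0.0)

# Arbitrary example poses: B in A, and C in B.
T_A_B = make_pose(0.3, 1.0, 2.0, 0.5)
T_B_C = make_pose(-1.1, 0.2, -0.4, 0.0)

# Rule 1: inverting the pose of B in A gives the pose of A in B.
T_B_A = inverse(T_A_B)
rule1_ok = close(matmul(T_A_B, T_B_A), IDENTITY)

# Rule 2: chaining B-in-A with C-in-B gives C-in-A.
T_A_C = matmul(T_A_B, T_B_C)
# Combining both rules: T^B_A T^A_C must give back T^B_C.
rule2_ok = close(matmul(T_B_A, T_A_C), T_B_C)

print(rule1_ok, rule2_ok)
```

The variable names mirror the notation (`T_A_B` for $T^A_B$), so a product is well formed exactly when the trailing letter of the left factor matches the leading letter of the right one.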

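These two rules will help us define all the subsequent rules and solve any problem. However, we need to understand the intuition.

The first rule says that if $T^B_A$ tells us where frame $A$ sits when viewed from frame $B$, then its inverse tells us where $B$ sits when viewed from $A$: inverting a pose swaps the two frames. The second rule says that poses chain: if we know where $B$ is in $A$ and where $C$ is in $B$, then we know where $C$ is in $A$. In the notation, a subscript cancels against the matching superscript of the next factor, which makes a malformed product easy to spot.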

Now let's write down the equations for the pose of the part wrt the camera $C_1$:

$$T^{C_1}_{B_1} = T^{C_1}_R T^R_{G_1} T^{G_1}_{B_1}$$

$$T^{C_1}_{B_2} = T^{C_1}_R T^R_{G_2} T^{G_2}_{B_2}$$

However, it's easy to notice that the part didn't move wrt the gripper, so $T^{G_1}_{B_1} = T^{G_2}_{B_2}$. Therefore,

$$T^{G_1}_{B_1} = (T^{C_1}_R T^R_{G_1})^{-1} T^{C_1}_{B_1} = T^{G_1}_R T^R_{C_1} T^{C_1}_{B_1}$$

$$T^{C_1}_{B_2} = T^{C_1}_R T^R_{G_2} T^{G_2}_{B_2} = (T^{C_1}_R T^R_{G_2} T^{G_1}_R T^R_{C_1}) T^{C_1}_{B_1}$$

The equation may look complicated, but that's only because we used the non-trivial fact that $T^{G_1}_{B_1} = T^{G_2}_{B_2}$. In fact, it just says

$$T^{C_1}_{B_2} = T^{C_1}_{G_2} T^{G_1}_{B_1}$$

Which makes total physical sense.
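The derivation can be sketched in plain Python. All numbers below are arbitrary made-up values, not measurements; a hidden gripper-to-part pose `T_G_B` is used only to fabricate the "measured" $T^{C_1}_{B_1}$ and to check the result. The helper names are hypothetical and the rotations are restricted to the z axis for brevity.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """Hypothetical helper: 4x4 pose from a rotation about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def inverse(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def close(A, B, tol=1e-9):
    """True if two 4x4 matrices agree element-wise within tol."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

# Known inputs (made-up values).
T_C1_R = make_pose(0.4, 0.5, -0.2, 1.0)   # robot base wrt camera C1
T_R_G1 = make_pose(-0.7, 0.3, 0.1, 0.4)   # gripper in state 1 wrt robot base
T_R_G2 = make_pose(1.2, 0.1, 0.5, 0.6)    # gripper in state 2 wrt robot base

# Hidden ground truth: the part is rigidly attached to the gripper.
T_G_B = make_pose(0.25, 0.0, 0.05, 0.1)
T_C1_B1 = matmul(matmul(T_C1_R, T_R_G1), T_G_B)  # simulated measurement in state 1

# Step 1: recover the gripper-to-part pose from the state-1 measurement.
T_G1_B1 = matmul(matmul(inverse(T_R_G1), inverse(T_C1_R)), T_C1_B1)

# Step 2: the part did not move on the plate, so reuse it in state 2.
T_C1_B2 = matmul(matmul(T_C1_R, T_R_G2), T_G1_B1)

# Check against the value computed directly from the hidden ground truth.
expected = matmul(matmul(T_C1_R, T_R_G2), T_G_B)
print(close(T_C1_B2, expected))
```

Splitting the computation into the two steps keeps each product short enough to check index by index against the notation.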

Now let's find the poses of the part in states $B_1$ and $B_2$ wrt the other camera $C_2$:

$$T^{C_2}_{B_1} = T^{C_2}_{C_1} T^{C_1}_{B_1} = (T^{C_1}_{C_2})^{-1} T^{C_1}_{B_1}$$

$$T^{C_2}_{B_2} = T^{C_2}_{C_1} T^{C_1}_{B_2} = (T^{C_1}_{C_2})^{-1} T^{C_1}_{B_2}$$

Notice that to change the coordinate system from $C_1$ to $C_2$ we multiply the original poses of the part on the left by the inverse of the pose of $C_2$ wrt $C_1$. So in some sense the coordinate system transform is the inverse of the pose.
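As a sketch of this change of reference frame, the snippet below re-expresses a part pose known in $C_1$ in the frame of $C_2$ and then maps it back as a sanity check. The poses are arbitrary made-up values, the helper names are hypothetical, and rotations are limited to the z axis for brevity.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """Hypothetical helper: 4x4 pose from a rotation about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def inverse(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def close(A, B, tol=1e-9):
    """True if two 4x4 matrices agree element-wise within tol."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

T_C1_C2 = make_pose(0.9, 2.0, 0.0, 0.3)    # camera C2 wrt camera C1 (made-up)
T_C1_B1 = make_pose(-0.2, 0.4, 0.6, 1.5)   # part in state 1 wrt camera C1 (made-up)

# The frame change is the inverse of the pose of C2 in C1, applied on the left.
T_C2_C1 = inverse(T_C1_C2)
T_C2_B1 = matmul(T_C2_C1, T_C1_B1)

# Going back to C1 must reproduce the original pose.
round_trip = matmul(T_C1_C2, T_C2_B1)
print(close(round_trip, T_C1_B1))
```

The same two lines work unchanged for state 2: only `T_C1_B1` is swapped for the state-2 pose.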

Now let's think about a couple of other cases:

  1. What if the robot is jogging, and we know that its pose has changed by a transform matrix $T^A_{B\rightarrow C}$? What does it tell us about the new pose, and what would this transform look like from a different camera?
  2. What would be the derivative of the transform, i.e. speed? How is the coordinate system relevant there?

Let's think about problem 1 first: this time we can't get away with pure matrix operations and are going to need two separate operations: translation and rotation.

What if we now want to define the same transform from a different coordinate frame, e.g. from the perspective of camera $C_2$? Intuitively, we could change our basis to $C_1$, do the transform and return back. And that's exactly how that transform would be defined:

$$T^{C_2}_{B \rightarrow C} = T^{C_2}_{C_1} T^{C_1}_{B \rightarrow C} T^{C_1}_{C_2}$$

Doesn’t this look similar to the equation from above?
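The change of basis for a transform can be checked numerically. The sketch below assumes that a transform expressed in a frame acts on poses expressed in that same frame by left multiplication, i.e. $T^{A}_{C} = T^{A}_{B \rightarrow C} T^{A}_{B}$; this convention is an assumption, chosen because it matches the bracketed product in the derivation of $T^{C_1}_{B_2}$ above. The values are made up, the helper names are hypothetical, and rotations are limited to the z axis.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """Hypothetical helper: 4x4 pose from a rotation about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def inverse(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def close(A, B, tol=1e-9):
    """True if two 4x4 matrices agree element-wise within tol."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

T_C1_C2 = make_pose(0.9, 2.0, 0.0, 0.3)     # camera C2 wrt camera C1 (made-up)
T_C2_C1 = inverse(T_C1_C2)
T_C1_B = make_pose(-0.2, 0.4, 0.6, 1.5)     # pose before the motion, seen from C1
X_C1 = make_pose(0.5, 0.1, -0.3, 0.2)       # the transform B -> C, expressed in C1

# Pose after the motion, seen from C1 (assumed left-multiplication convention).
T_C1_C = matmul(X_C1, T_C1_B)

# The same transform expressed in C2: go to C1, apply it, come back.
X_C2 = matmul(matmul(T_C2_C1, X_C1), T_C1_C2)

# Applying X_C2 to the old pose seen from C2 must match the new pose seen from C2.
T_C2_B = matmul(T_C2_C1, T_C1_B)
T_C2_C = matmul(T_C2_C1, T_C1_C)
print(close(matmul(X_C2, T_C2_B), T_C2_C))
```

If the opposite convention were used (transforms applied on the right, in the body frame), the sandwich would not be needed, so it is worth stating the convention next to any stored transform.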