Finding Optimal Solutions to the Rubik’s Cube – 3. Representation

Table of Contents

  1. Introduction
  2. Mathematics
  3. Representation
  4. Representation Evaluation
  5. Breadth-first Search
  6. Iterative Deepening Depth-first Search
  7. IDA*

3. Representation

Now that we have a better understanding of what we’re dealing with, we can finally start putting pen to paper (or a keyboard to a text editor?). One of the most important design decisions is this:

How do we represent a Rubik’s Cube in memory?

It is hopefully apparent that with 4.3 x 10^19 possible states, it’s impractical to try to find an optimal solution by hand. Therefore, we are probably going to need a computer to do most of the heavy lifting. Seeing as efficiency is going to be very important later on, I’m going to assume we’re working in C++, though the basic ideas should translate well enough to other languages too. In this section, I’m going to identify several candidate representations and evaluate them according to the criteria outlined below. There can be only one victor in this battle.

Criteria

If we are to evaluate several possible representation strategies, we should decide on a few criteria. These are some of the most important:

  • Space – An ideal representation would be very small in terms of memory footprint. For some of the techniques that will be addressed later, we will need to store a significant number of states in memory at the same time. The smaller the representation, the more states can be stored. Or equivalently, a small representation will lower the physical requirements of the host machine (e.g. moving from 32 GB of RAM to 8 GB).
  • Overhead Cost – It is very important that we are able to actually use the representation to accomplish some work. As little time as possible needs to be spent on basic operations, such as a rotation of a single face.
  • Abstraction Cost – Especially for debugging, we want to be able to quickly map between a state representation and a real cube. A representation like “State #17769227” tells us very little about the locations of each individual piece, for example.

Theoretical Baseline

We can actually measure the number of bits an ideal (uncompressed) representation would require using a little information theory. If we assume that all states are equally likely to appear, the minimum size is:

\(\left \lceil{\log_2{N}}\right \rceil\)

where \(N\) is the total number of possible states, a value we calculated to be about 4.3 x 10^19 in the last segment. Plugging that in gives our theoretical minimum size: 66 bits per state. In other words, no valid representation should ever use less storage.
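As a quick sanity check, the bound can be computed directly. A minimal sketch (the exact state count, 43,252,003,274,489,856,000, doesn’t fit in a 64-bit integer, so a long double approximation is used, which is plenty of precision for a logarithm):

```cpp
#include <cmath>

// The exact state count exceeds the range of a 64-bit integer, so we
// hold it approximately in a long double.
const long double kStateCount = 43252003274489856000.0L;

// Minimum size of an uncompressed representation: ceil(log2(N)).
const int kMinimumBits = static_cast<int>(std::ceil(std::log2(kStateCount)));  // 66
```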

Side note: While it would likely be possible to reduce this lower bound using some sort of compression, the associated costs of inflating/deflating the state every time we need to access it would likely add a significant amount of runtime overhead. Therefore, I’m going to leave such representation strategies as an exercise to the reader instead of directly addressing them here.

Option 1: 2D Array of bytes

Representation

A first reaction for many developers would be to write something like this:

uint8_t state[6][9];  // or 'char', enum Color, etc.

We explicitly store the color (or ID) of each individual piece on each of the 6 faces of the cube. The 2D array approach is very easy to understand, but it does have one important deficiency according to our criteria: size. A structure like this one uses 54 bytes, or 432 bits, per state instance. Compare that to our theoretical bound of 66 bits. It’s about 6.5x larger!

If you’re really paying attention, you can see that we don’t actually need to store the colors for each of the 6 center pieces, as long as the order is well-defined, so we could skip them, leaving us with this:

uint8_t state[6][8];

That’s a bit better. Now we’re down to 48 bytes or 384 bits per state instance, which is only 6x larger than the theoretical minimum.

The most important issue is that we only need 3 bits to store one of 6 color IDs, but we’re using 8 instead. We’re wasting more than half the space already. If we were to directly address that problem, we could compress the structure using bit-twiddling arithmetic. 48 cells * 3 bits/cell is 144 bits, which is still more than 2x as large as the theoretical minimum, but it is considerably smaller than the uncompressed version.
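A sketch of what that bit-twiddling might look like. To keep any 3-bit field from straddling a word boundary, this layout stores 21 fields per 64-bit word, so 48 cells fit in three words; the layout itself is an assumption, not a canonical format:

```cpp
#include <cstdint>

// Bit-packed sticker array: 48 cells x 3 bits = 144 bits. Each
// 64-bit word holds 21 three-bit fields (63 bits used), so no field
// crosses a word boundary.
struct PackedState
{
    uint64_t words[3] = {0, 0, 0};

    // Read the 3-bit color of cell i (0-47).
    uint8_t get(int i) const
    {
        return (words[i / 21] >> ((i % 21) * 3)) & 0x7;
    }

    // Overwrite the 3-bit color of cell i.
    void set(int i, uint8_t color)
    {
        const int shift = (i % 21) * 3;
        words[i / 21] &= ~(static_cast<uint64_t>(0x7) << shift);
        words[i / 21] |= static_cast<uint64_t>(color & 0x7) << shift;
    }
};
```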

State Transitions

In terms of overhead, not much work needs to be done in order to perform a single face rotation. A program would simply need to shuffle 12 outer pieces and 8 inner pieces around. As all the indices involved are static, a set of translation tables could be written to essentially automate the process.
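A sketch of that table-driven approach. A quarter turn moves 8 stickers on the rotating face and 12 on the adjacent faces, i.e. five 4-cycles of cell indices in a flattened 48-cell array. The index values below are placeholders, since the real values depend on the chosen face layout:

```cpp
#include <cstdint>

// Each row is a 4-cycle of cell indices for one face turn. These
// indices are placeholders, not a real move definition.
const int kFrontCycles[5][4] = {
    { 0,  2,  7,  5},  // corners of the rotating face
    { 1,  4,  6,  3},  // edges of the rotating face
    { 8, 16, 24, 32},  // adjacent-face stickers
    { 9, 17, 25, 33},
    {10, 18, 26, 34},
};

void rotateFace(uint8_t state[48])
{
    for (const auto& cycle : kFrontCycles)
    {
        // Shift each sticker one step around its 4-cycle.
        const uint8_t last = state[cycle[3]];
        state[cycle[3]] = state[cycle[2]];
        state[cycle[2]] = state[cycle[1]];
        state[cycle[1]] = state[cycle[0]];
        state[cycle[0]] = last;
    }
}
```

Four applications of the same turn should return the array to its original configuration, which makes a convenient test of any cycle table.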

Option 2: Enumerating all states

If we really wanted to, we could start enumerating each of those 4.3 x 10^19 states. For example, “State #0” could represent the solved state, “State #1” could represent the state in which a single face was rotated 90° clockwise, and so on.

The primary advantage of this approach would be space-efficiency. While most programming languages do not allow the user to create custom data-type sizes (66 bits is a bit unusual in the software world), we could get very close by using 9 bytes (or 72 bits) instead. E.g.

uint8_t state[9];

Representation

It is theoretically possible to map each unique state to a unique integer in the range [0, 4.3 x 10^19), but actually performing that mapping may not be a trivial operation. For example, one could introduce a Tetravigesimal (base-24) number system to enumerate the corners and edges separately. For each of the 8 corners, we specify both a permutation (8 values) and an orientation (3 values), and for each of the 12 edges, we specify a permutation (12 values) and an orientation (2 values). The integer identifier for a given state would then be a single 20-digit base-24 number. However, this particular encoding does not cleanly map to the desired range, so it would have to be modified to fit properly.

A much more efficient (but more complicated) alternative would be to use a Mixed Radix number system that handles the different bases of each component directly. The permutations would use Factoradic bases (8 digits for corner permutations and 12 digits for edge permutations), the corner orientations would use 7 base-3 digits, and the edge orientations would use 11 base-2 digits. Therefore, the final state would be a single 38-digit mixed-radix number with the following base structure:

7! 6! 5! 4! 3! 2! 1! 0! | 3 3 3 3 3 3 3 | 11! 10! 9! 8! 7! 6! 5! 4! 3! 2! 1! 0! | 2 2 2 2 2 2 2 2 2 2 2

Caveat: There are actually 12!/2 edge permutations instead of 12!, which means that when calculating the index for the edge permutation, the resulting value must be divided by 2.
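Computing the Factoradic index of a permutation is a short routine known as the Lehmer code: each digit counts the smaller elements to its right. A sketch; the piece labels may be 0-based or 1-based, since only comparisons between them matter:

```cpp
#include <cstdint>

// Rank a permutation of n elements (n <= 12) as a single Factoradic
// number. The identity permutation maps to 0 and the reversed
// permutation maps to n! - 1.
uint32_t permutationIndex(const uint8_t* perm, int n)
{
    uint32_t index = 0;
    for (int i = 0; i < n; ++i)
    {
        // Count elements to the right that are smaller than perm[i].
        uint32_t digit = 0;
        for (int j = i + 1; j < n; ++j)
            if (perm[j] < perm[i])
                ++digit;
        index = index * (n - i) + digit;  // accumulate in base (n - i)
    }
    return index;
}
```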

It should be noted that we do not necessarily have to store the index as a single number. We could, if we are willing to sacrifice a little more space, split each component into its own entry like this:

struct State
{
    uint16_t cornerPermutation;  // Max value =      40,319
    uint16_t cornerOrientation;  // Max value =       2,186
    uint32_t edgePermutation;    // Max value = 239,500,799
    uint16_t edgeOrientation;    // Max value =       2,047
};

This structure stores 80 bits of data (compared to the 72-bit version above), though note that the compiler will typically pad it to 96 bits to satisfy the uint32_t member’s alignment. In exchange, encoding/decoding would be easier and there would be less chance of integer overflow.

I’ve demonstrated that enumerating all the states is possible (and definitely a little crazy), but there are a few additional challenges that must still be addressed.

State Transitions

One must also be very careful to avoid excessive computational overhead in the implementation of the state transitions. A simple solution would be to “unpack” the state into some intermediate representation, shuffle the elements around, and then “repack” the result. Unfortunately, “unpacking” requires several division/mod instructions, and “packing” requires several multiplications and additions. The additional overhead introduced by the packing/unpacking process would make state transitions quite a bit slower than the 2D array approach.

It is possible to operate on the individual state components (corner permutation, corner orientation, edge permutation, and edge orientation) directly if one is willing to allocate enough space to hold a set of transition tables. For example, we could, in theory, create a table with 40,320 rows and 18 columns (1 per possible state transition) that maps a single corner permutation to the next corner permutation. Doing so would require (40,320 * 18) cells * 2 bytes/cell = 1,451,520 bytes, or about 1.4 MB of memory, which isn’t unreasonable on a modern computer.
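With such a table built, a transition becomes a single array access. A sketch with hypothetical names (populating the table, by decoding each index, applying the turn, and re-encoding, is omitted here):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical transition table: 40,320 corner permutations x 18
// face turns, each cell holding the index reached after that turn.
std::vector<uint16_t> cornerPermTable(40320 * 18);

// With the table populated, a state transition is one lookup.
uint16_t nextCornerPermutation(uint16_t cornerPermutation, int move)
{
    return cornerPermTable[cornerPermutation * 18 + move];
}
```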

The primary issue would be dealing with the edge permutation table. Using a naïve approach, that table would require (239,500,800 * 18) cells * 4 bytes/cell = 17,244,057,600 bytes, or about 16 GB of memory. That’s a bit excessive.

One solution is to split the edge permutation state into two parts. The first segment stores the permutation for just the first 4 edges, and the second segment stores the permutation for the remaining 8 edges. The split is uneven because we want the number of combinations for both halves to be roughly equivalent. An even 6-6 split would give us 665,280 combinations in one table but just 360 in the other (after the parity reduction), which is very uneven. A 4-8 split gives us 11,880 combinations in the first table and 20,160 combinations in the second, which is much more even.

Our structure would then look more like this:

struct State 
{ 
    uint16_t cornerPermutation; // Max value = 40,319 
    uint16_t cornerOrientation; // Max value = 2,186 
    uint16_t edgePermutationHi; // Max value = 11,879
    uint16_t edgePermutationLo; // Max value = 20,159
    uint16_t edgeOrientation;   // Max value = 2,047 
};

The bases for the first table would be:

\(\frac{11!}{8!} = 11 * 10 * 9 = 990\)

\(\frac{10!}{8!} = 10 * 9 = 90\)

\(\frac{9!}{8!} = 9\)

\(\frac{8!}{8!} = 1\)

with ranges from [0, 11] for the first digit, [0, 10] for the second digit, and so on. The bases for the second table would be 7!, 6!, 5!, 4!, 3!, 2!, 1!, and 0!, with ranges from [0, 7] for the first digit, [0, 6] for the second digit, and so on.
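Ranking the positions of the 4 tracked edges with those bases is again a Lehmer-style computation. A sketch, where pos[i] is assumed to be the slot (0-11) currently holding tracked edge i:

```cpp
#include <cstdint>

// Rank the positions of 4 tracked edges among 12 slots:
// 12 * 11 * 10 * 9 = 11,880 values, matching the bases 990, 90, 9, 1.
uint16_t edgeHiIndex(const uint8_t pos[4])
{
    uint16_t index = 0;
    for (int i = 0; i < 4; ++i)
    {
        // Lehmer-style digit: the slot number, skipping slots already
        // claimed by earlier tracked edges.
        uint16_t digit = pos[i];
        for (int j = 0; j < i; ++j)
            if (pos[j] < pos[i])
                --digit;
        index = index * (12 - i) + digit;
    }
    return index;
}
```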

Using a split reduces the amount of storage needed for the edge permutation transition tables considerably. We would only need (11,880 * 18) cells * 2 bytes/cell = 427,680 bytes (about 417 KB) for the first table and (20,160 * 18) cells * 2 bytes/cell = 725,760 bytes (about 708 KB) for the second, yielding a total of about 3 MB for all of the transition tables combined.

To summarize, enumerating each state is not quite as crazy as it may sound. It is possible to map from states to integers and back again without too much trouble, the representation is very compact, and transitions can be performed efficiently as long as we’re willing to use a small amount of extra space. However, the representation is quite complicated, which is likely to increase cognitive overhead.

Option 3: Simplified enumeration of permutations

The major disadvantages to Option 2 were increased cognitive overhead and a requirement that fairly large transition tables had to be stored to efficiently enable face rotations. If we allow ourselves to use a small amount of extra storage per state, we can improve the situation on both fronts.

Representation

The core idea is to change how permutations are stored. Instead of using a single number represented with a Factoradic base, we can enumerate each of the corners and edges of the cube and then use cycle notation to represent the current state directly. This is an example enumeration of the corners and edges:

  4 ----- 3                 * --3-- *
 /      / |               4       2 7
1 ---- 2  |             * --1-- *   |
|      |  7             5       6   *
|      | /              |       | 10
5 ---- 6                * --9-- *

For example, to perform a 90° rotation of the “Front” face, the following transformation is used on the corners: 1 -> 2 -> 6 -> 5. In cycle notation, this would look like this:

\(  \left(  \begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 6 & 3 & 4 & 1 & 5 & 7 & 8 \end{matrix} \right)\)

Similarly, the transformation for the edges is: 1 -> 6 -> 9 -> 5, which would look like this:

\(  \left(  \begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\ 6 & 2 & 3 & 4 & 1 & 9 & 7 & 8 & 5 & 10 & 11 & 12 \end{matrix} \right)\)

In code, we don’t really care about the top row, since it will always be the same, so we just need to store the sequence of numbers on the bottom row.  Here’s a first attempt:

struct State 
{ 
    uint8_t  cornerPermutation[8]; 
    uint16_t cornerOrientation; 
    uint8_t  edgePermutation[12]; 
    uint16_t edgeOrientation; 
};

This structure uses 24 bytes or 192 bits, which is about 3x the theoretical minimum.

As with Option 1, we could compress this structure a bit by exploiting the fact that only 3 bits are needed to store a number in the range [0, 7] and only 4 bits are needed for a number in the range [0, 11], rather than the full 8 bits of a byte. In this case, we would need 3 bits * 8 cells = 24 bits to store the corner permutation and 4 bits * 12 cells = 48 bits to store the edge permutation.

Using bit-shifting operations, we could reduce the corner permutation to a single uint32_t and we could reduce the edge permutation to a single uint64_t, leaving us with a structure like this:

struct State
{
    uint64_t edgePermutation;    // widest member first, so the
    uint32_t cornerPermutation;  // compiler adds no alignment padding
    uint16_t cornerOrientation;
    uint16_t edgeOrientation;
};

This structure uses exactly 128 bits, which is closer to 2x the theoretical minimum.

NOTE: We could also reduce the corner permutation to an array of 3 uint8_t’s and the edge permutation to an array of 6 uint8_t’s if we wanted to. This would reduce the space to 104 bits, but it would likely make packing/unpacking the permutation elements much more difficult. A 128-bit structure is likely to be more cache-friendly than a 104-bit one, anyway.
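A sketch of the bit-shifting for the edge permutation; the 4-bits-per-slot layout here is an assumption, not a canonical format, and the corner permutation works the same way with 3-bit fields in a uint32_t:

```cpp
#include <cstdint>

// Pack 12 edge entries at 4 bits each into one 64-bit word (48 of
// the 64 bits are used).
uint64_t packEdges(const uint8_t perm[12])
{
    uint64_t packed = 0;
    for (int i = 0; i < 12; ++i)
        packed |= static_cast<uint64_t>(perm[i]) << (i * 4);
    return packed;
}

// Extract the 4-bit entry for slot i.
uint8_t edgeAt(uint64_t packed, int i)
{
    return (packed >> (i * 4)) & 0xF;
}
```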

State Transitions

One of the nice benefits of using cycle notation to represent permutations is that permutations can be composed quite easily using the Group Product. Given two permutations, P and Q, the product PQ represents the permutation that results from applying Q to P. If we treat P as the current positions of either the corner pieces or the edge pieces and Q as a transformation permutation, then PQ is the resulting permutation. For example, if P is the solved state:

\(  P = \left(  \begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \end{matrix} \right)\)

And Q is the ‘Front’ transformation:

\(  Q = \left(  \begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 6 & 3 & 4 & 1 & 5 & 7 & 8 \end{matrix} \right)\)

The product PQ would be:

\(  PQ = \left(  \begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 6 & 3 & 4 & 1 & 5 & 7 & 8 \end{matrix} \right)\)

which is what we would expect. We could even multiply by Q again. \(PQ^2\) corresponds to a 180° rotation of the front face:

\(  PQ^2 = \left(  \begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 6 & 5 & 3 & 4 & 2 & 1 & 7 & 8 \end{matrix} \right)\)

In general, we can represent each of the 18 possible transformations with two permutations, one for the corners and one for the edges. Moving from one state to another is as simple as performing a group product operation on each of the permutation components.
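In code, the group product is a one-line loop over the stored bottom rows. A sketch using the 1-based piece numbering from the matrices above; composing the Front row with itself reproduces the \(PQ^2\) bottom row shown earlier:

```cpp
#include <cstdint>

// Group Product for permutations stored as bottom rows of the
// two-row notation (pieces numbered from 1): out[i] is the piece
// that lands in slot i after applying p, then q.
void compose(const uint8_t* p, const uint8_t* q, uint8_t* out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = p[q[i] - 1];
}
```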

Conclusion

The following table summarizes the properties of the various representations we’ve considered.

Representation               Size (bits)   Efficiency   Clarity
Uncompressed 2D array        384           Fast         Easy
Compressed 2D array          144           Medium       Medium
State Enumeration (Naive)    80            Slow         Hard
State Enumeration (Split)    80            Fast         Hard
Uncompressed Cycle Notation  192           Fast         Easy
Compressed Cycle Notation    128/104       Medium       Medium

The 2D array approaches do well in both efficiency and clarity, but the size per state is relatively large. The state enumeration approach is much smaller and can be implemented efficiently if you’re very careful and willing to store a large set of transition tables, but the solution is very likely to be harder to understand than either of the other two approaches. Cycle notation is a variation of the state enumeration approach that drastically improves clarity and removes the need for large transition tables, at the expense of a slightly larger state size.

So…who won?

Theoretically, there’s no clear winner. It seems that the cycle notation approaches beat the 2D array approaches (on paper), but it’s not obvious which of the state enumeration and cycle notation approaches should be preferred. So to settle this once and for all, I’m going to put each of these approaches to the test…in the next segment!