Petersen's Friends
This post is about error-correcting codes. Specifically, it is about highly symmetric codes whose length is a multiple of five.
My motivation for studying these codes might be revealed in a future blog post, but for now I will say only that it is not related to correcting errors. My application requires a sparse, symmetrical relation between five objects. That's exactly what an error-correcting code provides; same maths, different appliction.
Introduction
I want to choose five objects from a short menu: a set M of size between 4 and 20, say. The objects are all chosen from the same menu M, and in principle any of the five objects can be any element of M. However, the choices are not all independent. The number of ways of choosing all five objects needs to be a small subset of M⁵: a set C of size not more than 200, say.
I don't want to show any bias among the five objects or among the elements of M, so I'd like the set C of allowed 5-tuples to have lots of symmetries. Ideally, permuting the five objects should preserve C, as should permuting the elements of M. However, to keep C small I would settle for a smaller symmetry group.
Error-correcting codes
This is a well-studied mathematical problem in otherwise unrelated practical contexts. It crops up, for example, when you want to perform a single experiment to test the health benefits of several different treatments, without trying every combination of the treatments. But the dominant mathematical terminology is that used in the field of error-correcting codes.
The scenario is a sender trying to send messages to a receiver using unreliable equipment. The elements of M are symbols that the sender can send, e.g. English phonemes, or maybe each element of M is a short strings of symbols, e.g. Morse code. The elements of C are the codewords that the sender and receiver have agreed to use to communicate, e.g. the NATO phonetic alphabet. The unreliable equipment changes a small fraction of the symbols in transit. If the codewords have sufficiently different spellings (or pronunciations, or whatever), the errors can be detected or even corrected, resulting in reliable communication.
Many error-correcting codes have been devised over the years, and they are of great practical importance. They are used to a greater or lesser degree in nearly every form of communication and also in nearly every form of data storage. Even some aspects of human speech can be understood in terms of error correction.
These days, the state-of-the-art codes for digital communication are large, random-looking things that rely on brute-force computation. However, the smaller, simpler digital error-correcting codes are interesting for a different reason: they are elegant and beautifully symmetrical mathematical objects. It is these smaller codes that I will be studying in this post.
My approach
The requirement that the length of the code be a multiple of five is unusual. My strategy has been to start from geometric or combinatorial objects that have 5-fold symmetry built in, to derive codes from them, and then to use the internet and various AI assistants to study and improve on what I found. I don't need a perfect answer, and I am more than happy to smell the roses on the way.
The Petersen graph
The Petersen graph consists of ten nodes joined by fifteen edges. The nodes may be identified with the ten ways of choosing two of five objects. The edges may be identified with the fifteen ways of choosing two such pairs that don't overlap. The graph is a highly symmetrical object; every permutation of the five objects is a symmetry of the graph.
Drawing the graph.
The Petersen graph is commonly drawn in 2D as a pentagon around a pentagram. If the five nodes of the pentagon correspond to the pairs {A, B}, {C, D}, {E, A}, {B, C}, and {D, E}, then the five nodes of the pentagram correspond to the pairs {C, E}, {E, B}, {B, D}, {D, A} and {A, C}.
For example, the node {E, A} of the pentagon is connected its neighbours {B, C} and {C, D}, and also to the corresponding node {B, D} of the pentagram, which is in turn connected to its non-neighbours {C, E} and {A, C}.
Cyclicly permuting A → B → C → D → E → A rotates the diagram. Swapping e.g. A ↔ E and B ↔ D reflects the diagram. Cyclicly permuting e.g. A → B → E → D → A exchanges the pentagon with the pentagram. Other permutations of the five objects do more complicated things to the graph, all of them symmetries.
Shortest cycles
Let us try to find the shortest possible cycle in the Petersen graph. Without loss of generality, suppose that the first node is {A, B} and that the last is {D, E}. These two sets do not overlap and so the nodes are connected by an edge as required.
The second node in the cycle is connected to {A, B} and must therefore not contain A or B. It cannot be {D, E} because that's the last node in the cycle. Therefore it must contain C and one of D and E.
By the same argument, the second last node in the cycle must contain C and one of A and B.
Since the second node does not contain A or B, and the second last node does, they are different nodes and there is no 3-cycle in the graph.
Since the second and second last nodes both contain C, they are not connected by an edge, and there are no 4-cycles in the graph.
By making the second node {C, D} and the second last node {B, C}, we can complete a 5-cycle with the node {E, A}. This is the shortest possible cycle.
Half a dodecahedron
The Petersen graph can also be constructed from a regular dodecahedron by identifying opposite nodes, and therefore opposite edges. The pentagon corresponds to the top or bottom face. The pentagram corresponds to the five nodes just above the equator, which are identified with the five nodes just below the equator.
In the following diagram, the "top" face is the light-coloured one on the right, and the vertices are labelled to match the preceding diagram of the Petersen graph.
The twelve faces of the dodecahedron reduce to six when opposites are identified. Of these every pair are neighbours, in the sense that they share one edge. For each such pentagon, the remaining five nodes are connected to form a pentagram, and every pair of such pentagrams are similarly neighbours. Together, the pentagons and the pentagrams comprise all twelve 5-cycles in the Petersen graph.
The symmetries of the dodecahedron preserve the partition of the 5-cycles into pentagons and pentagrams. The Petersen graph has additional symmetries which mix them up.
The Petersen code
We can make a [15, 6, 5] binary linear error-correcting code from the Petersen graph. I will call it the Petersen code. The fifteen message bits (binary digits) correspond to the fifteen edges of the graph. We impose a parity check for each node in the graph: the three edges meeting at the node must have even parity, i.e. their sum must be even. In the diagram of the Petersen graph above, I have labelled the edges with the bits of an example valid codeword.
Any codeword that satisfies nine of the parity checks automatically satisfies the tenth. We can therefore choose six bits freely, before the parity checks determine all the other bits. The rate of the code is 6/15 = 2/5, and there are 2⁶ = 64 valid codewords.
Viewed as a black/white boundary
Intuitively, every codeword corresponds to two ways of painting the faces of a docecahedron black or white. Where two faces meet at an edge, the bit corresponding to that edge is 1 if the faces are different colours, and 0 if they are the same colour. Changing the colour of every face preserves the black/white boundary, and hence the codeword.
To construct a colouring for a codeword, you are free to choose the colour of the top face of the dodecahedron. Having coloured that face, you can then work out the colour of the neighbouring faces using the bits of the codeword. The parity check at each node ensures that as you loop through the three faces meeting at the node you will see an even number of colour changes. In the diagram of a dodecahedron above, I have painted the faces to match the codeword illustrated in the diagram of the Petersen code.
Not every colouring of the faces of the dodecahedron corresponds to a codeword. In constructing the Petersen graph from the dodecahedron I identify opposite edges. Therefore, we can only admit colourings of the dodecahedron where the black/white boundary has an inversion symmetry. This requires that every face is the same colour as its opposite face, or that every face is the opposite colour from its opposite face. In the diagram above, each hidden face has the opposite colour of the corresponding visible face.
Minimum-weight codewords
Painting two opposite faces white, and the other ten faces black (or vice versa) gives a "polar" black/white boundary of total length ten on the dodecahedron, corresponding to a 5-cycle in the Peterson graph which I earlier called a pentagon. Painting the dodecahedron white on one side and black on the other gives an "equatorial" boundary of length ten, corresponding to a 5-cycle which I earlier called a pentagram.
Any other (allowed) way of painting the dodecahedron will correspond to a different cycle (or union of cycles) in the graph. Since there are no shorter cycles than the twelve 5-cycles already enumerated, any such colouring will correspond to a codeword with more than five 1s.
Therefore the minimum possible weight of any non-zero codeword is five. The code can correct up to two errors, or detect up to four. If you make five errors, and if the corresponding edges unluckily form a 5-cycle, the corrupted message will be indistinguishable from a different valid message.
A basis
The six pentagons almost form a basis for the Petersen code, but their XOR is equal to zero. Intuitively, painting the dodecahedron entirely black is equivalent to plainting it entirely white. In addition to any five pentagons, we need one pentagram to obtain a linearly independent basis for the Peterson code.
Counting codewords
The weight of a codeword is the number of 1s it contains. For example, the codeword illustrated in the diagrams above has weight 9. The number of possible codewords with each weight is as follows:
weight count
0 1
5 12
6 10
8 15
9 20
10 6
We have already enumerated the weight-5 codewords: the six pentagons and the six pentagrams. Let us now enumerate the others (or skip to the next section).
Weight-8 codewords
The XOR of two pentagons gives a weight-8 codeword; there are fifteen such. The XOR of all six pentagons is zero. Therefore, the XOR of any four pentagons is equal to the XOR of the other two. Therefore, by taking the XOR of an even number of pentagons, we can only obtain the set of the fifteen weight-8 codewords and the zero codeword.
Since this set of sixteen codewords is closed under XOR, it forms a [15, 4, 8] binary linear subcode. This can be recognised as a simplex code: the dual of a Hamming code. It is also equivalent to a Hadamard code with one of the sixteen coordinates fixed as zero (and removed).
We can repeat this construction using the pentagrams instead of the pentagons. It turns out that this gives the same [15, 4, 8] simplex code. There are no other weight-8 codewords.
Weight-9 codewords
Consider the XOR of any three pentagons. If we count all the edges of those three pentagons, we will count fifteen edges. However, every pair of pentagons shares exactly one edge, so we've counted three edges twice. The number of edges counted once is therefore nine. These weight-9 codewords correspond to colourings where opposite faces have the same colour.
We can alternatively consider the XOR of three pentagrams. By a similar argument, this also gives a weight-9 codeword. These weight-9 codewords correspond to colourings where opposite faces have the opposite colour.
There are twenty ways of choosing three of the six pentagons. However, the XOR of three pentagons is equal to the XOR of the other three, so they only give ten weight-9 codewords. The twenty ways of choosing three pentagrams give another ten. The total number of weight-9 codewords is therefore twenty.
Weight-6 and weight-10 codewords.
The XOR of a pentagon with the corresponding pentagram gives a weight-10 codeword; there is one such codeword for each of the six pentagons.
The XOR of a pentagon with any other pentagram gives a weight-6 codeword. There are thirty ways of choosing the pentagon and the pentagram, but each weight-6 codeword is created by three such pairs, so we only get ten weight-6 codewords.
The BCH code
Note that the Peterson code does not contain any codewords with a weight larger than 10. Therefore, if we add another basis vector consisting of all 1s we will not create any new codewords with a weight smaller than 5. It turns out that the [15, 6, 5] Petersen code becomes the [15, 7, 5] BCH code.
For most purposes this BCH code is a strictly better code than the Petersen code. It has a higher rate of 7/15, yet it remains a weight-5 code that can correct up to two errors and detect up to four.
Note that new basis vector breaks every parity check. Where the Peterson code required the three edges meeting at every node to have even parity, the BCH code also allows codewords where every parity is odd.
Counting codewords
The number of codewords with each weight is as follows:
weight even odd total
0 1 0 1
5 12 6 18
6 10 20 30
7 0 15 15
8 15 0 15
9 20 10 30
10 6 12 18
15 0 1 1
Error correction
The number of symptoms (ways in which a codeword can be corrupted) are as follows:
errors correctable ambiguous incorrect
0 1 0 0
1 15 0 0
2 105 0 0
3 65 210 180
Corrupting a codeword in four or more places always makes it closer to a different codeword, which makes the decoding always incorrect.
The total number of correctable symptoms is 1 + 15 + 105 + 65 = 186. The ambiguous cases are equidistant from three codewords. Adding a third of 210 makes 256 = 2^(15 − 7) messages per codeword, as required for a fifteen-bit code with seven basis vectors.
I have written a Python program to study this code, which you too might find useful.
The even-weight code
The XOR of two even-weight codewords is an even-weight codeword. Therefore, given any linear binary code, the set of even-weight codewords is a subcode.
Applying this construction to the BCH code produces a [15, 6, 6] even-weight code. The number of codewords with each weight is as follows:
weight total
0 1
6 30
8 15
10 18
Applying the construction to the Petersen code gives an inferior [15, 5, 6] code. Applying the construction to the simplex code gives itself.
Viewed as a 5×3 grid
We've now seen four binary linear codes with 15-bit messages:
- [15, 4, 8] simplex code
- [15, 6, 5] Petersen code
- [15, 7, 5] BCH code
- [15, 6, 6] even-weight subcode
Quite a good way to visualise these codes is to arrange the message bits on a 5×3 grid.
The relationship of the 5×3 grid to the set of five objects is that the column represents a choice of four objects from the five, and the row represents a way of dividing them into two pairs.
The relationship of the 5×3 grid to the Petersen graph is that the top row represents the five edges in the outer pentagon, the bottom row represents the five edges in the inner pentagram, and the middle row represents the radial edges that join them.
The simplex code
The simplex code only has fifteen non-zero codewords, all of weight 8, which differ only by cyclically permuting the rows and columns of the grid. It therefore suffices to give one example:
10001
01010
11011
We can easily find four of the cyclic permutations that are linearly independent, and therefore form a basis for the code, e.g.:
00011 00110 01100 11000
10100 01001 10010 00101
10111 01111 11110 11101
You may check that all the other codewords can be formed by XORing these. For example, the XOR of the first two basis vectors is:
00101
11101
11000
which is the fourth basis vector shifted up one row. It's quite remarkable.
The Petersen code
To make a basis for the Petersen code, let's start from the simplex code, and add two more basis vectors of weight 5: the pentagon and the pentagram:
11111 00000
00000 00000
00000 11111
Note that this breaks the cyclic permutation symmetry of the rows. It's fairly obvious what is missing.
The BCH code
Starting from the Petersen code we can add one more weight-5 basis vector to obtain the BCH code:
00000
11111
00000
This restores the cyclic permutation symmetry of the rows.
In addition to the weight-5 basis vectors above, there are 15 other weight-5 codewords, which differ only by cyclic permutation of the rows and columns. It suffices to give one example:
10001
01010
00100
The even-weight code
Starting from the simplex code, add the following even-weight basis vectors:
11111 00000
11111 11111
00000 11111
The thirty weight-6 codewords are cyclic permutations of these:
00100 10001
01110 10101
01010 00100
Parity checks
When defining the Petersen code, I asked for the three edges meeting at each node to have even parity. When upgrading to the BCH code, I instead asked every such triplet to have the same parity. Applying this rule to every pair of neighbouring nodes we obtain fifteen 4-bit parity checks, which differ only by cyclically permuting the rows and columns of the grid. It suffices to give one example:
10001
00000
01010
Any word satisfying these parity fifteen checks belongs to the BCH code. Only eight are linearly independent, i.e. passing those eight checks guarantees that the word will pass all fifteen.
Adding one more parity check yields the Petersen code:
00000
11111
00000
Adding two more parity checks yields the simplex code:
11111 00000
00000 00000
00000 11111
Note that in the simplex code, not only is every row a parity check, but every column is too.
Returning to the BCH code, adding an overall parity check yields the even-weight subcode:
11111
11111
11111
In this code, every column is a parity check, but the rows can have either parity.
Dual codes
The set of all possible parity checks for a linear code is another linear code called its dual. By constructing the parity checks of our codes, above, we have generated some additional codes, which you may study if you are interested:
- The dual of the [15, 7, 5] BCH code is a [15, 8, 4] code.
- The dual of the [15, 6, 6] even-weight code is a [15, 9, 5] code.
- The dual of the [15, 4, 8] simplex code is the [15, 11, 3] Hamming code.
These in turn have even-weight subcodes, which have duals, some of which we have not yet seen. It is a rich vein.
Symmetries
The grid presentation reveals an order-5 and order-3 cyclic permutation symmetries, the latter of which is broken in the Petersen code. I think this makes the codes quite easy to memorise. Note that any code with both order-3 and order-5 cyclic symmetry actually has order-15 cyclic symmetry, which shows up e.g. if you read out the digits along a diagonal line.
All three codes are also preserved by a horizontal reflection of the grid, but interestingly not by a vertical reflection. Perhaps the best feature of the grid presentation is that it shows clearly what the Petersen code is missing, and that the BCH code should be preferred.
The symmetries of the grid are actually quite a small subgroup of the symmetries of the codes. The order-5 cyclic permutation corresponds to one order-5 rotation axis of the dodecahedral symmetry group, but it has five more such axes that are obscured because they break up the rows and columns. There are also ten order-3 rotation axes in the dodecahedral subgroup, all obscured in the grid, and there are a further ten reflections too. And even the dodecahedral group is missing all the order-4 symmetries and the cyclic permutation of the rows of the grid. These codes are beautifully symmetrical objects that cannot fully be appreciated in the grid presentation.
Viewed as a 4×4 grid
A natural presentation of the Hadamard code is as a 2×2×2×2 hypercube, and so a natural presentation of the simplex code is as a hypercube with a corner missing. The coordinates of a message bit in the grid are determined by its value in each of the four basis vectors.
Inspired by the hypercube, I can present the code as a 4×4 grid with a corner missing. If I use the simplex basis given earlier, the correspondence of the squares of the 4×4 grid to the 5×3 grid are as follows:
abcde egd
fghij ifcn
klmno ahjo
bklm
Though this makes plain an S4 subgroup of symmetries (permuting the axes of the 2×2×2×2 hypercube), I have not otherwise found it particularly helpful. In particular it obscures the order-5 symmetries.
Viewed as a quaternary code
The simplex code and the even-weight subset of the BCH code both have the property that every column of the 5×3 grid has even parity, i.e. these triplets of bits must be 000, 011, 101 or 110. We can recognise this as a representation of the finite field GF(4).
The field GF(4)
Interestingly, there is only one way to define a field with four elements. The only flexibility you have is to rename the elements. I will call the elements {0, 1, w, w²} with the following addition and multiplication rules:
+ | 0 1 w w² × | 0 1 w w²
----+----------------- ----+-----------------
0 | 0 1 w w² 0 | 0 0 0 0
1 | 1 0 w² w 1 | 0 1 w w²
w | w w² 0 1 w | 0 w w² 1
w² | w² w 1 0 w² | 0 w² 1 w
For example, 1³ = w³ = w²³ = 1, 1 + 1 = w + w = w² + w² = 0 and 1 + w + w² = 0. Every element is its own negative. 1 is its own reciprocal, and w and w² are reciprocals of each other. Squaring an element of GF(4) preserves 0 and 1 and exchanges w and w²; it is an order-2 operation. For non-zero elements, squaring is the same as taking the reciprocal.
Note that GF(4) differs from Z4, the integers modulo 4. Z4 is not a field because 2 has no reciprocal modulo 4.
We can represent the four elements of GF(4) as words of the 3-bit even-parity code, which I will write as columns:
0 1 w w²
--------------
0 0 1 1
0 1 0 1
0 1 1 0
Then addition in GF(4) is XOR in the 3-bit code, and multiplication in GF(4) is convolution in the 3-bit code. In particular, multiplication by w is a cyclic permutation of the rows.
This relates the length-15 binary codes to some length-5 quaternary codes, which follow.
The simplex code
The [15, 4, 8] binary simplex code corresponds to the [5, 2, 4] quaternary simplex code with the basis vecors w 1 1 w 0 and 0 w 1 1 w.
The fifteen non-zero codewords are multiples of cyclic permutations of these, all of weight 4. They correspond to the fifteen weight-8 codewords of the binary simplex code.
Including the zero word brings the total number of codewords to sixteen as required for a quaternary code with two basis vectors.
The even-weight code
The [15, 6, 6] even-weight subset of the BCH code corresponds to the [5, 3, 3] quaternary code formed by adding to the simplex code one more basis vector 1 1 1 1 1.
The thirty weight-3 codewords are multiples of cyclic permutations of 0 1 w 1 0 and w 0 1 0 w. They correspond to the thirty weight-6 codewords of the binary code.
The eighteen weight-5 codewords are multiples of 1 1 1 1 1 and multiples of cyclic permutations of w² w 1 w w². They correspond to the eighteen weight-10 codewords of the binary code.
The fifteen weight-4 codewords and the zero word from the quaternary simplex code are also included, of course. That brings the total to 64 as required.
Check sums
The [5, 2, 4] code and the [5, 3, 3] code are duals, i.e. the words of one are check sums for the other. To be precise, you can take any vector from one code and any vector from the other code and their dot-product (in GF(4)) will be zero. Furthermore, if you fix one of the codes then the other code consists of all vectors with this property.
Error correction
The [5, 3, 3] code is a perfect code. Every string of length 5 is at most one error away from exactly one codeword. In fact, it is the simplest non-trivial quaternary Hamming code.
Symmetries
The two quaternary codes have the same symmetries, which include:
-
Order-5 cyclic permutation.
-
Order-3 multiplication by w.
-
The order-4 operation that maps
a b c d etob² e² c² a² d². -
Order-2 reflection (reversal), equivalent to repeating the order-4 operation twice.
You can choose where to map any two elements, and complete a permutation symmetry, possibly taking the conjugate.
You can choose any two digits freely and complete a word of the [5, 2, 4] code. You can choose any three digits freely and complete a word of the [5, 3, 3] code.
The codes are not preserved by *, nor by other permutations of the elements.
Summary
Starting from the Petersen graph, or alternatively from the regular dodecahedron, we have constructed an error-correcting code which is as symmetricalal as those objects. In particular, it has order-5 symmetry and its length is a multiple of 5. The Petersen code itself turned out to be suboptimal, but by improving it we found some promising codes with even higher symmetry (including order-15 cyclic symmetry):
- [15, 4, 8] simplex code
- [15, 6, 6] even-weight code
- [15, 7, 5] BCH code
- their duals, and more.
We also related them to some promising quaternary codes:
- [5, 2, 4] simplex code
- [5, 3, 3] Hamming code.
I'm confident that the code I need for my application will be one of these.
Ackowledgements
Thanks to Reuben Thomas for proof-reading and other improvements.
Last updated 2026/06/27