6. Puzzle Series [#3: De Bruijn sequence]

I will discuss puzzle #3 here. Let me repeat the puzzle. You want to write a string $S$ of digits with minimum length, say $L= {\rm length}(S)$, to include all the $n$-digit numbers in base $b$. For example, if $n=3$ and $b=10$, then, we want $S$ to include 000, 001, …, 999 in it, as its substrings.

Proposition 6.1. A string $S$ with length $L=b^n+n-1$ digits, can include all the $n$-digit numbers in base $b$, as its substrings. This formula means that each digit you are appending, after first $n-1$ digits, will include a new substring, i.e. a distinct $n$-digit number. The proof will be a bit sketchy, but I think it is quite straightforward to make it rigorous.

Fig. 6.1) The graph of two-digit states and transitions. This is a De Brujin graph.

Proof. An easy example is 2-digit numbers in base 2. The solution, which you can find by trying combinations, is S=”00110″ which has L=5 and includes all 00, 01, 11, 10 as its substrings. There are few other S that will be the answer, S=”01100″, S=”11001″, S=”10011″. To make it a bit more interesting let’s try to write all 3-digit numbers, n=3, in base b=2. Let me begin with 00, i.e. S=”00…”. There are always two possibilities 0 and 1. Let’s think for any 2-digit substring as an state, and ask what are the next/previous states can be. A good way of visualizing this is drawing a graph as shown in fig. 6.1. If the string S is a path in this graph, choosing 1 or 0, then each edge is a distinct 3-digit number. So to be able to write down a string which includes all the 3-digit numbers and only once each, we need to find an Eulerian cycle in this directed graph. This is possible here if and only if each vertex has equal in degree and out degree and all vertices belong to a single strongly connected component. It is easy to see that each vertex has equal in degree and out degree. Clearly, the in degree and out degree both are two. You can put put 0 or 1 after or before a number. Also this graph has only one strongly connected component, because we assumed that we are writing all the possible states, so putting a 0 or 1 down you will end up at some included state, and you will get to a state by writing down all its digits. So we proved that there is an Eulerian cycle or a string S that includes all the 3-digit numbers only once as its substrings. The proof that for any $n$ and $b$ we have Eulerian cycle is similar. So to write S, we can start from a vertex, write down $n-1$ digits, followed by an Eulerian cycle which has the length of $b^n$, in other words $L=b^n+n-1$. ∎

Next question one might ask is, how many possible ways one can write this $S$ string? To  check my answer for this question, I looked at the literature. There is an algorithm called BEST algorithm that we can use to count the number of Eulerian cycles in any graph, in polynomial time. But I found out this specific example studied by De Brujin and is called De Brujin sequence. Seems like the speculation on this sequence helped to come up with BEST algorithm.

De Brujin sequence is cyclic. It is more natural to look at our problem in a cyclic way, because we are talking about Eulerian cycles, and who cares where one starts writing the sequence. We know the length of the cycles, $b^n$, so one can find number of possible distinct $S$ sequences, in terms of possible distinct De Brujin sequences as, $|S(b,n)| = b^n |B(b,n)|$, where I used $|.|$ notation as number of possible distinct sequences. Again, it does not matter where you start that specific cycle to write down the $S$. So let’s use the cyclic $B(b,n)$ from now on as it is a more natural object. Whenever you wanna generate a string $S(b,n)$, you just attach the first $n-1$ digits of $B(b,n)$ to the end of $B(b,n)$. You can also cut $B(b,n)$ at any point to start the $S(b,n)$ from. Using mathematical terms, we are partitioning the set of all possible strings $\{S(b,n)\}$ to the classes $[B(b,n)]$ by the equivalence relation that two strings are equivalent if you can get one from the other using string rotation operation.

These are some basic interesting questions one might ask about $B(b,n)$,
i. How many possible distinct $B(b,n)$ we can have in general?
ii. Write down an algorithm to produce a $B(b,n)$ sequence.
iii. Find an algorithm that writes all the possible $B(b,n)$.
I will extend this post to talk about these three questions in detail.

Additional thoughts, links, and references:

[6.1] I will have a post on puzzle #2 after I learn 3d visualization using Sage or IPython. I am even considering Blender 3D. If you have an idea and can help me to do this, let’s do it together. The math part is easy, we can discuss it. I actually put the puzzle there to push myself to learn how to use a software to produce projections of 3d or higher objects on 2d screen. I think this is a valuable skill for anyone who is interested in doing amateur or even professional math. Also really useful for educational purposes.
[6.2] As you might have guessed, one can extend this sequence to higher dimensions to get a n-dimensional torus cycle, instead of a circle cycle. The simplest case after the sequence is the De Brujin torus (aka De Brujin array). Apparently, the question that if a $m \times n$ De Brujin array exists, for any given $m$ and $n$, is still open. We know that the “square” ones exist ($m = n$ and even), and of course as we saw in this post, the ones with $n=1$ (De Brujin sequences). One can step further and ask the bigger question, for a d-dimensional De Brujin torus, i.e. if a $n_1 \times n_2 \times … n_d$, for a given set of $n_1, \dots, n_d$, exists.
[6.3] There is a nice simple card trick Simon Willerton wrote about here in the n-Category Cafè.

Leave a Reply

Your email address will not be published.