maximal length pseudo random sequences in PHP

Whilst waiting for the survey to finish, I’ve been doing some work on maximum length sequences.
First what is a maximal length sequence? In practice they are an apparently random set of numbers all with an equal number of bits, n, such that the length of the sequence is 2^n or often 2^n – 1 (as sometimes one number such as all zeros might never change.)
So, let’s use  a simple example where n is 2 so we get a two bit sequence with numbers 0 to 3 .The simplest sequence is that created by the rule that when we get to the maximum number we return to zero as in 0,1,2,3,0,1,2,3,0,1,2,3, etc. . But in general, we can specify any sequence by stating the rules underlying the sequence. If for example we have the following transformations:

00->10, 01 -> 11, 10 -> 01, and 11 ->00

we get the sequence:

00, 10, 01, 11, 00, 10, etc.

If however we had the rules

00->10, 01 -> 11, 10 -> 11, and 11 ->0

we get the sequence:

00, 10, 11, 00

which only has three rather than four numbers. So this set of rules would not be a maximal length sequence. (but note if we start with 01)

01, 11, 00, 10, 11, 00

We start with a sequence of four which then gets “sidelined” into a sequence of three. The problem is that two “lines” go into one value (11), so that the full length sequence is only possible from one starting number and will never repeat. So, a necessary requirement for a maximal length sequence is that each number in the sequence is uniquely transformed into one outcome.
But even 1-2-1 correspondence does not guarantee maximal length, as there are 1-2-1 rules that create several entirely different sequences.

00->10, 01 -> 11, 10 -> 00, and 11 ->01

This produces the sequences:

& 01,11,01,11,01, etc.

So, each number goes to one and only one unique new number in the sequence, but there are in this case two entirely different and “isolated” sequences. This is why it is difficult to produce these sequences, because it is all too easy to create a “sideline” or many isolated loops and with a 24 bit number there are 16million steps which have to be checked.
So, can we produce such a sequence in practice?
One place to start is with sequences that are known to work. One such sequence is the one used for CRC checksums. For those interested, the following is a coding procedure based on the CRC24 code (but using numbers not characters) written in PHP.

define ('CRC24_INIT', 0xB704CE);
define ('CRC24_POLY', 0x1864CFB);
function crc24($int)
	$crc = CRC24_INIT ^ $int;
	for ($i = 0; $i < 8; $i++)
		$crc = $crc << 1;
		if ($crc & 0x1000000)
			$crc = $crc ^ CRC24_POLY;
	return $crc & 0xFFFFFF;

The detail is that first the value ($int) is XORed with a number (CRC24_INIT). This is itself a unique 1-2-1 transformation. Then the number is shifted and if it overflows to the 25th bit (0x1000000)  the crc value is XORed with a number known to give a maximal length sequence (there are relatively few such numbers and I suspect they are found by trial and error).
Why would anyone want this?
There’s the obvious use of creating a code – that is, if you wanted to send numbers by code, you could tell the receiver the transformation you intend to use, and so long as they have the reverse transformation (in the above CRC example, shift the other way), then you could send any number over an unsecured network, and even if it were intercepted, without knowing the transformation, it would be impossible to know what the number was.  This is in effect what Enigma was – however, obviously it failed to be secure. But why?
The reason is that any coding regime tends to use a particular method to create the sequence. So, e.g. the CRC method of creating a random sequence (and used to checksum files), uses a transformation known to produce a single one-to-one correspondence between starting and ending condition so that if e.g. 32 bits are used there are 2^32 (or perhaps less 1) unique numbers each with a unique resulting number after the transformation.
But this would be useless as a code, because the sequence is well known and even if only part of the message were received, it would be obvious where that number is in the sequence the coding algorithm had reached and therefore an almost trivial exercise in working out what number ($int) had been added (XORed) to the sequence to get the new code number. Or to put it another way, if there is one common number (a space) and the code for this is worked out, then this one number would unlock all the others because they all have a single relationship to each other through the sequence.
So, because I wanted such a sequence, I sat down to work out a way to obfuscate the sequence. There are two obvious places to do this, the first was to change: CRC24_POLY, however whilst this would change the sequence, it would almost certainly result in a non-maximal length sequence as most numbers produce a number of shorter sequences. So the other avenue of attack was to change CRC24_INIT.
But I wasn’t content with this, because clearly this is such an obvious thing to do, that anyone would do the same. So, after a while I came up with a new solution which involved several numbers and several steps. This is in effect having several different “wheels” to the enigma machine. Effectively the code “dialed” in a number on the wheels, and this created a unique sequence (at least for my code – not sure about Enigma).
But as I needed a maximal length sequence, how could I be certain it was a maximal length sequence? Fortunately, the answer was simple. Just run the sequence taking the output and using it as the new input, waiting for the same number to come up again and see how many iterations were required.
Fantastic – job well done!
But then I began thinking … I’ve created a code that has about 2^100 different sequences (or at least that is how much information goes into changing the sequences and there is a unique input to output correspondence, so I believe the total is around 2^100). But then I began thinking “wouldn’t it be nice to make a general routine that could make any of the potential sequences.”
So, unaware how I might do this, I thought I would first work out how many unique sequences are there of all the numbers from 0 to 2^n-1 as this would give me an indication of how complex the procedure would need to be. The answer for a sequence of N numbers is obviously N! so the total sequences for an n bit number is (2^n)! . This is too large for a calculator, but it can be approximated using Stirling’s formula to:

exp (xln(x) – x)

or for n=24 the answer is:

= exp (2^24 ln (2^24) – 2^24)

= exp (262,320,703)

262 million is again too big for my computer calculator to calculate, but is about:

2^ (~378million)

So, it turns out that any procedure would need numbers with a total of around 400million bits in order to be able to uniquely specify each and every possible sequence. That is only 6%  less than the 402 million bits needed to hold an array of all numbers in the sequence of all unique 24 bit numbers. So, even if I could work out a procedure that used the least information to uniquely specify a sequence, it would be a lot simpler just to have a table of numbers so that e.g. the nth value in the array containing a value m, would mean that the input n->m.
So, whereas I thought I had been very clever in producing a procedure with 2^100 possible combinations, this “space” occupied by my potential sequences was only 100/400million of the possible sequences. That is 0.000025% of the potential sequences.
In other words, rather than a half dozen or at most a dozen numbers, I needed 12 million 32 bit numbers as parameters for my procedure. I had not been “clever” in the sense, I had not even dipped my toe into the possible extent of complexity within even this very simple sequence. The difference between my perception of how complex this system was, and the actual complexity … was not as I initially thought “orders of magnitude” (i.e. 10x 100x) but I needed a new concept which I suppose I could call “boggles” of magnitude (10^10), and this for only a very simple sequence of 24 bits. What if it had been 32 or even higher?
Which reminds me of a question I asked myself long ago … how complex would a code need to be such that if we did not have the code that e.g. a string of English language characters length N, could be decoded in so many possible ways whilst trying to search for the key using a “brute” force attack, that there would very likely be many that were perfectly formed English language sentences?
So, e.g. if one were to be using such a code, one would deliberately mispel a word, so that any automated brute force attack would assume the sentence had been incorrectly decoded and ignore the (correct) but mal-formed English.

This entry was posted in Climate. Bookmark the permalink.