maximal length pseudo random sequences in PHP

Whilst waiting for the survey to finish, I’ve been doing some work on maximum length sequences.
First what is a maximal length sequence? In practice they are an apparently random set of numbers all with an equal number of bits, n, such that the length of the sequence is 2^n or often 2^n – 1 (as sometimes one number such as all zeros might never change.)
So, let’s use  a simple example where n is 2 so we get a two bit sequence with numbers 0 to 3 .The simplest sequence is that created by the rule that when we get to the maximum number we return to zero as in 0,1,2,3,0,1,2,3,0,1,2,3, etc. . But in general, we can specify any sequence by stating the rules underlying the sequence. If for example we have the following transformations:

00->10, 01 -> 11, 10 -> 01, and 11 ->00

we get the sequence:

00, 10, 01, 11, 00, 10, etc.

If however we had the rules

00->10, 01 -> 11, 10 -> 11, and 11 ->0

we get the sequence:

00, 10, 11, 00

which only has three rather than four numbers. So this set of rules would not be a maximal length sequence. (but note if we start with 01)

01, 11, 00, 10, 11, 00

We start with a sequence of four which then gets “sidelined” into a sequence of three. The problem is that two “lines” go into one value (11), so that the full length sequence is only possible from one starting number and will never repeat. So, a necessary requirement for a maximal length sequence is that each number in the sequence is uniquely transformed into one outcome.
But even 1-2-1 correspondence does not guarantee maximal length, as there are 1-2-1 rules that create several entirely different sequences.

00->10, 01 -> 11, 10 -> 00, and 11 ->01

This produces the sequences:

& 01,11,01,11,01, etc.

So, each number goes to one and only one unique new number in the sequence, but there are in this case two entirely different and “isolated” sequences. This is why it is difficult to produce these sequences, because it is all too easy to create a “sideline” or many isolated loops and with a 24 bit number there are 16million steps which have to be checked.
So, can we produce such a sequence in practice?
One place to start is with sequences that are known to work. One such sequence is the one used for CRC checksums. For those interested, the following is a coding procedure based on the CRC24 code (but using numbers not characters) written in PHP.

define ('CRC24_INIT', 0xB704CE);
define ('CRC24_POLY', 0x1864CFB);
function crc24($int)
	$crc = CRC24_INIT ^ $int;
	for ($i = 0; $i < 8; $i++)
		$crc = $crc << 1;
		if ($crc & 0x1000000)
			$crc = $crc ^ CRC24_POLY;
	return $crc & 0xFFFFFF;

The detail is that first the value ($int) is XORed with a number (CRC24_INIT). This is itself a unique 1-2-1 transformation. Then the number is shifted and if it overflows to the 25th bit (0x1000000)  the crc value is XORed with a number known to give a maximal length sequence (there are relatively few such numbers and I suspect they are found by trial and error).
Why would anyone want this?
There’s the obvious use of creating a code – that is, if you wanted to send numbers by code, you could tell the receiver the transformation you intend to use, and so long as they have the reverse transformation (in the above CRC example, shift the other way), then you could send any number over an unsecured network, and even if it were intercepted, without knowing the transformation, it would be impossible to know what the number was.  This is in effect what Enigma was – however, obviously it failed to be secure. But why?
The reason is that any coding regime tends to use a particular method to create the sequence. So, e.g. the CRC method of creating a random sequence (and used to checksum files), uses a transformation known to produce a single one-to-one correspondence between starting and ending condition so that if e.g. 32 bits are used there are 2^32 (or perhaps less 1) unique numbers each with a unique resulting number after the transformation.
But this would be useless as a code, because the sequence is well known and even if only part of the message were received, it would be obvious where that number is in the sequence the coding algorithm had reached and therefore an almost trivial exercise in working out what number ($int) had been added (XORed) to the sequence to get the new code number. Or to put it another way, if there is one common number (a space) and the code for this is worked out, then this one number would unlock all the others because they all have a single relationship to each other through the sequence.
So, because I wanted such a sequence, I sat down to work out a way to obfuscate the sequence. There are two obvious places to do this, the first was to change: CRC24_POLY, however whilst this would change the sequence, it would almost certainly result in a non-maximal length sequence as most numbers produce a number of shorter sequences. So the other avenue of attack was to change CRC24_INIT.
But I wasn’t content with this, because clearly this is such an obvious thing to do, that anyone would do the same. So, after a while I came up with a new solution which involved several numbers and several steps. This is in effect having several different “wheels” to the enigma machine. Effectively the code “dialed” in a number on the wheels, and this created a unique sequence (at least for my code – not sure about Enigma).
But as I needed a maximal length sequence, how could I be certain it was a maximal length sequence? Fortunately, the answer was simple. Just run the sequence taking the output and using it as the new input, waiting for the same number to come up again and see how many iterations were required.
Fantastic – job well done!
But then I began thinking … I’ve created a code that has about 2^100 different sequences (or at least that is how much information goes into changing the sequences and there is a unique input to output correspondence, so I believe the total is around 2^100). But then I began thinking “wouldn’t it be nice to make a general routine that could make any of the potential sequences.”
So, unaware how I might do this, I thought I would first work out how many unique sequences are there of all the numbers from 0 to 2^n-1 as this would give me an indication of how complex the procedure would need to be. The answer for a sequence of N numbers is obviously N! so the total sequences for an n bit number is (2^n)! . This is too large for a calculator, but it can be approximated using Stirling’s formula to:

exp (xln(x) – x)

or for n=24 the answer is:

= exp (2^24 ln (2^24) – 2^24)

= exp (262,320,703)

262 million is again too big for my computer calculator to calculate, but is about:

2^ (~378million)

So, it turns out that any procedure would need numbers with a total of around 400million bits in order to be able to uniquely specify each and every possible sequence. That is only 6%  less than the 402 million bits needed to hold an array of all numbers in the sequence of all unique 24 bit numbers. So, even if I could work out a procedure that used the least information to uniquely specify a sequence, it would be a lot simpler just to have a table of numbers so that e.g. the nth value in the array containing a value m, would mean that the input n->m.
So, whereas I thought I had been very clever in producing a procedure with 2^100 possible combinations, this “space” occupied by my potential sequences was only 100/400million of the possible sequences. That is 0.000025% of the potential sequences.
In other words, rather than a half dozen or at most a dozen numbers, I needed 12 million 32 bit numbers as parameters for my procedure. I had not been “clever” in the sense, I had not even dipped my toe into the possible extent of complexity within even this very simple sequence. The difference between my perception of how complex this system was, and the actual complexity … was not as I initially thought “orders of magnitude” (i.e. 10x 100x) but I needed a new concept which I suppose I could call “boggles” of magnitude (10^10), and this for only a very simple sequence of 24 bits. What if it had been 32 or even higher?
Which reminds me of a question I asked myself long ago … how complex would a code need to be such that if we did not have the code that e.g. a string of English language characters length N, could be decoded in so many possible ways whilst trying to search for the key using a “brute” force attack, that there would very likely be many that were perfectly formed English language sentences?
So, e.g. if one were to be using such a code, one would deliberately mispel a word, so that any automated brute force attack would assume the sentence had been incorrectly decoded and ignore the (correct) but mal-formed English.

This entry was posted in Climate. Bookmark the permalink.

7 Responses to maximal length pseudo random sequences in PHP

  1. David Ross says:

    But a program could easily include common (or even all conceivable) misspellings as part of its brute force dictionary.
    A tougher nut to crack would be the deliberate use of obscure idiom, metaphor and cultural references. Add that to use of an obscure language or dialect and the eavesdropper’s problems mount.

    • After thinking about it, I began to wonder whether encouraging misspellings might make the message more distinctive, add information and make it easier to decode. Likewise, obscure idiom being obscure is far more unique. So for example:
      the “cat sat on the mat” … contains 6 short words each one of 8000ish 3letter words.
      But: “the feline was seated upon the carpet” has many longer words which are therefore far less common by random chance. So, I presume you need a larger range of possible encoding mechanisms to be able to produce such sequences by chance.

      • David Ross says:

        I’ll encode your example.
        Hey Matt, I heard Dick Whittington’s friend did the same thing to you as Humpty did before his accident. LOL
        I could have said “Tom, you know, Jerry’s friend” instead of Dick Whittington’s friend.
        Unless you knew the cultural references you wouldn’t have a clue what the hidden meaning was –the cat sat on the mat. I’m sure COMINT guys in Afghanistan have similar problems decoding messages obscured with Pashtun idiom and culture.
        I remember reading about the FBI tapping the phone of the drug dealer Howard Marks. A British cop came to give them a hand. He soon spotted the significance of a statement made in a call to Marks some time before. The caller told Marks “You’re dog’s sick.” It was Cockney rhyming slang and the caller was telling Marks that his phone was tapped, without alerting the American eavesdroppers.

        • bp08c 9bn9p eseb6 cepaj 1jseo 31bfd 4midi
          645c7 70b2v 5is3a 2551l ni00 dan4b fo05u
          8fp71 ate6k a75qh 8lir4 f2bpr dgsoe dps5
          2vetg 58nvf 7q0uq 6ueg2 4cphn 3r0j8 19nit
          ckimm e65n3 9hsls b3bk9 85bdm ansc3 d05es
          fiif9 2fnb2 t0an 7ap88 5oe9t 4s075 6en6g
          1pe4f 3bp5q ems1h c4b04 bji2r 9153e 9revb
          b9puu cu0s1 ecntk 3hipv 135oa 6ksql 46br0
          525lo 7gikd 7bmi 2lsn7 f8pjc dqeip adng6
          8v0hj 3gqkg 12dl5 6lknq 473mf 9q6i4 b8hjh

        • I wonder which code the FBI can break first?

  2. I’m not sure why but this blog is loading very slow for me. Is anyone else having this problem or is it a problem on my end? I’ll check back later and see if the problem still exists.
    pink uggs sheep

    • scottishsceptic says:

      Sorry, unfortunately it just seems to be the way things are going.
      Part of the explanation is here: A Weird Coincidence – or Black ops?.
      The web server I use seems to have a lot of problems and I’m never sure whether this is targeted at me or worse, is targeted at him by one of the big server companies trying to get rid of another independent. Either way I’d prefer to support a smaller local company that some massive “IPCC” type server company.

Comments are closed.