ONE-TIME PAD CRYPTOGRAPHY
Frank Rubin
January 13, 1997
Abstract:  
This paper shows how a moderate amount of random key stream
can be used to generate a lifetime supply of keys for one-time pads.
It further shows how arbitrarily many parties can correspond using
the same random key, without compromising one another's communications.
The net effect is to make the unbreakable one-time pad practical
for widespread use.
      
There are two eternal verities in cryptography: the only
absolutely secure cipher is the one-time pad, and the one-time pad
cannot be achieved in practice.
The goal of this paper is to show how the one-time pad can be
accomplished.
      
The term "one-time pad" refers to any method of encryption
where each byte of the plaintext is enciphered using one byte of the
key stream, and each key byte is used one time then never used again.
The key stream for a one-time pad must be a true-random stream, meaning
that every key byte can take any of the values 0 to 255 with equal
likelihood, and independently of the values of all other key bytes.
By contrast, in a pseudo-random key stream the value of each byte after
the first several is mathematically derived from the values of a few
preceding bytes.
      
The practical difficulty of using a one-time pad is that
the key bytes cannot be reused.
This means that even for a two-way exchange of messages, each party must
have a sufficient supply of key material on hand so that they cannot run out
before more key stream can be furnished.
For N-way exchanges, the amount of key required increases quadratically.
      
Many papers on cryptography assume that the only possible way out
is for the key to be generated by some mathematical algorithm, and then
expend a great deal of cleverness in finding an algorithm that adequately
approximates the properties of a true-random key.
The paper [1] gives a broad survey of such methods.
The weakness of such methods is that determining a portion of the key
stream will allow an opponent to reconstruct the entire key stream using
the same mathematical algorithm.
      
This paper presents a solution intermediate between
a pure random key and a pure mathematical generator.
ONE-TIME PAD
      
One-time pad encipherment can be denoted as
      Ci = E (Pi, Ki)
where E is the enciphering operation, Pi is the i-th
character of the plaintext, Ki is the i-th byte of the key
used for this particular message, and Ci is the i-th character
of the resulting ciphertext.
Both the key stream K and the enciphering operation E are secret.
The key for each individual message is the starting location in the
entire random key stream used for this encipherment.
      
For efficiency, it is good practice to start each message near the
position following the key byte used for the last character
of the previous message.
This eliminates the need to keep track of which portions have been used,
and removes the danger that a message will be longer than any of
the remaining segments.
      
Although the term byte has been used, and will continue to
be used for each unit of the key stream, it is not necessary that
each unit of the key be 8 bits long.
In fact, a key with larger units gives protection against a known-plaintext
attack, since then there may be several values of the key which produce
the given ciphertext byte from the known plaintext byte.
Similarly, the plaintext could be enciphered in units other
than 8-bit bytes.
For present purposes, however, encipherment based on bytes is
sufficiently general.
      
In principle, the encipherment E can be achieved by simple look-up in a
256·2k-byte table, where k is the size of each key unit.
Each column of the table, corresponding to one value of Ki
is a permutation of the values 0 to 255.
This makes it possible for the receiver to decipher the message uniquely.
The cipher will be most secure if each of these 2k permutations is
statistically independent of the others and of the identity permutation,
and if the enciphering operation is implemented so that the waveforms
of the bits of Ci give no clue as to the values of Pi and Ki.
      
The reason that no portion of the key stream may be reused is that an
opponent can detect reuse by the simple Index of Coincidence statistical test.
Once the opponent finds two or more portions of messages enciphered with
the same part of the key, it becomes possible to decipher those portions.
KEY ENHANCEMENT
      
Suppose that two parties wish to correspond in secret
over potentially compromised channels.
They could communicate using a one-time pad, as above, provided that
they could obtain a sufficient supply of key stream bytes, or that they
could resupply themselves with key bytes at a rate fast enough to
prevent ever running out.
Note, however, that the total amount of key used over time equals
the total amount of message material.
One byte of key for each byte of message.
So, if they had a secure way of resupplying the key, perhaps they could send
their messages this way, and skip encryption altogether.
      
Suppose, then, that the parties do not have an endless supply of key
bytes, but a limited supply, at least enough for the longest possible
message, but perhaps only enough for one day or even one hour of communication.
The question is: how can their supply be stretched to last a week,
or a year, or until new key material can be distributed?
      
There are many ways to generate a new sequence from a given sequence.
Indeed, this is precisely the problem that cryptography addresses.
Even such a simple expedient as adding a constant value to each key byte
Ki (the Caesar cipher) could be effective if the enciphering function E
is strong enough.
It is safer, though, to choose a method of enhancing the key which
does not depend on either the strength or the secrecy of E.
One such method is the autokey.
AUTOKEY
      
Autokey is a very simple and fast method of encipherment.
It has the valuable advantage that each byte of the new key stream has no
correlation or fixed relationship with the corresponding byte
in the original key stream.
Let A and B be two independent simple substitutions.
The new key stream L is given by
    L1 = A(K1) + B(Kn),
    Li = A(Ki) + B(Li-1)     for i = 2, 3, ..., n,
where the addition is modulo 256, and n is the length of the old key stream.
      
Greater strength can be obtained by repeating the autokey two
or more times in order to "cover its tracks."
This assures that in the final iteration none of the original bytes of the
K key stream participate.
It is important that the two substitutions A and B are strongly non-linear
in the sense that there is no a priori correlation between
any subsets of the bits of x and A(x) or B(x).
      
Since this autokey can be inverted, or reversed, the resulting
L key stream will also be true-random.
If it were not, then the inversion process would be a deterministic
algorithm for producing true-random numbers from nonrandom numbers,
which is impossible.
By definition, random numbers generated by a deterministic algorithm
are pseudo-random.
      
There must be sufficiently many choices of A and B to prevent the
correspondents from ever running out of keys.
If A and B are determined by permuting the values 0 to 255 using a
pseudo-random number generator, and this generator has a 40-bit seed,
then there are 280 possible L key streams.
Even if each L stream were used for only one second of communications
this supply would last indefinitely.
Indeed, if each new key were chosen randomly, rather than systematically,
the chance of ever choosing the same L stream twice is negligible.
      
A new key stream may be generated in bulk fashion using the entire
K key stream, or it may be generated as needed using just the portion
of the key stream required for the next message.
For very long messages, the autokey can be applied to one block of the K stream
at a time as it is retrieved from secondary storage, provided that
message keys are not restricted to block boundaries.
EXHAUSTIVE TRIAL
      
Since the K key stream is true-random, no method of cryptanalyzing L
to recover K would work except exhaustive trial.
In order to decide whether 280 possible choices of A and B are adequate
for security, any known or potential exhaustive trial attacks should be
examined.
      
Assume that an opponent has a copy of the encryption device or
software, and therefore knows the E operation.
Suppose, further, that the opponent knows both the ciphertext and plaintext
for some given message.
This means that the E encipherment can be stripped off, yielding a
portion of the L key stream.
It is not feasible for an opponent to try all 280 combinations for
the two substitutions, and even if that were possible, the K stream
is true-random, and there is no way to distinguish the correct value.
      
Imagine that the messages sent today will be of value
20 or even 30 years hence, when it might be possible to try 280 keys.
The opponent would still need to have two different messages known to be
enciphered with the same part of the key stream, and compare the 280
possible K streams for each one.
A sample of 12 bytes from each would be sufficient.
However, this requires storage on the order of 1025 bytes,
which seems well beyond what can be projected even 30 years ahead.
The autokey, therefore, seems secure against any conceivable attack
that could be mounted in the foreseeable future.
Of course the key size should be reviewed periodically as computers
get larger and faster.
      
The autokey method allows a relatively small amount of true-random
key material to stretch a long, long way.
Hamburger helper.
      
One warning is necessary: the one-time pad method still depends
on the opponent not knowing where in the key stream the key for a
specific message begins.
The key stream needs to be long enough to make this non-trivial.
If P is the peak amount of message material expected during the time
period when an L key is in use, Q is the maximum length of any message,
R is the maximum number of messages expected during the period, and
G the average gap left between messages in the key stream,
then a good rule of thumb would be to make the key stream at least
P+Q+RG+106 bytes long.
N-WAY COMMUNICATION
      
This technique can be extended to N-way communications by using
a layered approach.
Suppose that there are N parties (people or stations) that wish to communicate,
and that all possible pairs of parties may potentially need to exchange
messages at some time.
      
It is not sufficiently secure simply to supply all of the participants
with the same true-random key stream, and let them apply the autokey method
above to generate more key material.
Two different parties may use exactly the same key and autokey,
thus creating two messages with the same effective key.
Such messages will begin appearing when the number of L streams that have
been used reaches about 1.4x240.
It is considered feasible to record and store 1.4x240 messages.
An opponent monitoring all pairs of correspondents may detect this,
and recover part of the effective key, that is, the L key.
This portion might solve messages between these or other pairs.
      
One solution is to control carefully which keys each pair of
correspondents use to generate their L streams.
This poses a huge administrative burden if the number of users is
high, especially if users frequently join and leave the system.
Worse, if the range of keys alloted to a particular user pair
becomes known, then exhaustive trial of that limited set may be
feasible, perhaps even easy.
      
A better solution is to use a layered approach.
Each pair of correspondents could have a unique autokey used to produce
their L key stream, then they would apply a second autokey to produce the
key stream for a specific time period.
This way, neither the K key stream itself nor any derived L key stream
is ever used as the key of a message.
      
If the K stream were discovered by espionage, that would not
compromise the communications of any pair.
Similarly, no individual with legitimate access to the K stream could
eavesdrop on any of the other parties.
      
If the L stream for a particular pair of parties, or even for several
pairs were discovered by espionage,
that would not compromise the communications of any other pair,
because their L stream would not be known.
Futhermore, it would not compromise the communications of the affected pairs
because it would not reveal their key streams for any given time period.
      
Using this layered approach, a small amount of random key can
serve a large number of correspondents over an indefinite period.
Loaves and fishes.
In fact, there are so many choices for the A and B substitutions that
the key can never be exhausted, no matter how many parties
join the network.
Burning bush.
CONCLUSIONS
      
This paper has shown a practical method for implementing
one-time pad cryptography.
Starting with a limited supply of a true-random key, a simple autokey
cipher is used to generate additional random key material indefinitely.
Using a layered approach, the same random key can supply any number of
correspondents indefinitely with message keys for use in a one-time pad.
A break in the communications of one or more pairs of correspondents
will not compromise the security of any other pairs.
REFERENCE
1
Ritter, Terry 1991. The Efficient Generation of Cryptographic Confusion
Sequences. Cryptologia 15: 81-139.