BigCalm : Visual Basic : RijnDael Block Cipher

Notes on the RijnDael Code



Download Source Code



Do not use any part of the code except for the raw algorithm module. Everything else has already been superceded and it isn't wonderfully well written anyway



It may surprise you to know that this is my first attempt at writing an encryption algorithm. I’d like to suggest to anyone else who wants to learn about encryption that you don’t try and do what I did, which is try and implement one of the most advanced form of encryption available today. My brain still hurts even now…

My code is a translation of Paulo Barreto’s (and others) optimised C code which can be downloaded here. Credit must also go to Peter Raddatz for his Binworks library, which allows fast usage of bit-shifting from Visual Basic. I’d normally complain about Microsoft’s lack of functionality here, but New Year’s resolutions prevent me. I believe that at the time of writing, (09/01/01) that this is the only implementation of RijnDael in Visual Basic.

I do tend to repeat myself quite a bit in this document - I apologise to anyone reading it in advance!

 

The Algorithm

The RijnDael (pronounced Reign Dahl) algorithm was adopted in October 2000 as the Advanced Encryption System (AES) by the American National Institute of Standards and Technology (NIST), and is soon to become the Federal Information Processing Standard (FIPS). This algorithm is a successor to what is currently used - the Data Encryption Standard (DES) which has proved to be crack-able, given enough computing resources.

The RijnDael algorithm was developed by two people - Joan Daemen and Vincent Rijmen, both experts in the cryptographic community. They own the trademark on it’s name too.

 

General Encryption

Encryption algorithms have been around for thousands of years. Julius Caeser is the first known user of an encryption system, known as Caeser’s keys. Caeser wanted to send messages, but he didn’t trust the messengers, so what he did was use alphabetic substitution in his messages - replacing "A" with "D", "B" with "E", "C with "F", etc. The recipient of the message would have been told in the past the secret of the encryption - the "shift by three" method, but the messenger, who had not been told, would not be able to work out the secret of the message.

If we were to encrypt a sentence of English using Caeser’s keys, we can almost immediately see that "patterns" can be spotted easily.

e.g.

Plain Text: THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG

Cipher Text: WKH TXLFN EURZQ IRA MXPSHG RYHU WKH ODCB GRJ

A person who tries to "break" a cipher is known as a crypt-analyst. A crypt-analyst would first notice that some letters occur more frequently than others - there are 4 ‘H’ letters, and 5 ‘R’s. A crypt-analyst can deduce that these must be vowels. With a few educated guesses with knowledge of the English language, it is fairly simple to deduce that "WKH" is "THE", and from there, it wouldn’t take too long to work out the "shift by three" rule. Caeser could have used a different "shift" - try this example to see if you can work out what it says - I have used the shift by "n" rule, where all you have to do is work out "n".

Cipher Text: SGHR SDWS HR DMBQXOSDC AX SZJHMF NMD KDSSDQ ZVZX EQNL SGD NQHFHMZK KDSSDQ ZMC XNT RGNTKCMS GZUD Z OQNAKDL VNQJHMF NTS VGZS HS RZXR

It shouldn’t take you more than five minutes to work out what "n" is, so I won’t spoil it by telling you the answer.

Obviously, this method of encryption is not wonderfully secure. Though encryption has been around for thousands of years, only in the last century have we seen great advancement in the field of cryptography. What would be better than Caeser’s keys is a method where we encrypt each character seperately, and we encrypt it differently each time.

We could use a variation of Caeser’s keys, which changes "n" every time a letter is used - for example, let’s increase "n" each time by one after encrypting each letter for this sentence, where "n" starts at three.

Plain Text: THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG

Cipher Text: WLJ WBQLU MDBKC VEP COHLBB NVFT WLJ RHHH NZD

The cipher text is much harder to work out now (I must apologise - the two encrypted "THE" symbols ended up the same - this would not normally be the case, but in this example the two "THE" words were 26-n characters apart).

One method of solving this is to try for all possible combinations of "n". Any crypt-analyst should be able to find the solution with a maximum of 26 attempts.

The key in the above example is "n" - the number of shifts to be done to the initial letter. This is not very strong encryption, allowing a brute force attack to reach the solution with a maximum of 26 attempts.

Now, you might say, "Why don’t you keep the way encryption is done a secret?". This would stop any crypt-analyst trying to decrypt the text, because there are an infinite number of ways it could be encrypted. Unfortunately, this isn’t possible - methods for cryptography are published in the public domain - collaborators must be using the same method for encryption and decryption. Hackers can backwardly-compile encryption code, and therefore derive the algorithm. It would only have taken one of Caeser’s trusted acquaintances to tell how he was encrypting for the method to become totally useless. In modern cryptography it is possible to have a method for encryption which is published world-wide, that is completely secure, as long as the Key is not known. In fact, a crypt-analyst (or hacker) is assumed to have a large amount of information available to him to try to break the key - he is assumed to have

  1. Almost limitless supplies of state-of-the-art computing technology
  2. Knowledge of the algorithm being used.
  3. Many examples where he has both the Plain Text (unencrypted) and the Cipher Text (encrypted) to compare.

Modern cryptographical techniques can cope with the possibility of all of these, and still be reasonably certain that a key, if properly constructed, cannot be broken.

During World War II, the Nazis used an encryption method very similar to that above. It was a lot more complicated. Apart from other complications such as the plug board, it had linked wheels. The wheels would translate each letter as it was keyed in, and then move along a number of notches. The number of notches moved was dependent on the letter that was pressed. As long as the person receiving the message had the same starting position on their machine, they could be sure of decrypting the message. Outsiders, receiving just the encrypted message could not decipher the message unless they knew the starting position of the machine. This legendary machine was called the "Enigma" machine, and the breaking of its’ code gave a great advantage to the Allies and research work on it allowed Alan Turing to eventually construct what we now know as a "computer".

Even with a captured Enigma machine (i.e. Knowledge of the algorithm), it took many months before a method of deciphering an encrypted message was developed. The secret keys were the starting position of the wheels in the machine. A demonstration of how the enigma machine works can be found here

From the breaking of the Enigma code sprang the computer. Computers were able to calculate much faster than the people who worked on them - today an Enigma message could be broken within a few minutes on a normal PC. This has led to the flourishing of the cryptographic world in the latter part of the 20th century, in response to the need for new and secure methods of encryption. In the 1970’s, the Data Encryption Standard was adopted by the US Federal government. It has since become the de-facto standard for symmetric encryption throughout the world. There are many implementations of DES, and it provides reasonably secure way to handle information. Wait! What did I mean by "symmetric"? Symmetric encryption is where you encrypt and decrypt using the same method (a password encrypted file, for example). Asymmetric encryption is where you encrypt and decrypt using different methods (public/private encryption - sending credit cards over the Internet). I will not discuss Public and Private keys here, as RijnDael and DES are symmetric, and not appropriate for public/private key use.

All the simple methods of encryption discussed so far all encrypt a single character at a time - which is a distinct disadvantage when it comes to security - what would be better is to encrypt a specific number of characters at once - this will allow much stronger encryption to take place. Methods that do this sort of encryption are known as Block Ciphers, and are universally recognised as harder to attack than the other sort of encryption (Stream Ciphers) mentioned before. Both DES and AES are block ciphers, and these will be discussed in more depth in the next section.

Over the last few years, DES has started to show a few cracks - brute-force attacks on DES that were unthinkable in the 1970’s have now become possible because the speed of computers has increased dramatically. In response to this, the US Government decided it was time to move away from DES, and proposed an algorithm competition to see what was the best algorithm for encryption around today. The winner would become the Advanced Encryption Standard (AES), and be adopted throughout the Federal Government, and because of the large public usage of DES, the rest of the world too. After a long wait (it was a three year competition) and many attacks on all algorithms involved, the RijnDael algorithm was named in October 2000 as the Advanced Encryption Standard.

 

The Three Levels of a Block Cipher

What’s a block cipher? A block cipher encrypts a specific number of bits - not more or less.

As far as I can make out, there are three levels to a block cipher encryption process.

Level One: Algorithm. The pure unadulterated block cipher encryption algorithm. In this state, the algorithm is almost totally unusable by programmers. You can encrypt only a specific number of bits at any one time - not more or less. (This is why it’s called a "Block" cipher)

Level Two: Block to Stream conversion. To allow programmers to encrypt a different number of bits than that specified by the algorithm, the block cipher needs to be converted to a stream cipher, using a conversion method called a "cipher mode" or "mode of operation".

Level Three: Packaging. Once you have an encrypted Stream of data, you then need to decide what you want to do with it. You need to add recognisable headers and trailers containing information such as the algorithm used, the initialisation vector, the number of bits in the key, and of course, a method of checking that decryption has been attempted with the same key that was used to encrypt.

 

Technical Notes

Most of these notes are derived from email conversations with Paulo Barreto (the crypto-god!).

RijnDael and DES are known as "Block" ciphers. This means that they can only encrypt a certain number of bits. RijnDael encrypts blocks of 128 bits, but it is more convenient to think in terms of 16 bytes. If less bits are required to be encrypted, padding methods are used to fill the block up.

There are methods that allow streams of data to be converted into blocks for encryption, and vice-versa for decryption. These all have cryptographic basis, and the official modes are as follows:

ECB: Electronic Code Book

CBC: Cipher Block Chaining

CFB: Cipher Feedback

OFB: Output Feedback

Finally, common unofficial methods also exist such as

PCBC: Error Propagating Cipher Block Chaining.

CTR: Counter Mode (It is rumoured that NIST will recommend use of this Mode because parallel implementations of it can exist).

Padding notes.

The last block in any encrypted stream is always padded, and it can contain no valid data!

Certain cipher modes do not have to be padded (not sure how these work though…)

The last block contains at most 15 bytes! The rest is padding information. The last byte in the block always contains the amount of padding that was used.

Generally, the padding contains the number of pads used in that block.

Some examples, because it confused the hell out of me when I first looked at it.

Padded encryption of 7 bytes:

Block :FF FF FF FF FF FF FF

Is encrypted using padding, and when decrypted the result will be:

Block :FF FF FF FF FF FF FF 09 09 09 09 09 09 09 09 09

Now what I thought was, what if you try to encrypt 15 bytes? Is there a chance that you could end up with the same plain text decryption as 16 bytes with a last byte of 01?

Luckily, Paulo was on hand to correct me. Again, let me repeat myself, The last block in any encrypted stream is always padded and it can contain no valid data! That last block contains at most 15 bytes! The rest is padding information.

Padded Encryption of 15 bytes

Block :FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

Is encrypted using padding, and when decrypted the end result will be:

Block 1:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 01

Padded Encryption of 16 bytes:

Block :FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 01

Is encrypted using padding, and when decrypted the end result will be:

Block 1:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 01

Block 2:10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

Block 2 is translated as a fully padded block (16 bytes of padding).

This means that the last block and the last block only should have padding!

Headers and Trailer Information for each encryption (each block is 128 bits).

Block 1: Long 1: Endian Style (0 = Little, 1 = Big)

Long 2: Number of Bits in cipher key

Long 3: Cipher Mode (e.g. ECB)

Long 4: Implementation Version.

Block 2: Initialisation Vector

Block 3…n-1: Encrypted Data

Block n: Padded Encrypted Data.

Block n + 1: Message authentication code.

Paulo’s reasons for having this structure:

  1. Under some circumstances it may be awkward or impossible to know beforehand the number of bytes to be encrypted, so the data length block should be replaced by some end-of-data marker (usually available as part of the underlying communication system). This choice is especially attractive when you don't know where the code will be used, as it does not make any assumption on how the data will be fed to the encryptor.
  2. Putting the key itself may lead to a security flaw, namely if the key is reused (there are circumstances where key reuse is desirable or mandatory; that's why most modes use an initialisation vector). Therefore, a message authentication code is more indicated to check for a correct decryption. And it should be appended to the data for the same reason exposed in item 1 above. You could merely append the hash of the data (computed concurrently with the encryption) and then encrypt the hash as an extension of the data itself, though I would rather use something like HMAC instead, with a different key.
  3. The initialisation vector need not be encrypted, especially if some modes like CBC are used. This is because the chaining value for each block except the first one is known anyway (it's the preceding ciphertext block), so it doesn't make much sense to conceal only one of them -- the only data kept from an attacker would be the very first block, which uses to contain cryptographically-irrelevant information like magic numbers, fixed tags, etc.

 

Cryptologists talk a lot about "Plain Text" and "Cipher Text" - these aren’t really "Text" as we know it. In fact, Plain Text is the unencrypted message, and cipher text is the encrypted message. It’s that

simple.

 

The Sad Truth

Modern encryption is based on the theory that you can keep a piece of information (the key) completely and totally safe. However, the sad truth is that this is virtually impossible to do. Windows is a particularly bad offender when it comes to raw password-key security, but other operating systems have flaws too.

Let’s say I write a program where you have to type in a password to encrypt and decrypt a file. Where am I typing my text? In a text box? A programmer with a decent knowledge of the Windows API calls can actually steal information out of controls in another application (whether or not you’ve done it with a line of stars). Bang! Your password isn’t safe when typing it into a text box.

Where is the key information held? In memory? Windows is a particularly bad offender here, because there are many tricks which allow you to hack memory around, allowing you to access other program’s memory without permission. Bang! Your password has been compromised again.

Are you typing the information in? Windows API provides you with the ability to check for key presses, not just for your application, but any key presses that occur. Bang! Your password has been compromised a third time.

I’m not having a go at Windows deliberately - similar flaws in the past have been pointed out with Unix machines, VAX/VMS machines (the infamous loginout patch springs to mind), and various others.

You’re typing it in? What about the people, or surveillance cameras looking over your shoulder at what keys you press? Actually, I’ve seen somewhere in the encryption faq’s on www.faqs.org that it is actually possible to read information off a VDU screen using sophisticated radiation monitoring systems - you might not even need a camera!. Ouch. Yet again, the password-key has been compromised.

Are you typing your password over a remote connection? Is that connection guaranteed to be safe? (The answer to this question is always no). Again, security has been compromised.

Additionally, you can write a perfect encryption algorithm, but that isn’t going to stop the person who uses the algorithm giving away the key unintentionally. This applies to both programmers who use your algorithm, and the individual users who tell anyone who asks what their password is.

Choice of password is very important. Many cracking programs use dictionaries of words and people’s names to guess likely passwords - making the algorithm used much less secure than it should be.

 

Implementation

Almost as soon as I’d started writing this, I realised that Visual Basic is not the language to do encryption in. Just the lack of bit-shifting operations caused me a major headache (a big thank you to Peter Raddatz for use of his Binworks DLL), not to mention the complexity of the algorithm in general, and the implementation of the hashing table was a pain too. Also, there are easier things than translating optimised C code into Visual Basic when you haven’t a clue what the C code is trying to achieve (Quantum physics is simple by comparison).

My first version was disgusting (you think the second one is bad…). I have moved back to a Byte implementation of the algorithm because implementation of the Cipher modes was getting impossible using Longs. I have also removed most of the exposed procedures in the RijnDael class object, leaving you hopefully with a less confusing interface. And I’ve moved the body of the code into an ActiveX DLL. The Te() and Td() arrays can be calculated rather than declared (check out the RijnDael documentation or other RijnDael source code for this), but it should be slightly faster initialising with constant arrays.

This version is not both Little and Big Endian compatible - just implements the standard Little Endian.

I expect numerous bugs within the code, so if you find any, please drop me an email at bigcalm@hotmail.com

I’ve managed to produce the same test vectors and known answer tests that the original C source produced, so I hope that most of the bugs should have been removed already.

If you want to use the RijnDael VB code in your own projects, please credit all of the following people:

Paulo Barreto, Joan Daemen, Vincent Rijmen, Antoon Bosselaers, Jonathan Daniel (that’s me by the way), and Peter Raddatz. Source code was translated after permission from Paulo Barreto - many thanks again.

Resources

http://www.nist.gov/public_affairs/releases/g00-176.htm - Information relating to NIST and Rijndael

http://www.esat.kuleuven.ac.be/~rijmen/rijndael/ - More on Rijndael (written by one of the authors)

http://www.esat.kuleuven.ac.be/~rijmen/rijndael/rijndael-fst-3.0.zip - optimised C implementation of Rijndael, by Paulo Barreto and others. (From where I took most of the information).

http://www.esat.kuleuven.ac.be/~rijmen/rijndael/rijndaeldocV2.zip - original specification for Rijndael.

http://www.rijndael.com/ - the fan (?!?) club page.

http://www.efa.org.au/Issues/Crypto/Welcome.html - General cryptography introduction.

Ulli on http://www.planetsourcecode.com/vb/ has various tutorials and example source code on other aspects of cryptography, such as the current DES system and the original Enigma machine.

Peter Raddatz’s Binworks DLL can be downloaded here with it’s original C-source here

http://www.faqs.org/faqs/cryptography-faq/ contains an excellent introduction to cryptography in general.


28th September 2001 Copyright Jonathan Daniel 2001