Have you ever heard of Caesar's cipher? Allegedly Julius Caesar used this very simple encryption scheme to protect his private messages.

The algorithm is very easy: it replaces every letter c in a message with the character c' that sits 3 places to the right of c in the alphabet. At the end of the alphabet it wraps around and starts again at the beginning.

For example A -> D, H -> K, ..., X -> A, Y -> B, Z -> C, and the message HELLO is turned into KHOOR.

To decrypt a message, all you have to do is count backwards in the alphabet, so D -> A, K -> H, ..., C -> Z.

This algorithm can easily be varied by changing the number of steps you move to the left or right in the alphabet for each character. If you do 13 steps instead of just 3 you end up with ROT13 - you might have heard of that too - which has the very nice property that you can use the same method to encrypt and decrypt your message (in our common 26-character alphabet).

So you can parametrize this little algorithm by this step count and get a whole family of algorithms - nice, huh?

Demo

If you want you can try out the algorithm here:

some math

So why does ROT13 have this nice property? The magic is in the way the algorithm wraps its inputs. If you number the letters A to Z with 0 to 25 you might notice that the operation here is really just an addition: A=0 -> 3=D, B=1 -> 4=E, ...

But at the end something curious happens: X=23 -> 26=A?, Y=24 -> 27=B? and Z=25 -> 28=C?

It's almost as if we want 26 and 0 to be the same?

And indeed mathematicians have a way to express exactly that idea by working with quotient sets/groups.


The idea behind that is: Given an equivalence relation \(\sim\) on some set \(S\) you can define a new set \(S/\sim\) that consists of all the equivalence classes of \(\sim\).

These are the subsets of \(S\) in which any two elements \(a,b\) satisfy \(a \sim b\).

Here we are interested in integers and the relation is

$$ a \sim b \Leftrightarrow 26 \ \text{divides} \ (a-b) $$

It's easy to extend addition to that set, which is usually written as \(\mathbb{Z}_{26}\).


But don't fear: You probably already know this - all you have to do is `mod` 26 everything:

  • 0 `mod` 26 == 0
  • 1 `mod` 26 == 1

...

  • 25 `mod` 26 == 25
  • 26 `mod` 26 == 0
  • 27 `mod` 26 == 1

And that is why there are really only 26 different algorithms (think about it). It even gets better: decrypting with key k is the same as encrypting with key -k. The nice property of ROT13 is a small corollary of the fact that

$$ 26 - k \equiv -k \pmod{26} $$

which shows that \( -13 \equiv 13 \pmod{26} \).
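Spelled out for a character \(c\) (numbered 0 to 25), encrypting with key \(k\) and then with key \(26-k\) really does give back the original:

$$ (c + k) + (26 - k) = c + 26 \equiv c \pmod{26} $$

And for \(k = 13\) we have \(26 - 13 = 13\), so ROT13 undoes itself.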

first approach

It's not too hard to use this exact idea - mapping a character to a number, adding the key to this number modulo 26, and then translating the resulting number back into a character - in an imperative language.

Here is a basic version in C#:
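The exact original snippet isn't reproduced here, but a minimal sketch of such a ShiftChar helper might look like this (the extra normalization step is there so negative keys work too, since C#'s `%` can return negative values):

```csharp
static char ShiftChar(int key, char c)
{
    // leave everything outside A..Z unchanged
    if (c < 'A' || c > 'Z') return c;

    // map to 0..25, add the key, wrap around, map back;
    // the "+ 26" keeps the result non-negative for negative keys
    int shifted = ((c - 'A' + key) % 26 + 26) % 26;
    return (char)('A' + shifted);
}
```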

Please note that this will not change characters outside the A..Z range.


a more functional solution

Of course you could do the same in pretty much every language out there, but for my taste all that is a bit too bound to low-level implementation details.

For example, the solution above would not work if the character values for A to Z were not direct successors of each other.

In the end ShiftChar is just a function with a very limited domain (if you don't count all the characters it leaves unchanged).

On the letters A - Z it's indeed a bijection, and a very simple one.

So simple that there used to be toys that helped kids encrypt their messages just like Caesar without knowing anything about quotient groups or modular arithmetic - all you had to do was find the character on a toy like this:

code ring

and look below the character to find its encrypted form.

To decrypt a character back you could either flip the ring or - if it was a fancy one - rotate it in the other direction.

So instead of that arithmetic, imagine an approach like that toy.

modelling in Haskell

First let's simplify things a bit: instead of being able to move left and right on the ring, it's enough to only move right, because sooner or later (see the math section) it gets to the same state that moving left would have reached.

As Haskell is lazy, the top section of the ring - if we start at A - is just
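A sketch of that infinite top section (the names here are mine, not necessarily those of the original code):

```haskell
alphabet :: String
alphabet = ['A'..'Z']

-- the top ring just repeats the alphabet forever, starting at 'A';
-- laziness means we never actually build the infinite list
topRing :: String
topRing = cycle alphabet
```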

which is the same for the bottom section if it is not twisted (key = 0):
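With key 0 the bottom section lines up exactly with the top one; a sketch:

```haskell
-- the untwisted bottom ring (key = 0) is identical to the top ring
bottomRing :: String
bottomRing = cycle ['A'..'Z']
```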

Rotating the bottom ring left is easy too:
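On an infinite cycled list, rotating left by k positions is just dropping the first k characters; a sketch (again with my own names):

```haskell
-- rotate the infinite bottom ring k positions to the left
rotatedRing :: Int -> String
rotatedRing k = drop k (cycle ['A'..'Z'])
```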

Now once the ring is set up correctly, encoding a character works just like it would with that toy - you look up the character on the top and give back the character on the bottom.

The naive version of this would be
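something along these lines, zipping the two rings and searching the pairs with the Prelude's `lookup` (a reconstruction, not the original code):

```haskell
-- naive encoding: search the (infinite) zipped ring for the character
-- WARNING: this diverges for characters outside 'A'..'Z'
encodeNaive :: Int -> Char -> Maybe Char
encodeNaive key c = lookup c (zip ring (drop key ring))
  where
    ring = cycle ['A'..'Z']
```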

But caution is needed: lookup will keep looking forever if it cannot find the character (try it! and ask yourself: why?).

To fix that, it's enough to look at only as many pairs as there are characters in the alphabet:
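A sketch of the fixed version: taking 26 pairs makes the list finite, so `lookup` always terminates, and characters it doesn't find are simply returned unchanged.

```haskell
-- only 26 pairs to search, so lookup always terminates
encode :: Int -> Char -> Char
encode key c = maybe c id (lookup c pairs)
  where
    pairs = take 26 (zip (cycle alpha) (drop key (cycle alpha)))
    alpha = ['A'..'Z']
```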

Composing all these parts together we already have a working algorithm:
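Putting the pieces together might look like this (a sketch under the same naming assumptions as above):

```haskell
-- the whole pipeline: build the twisted ring, then encode each character
caesar :: Int -> String -> String
caesar key = map encode
  where
    encode c = maybe c id (lookup c pairs)
    pairs    = take 26 (zip (cycle alpha) (drop key (cycle alpha)))
    alpha    = ['A'..'Z']

-- e.g. caesar 3 "HELLO" == "KHOOR", and caesar (26 - 3) "KHOOR" == "HELLO"
```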

performance or don't use in production

Of course the performance is a bit shitty. The reason is that lookup cannot keep up with the arithmetic solution: it stupidly walks along the list and will take 13 operations on average with this alphabet. So it's \(O(n)\) for an alphabet of length \(n\).

You can improve this by translating the zipped pairs into a Map/Dict of some sort.
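In Haskell that could be a `Data.Map` from the containers package, built once per key (again a sketch, not the original code):

```haskell
import qualified Data.Map as Map

-- same idea, but the table gives O(log n) lookups instead of O(n)
caesar' :: Int -> String -> String
caesar' key = map (\c -> Map.findWithDefault c c table)
  where
    table = Map.fromList (take 26 (zip (cycle alpha) (drop key (cycle alpha))))
    alpha = ['A'..'Z']
```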

This is the version I used for the small demo above in Elm:

This should be fast enough for an algorithm you should never use in a real scenario ;)