When you’re accessing services over the WEB – let’s pick GMail as an example – couple of things have to happen upfront:

  1. The server you’re connecting to (GMail in our example) has to get to know who you are.
  2. Only after getting to know who you are it’s able to decide what resources you are allowed to access (e.g. your own email inbox, your Calendar, Drive etc.).

Step 1 above is called authentication. Step 2 is authorization (server can authorize only after successful authentication).

Using apps like Google Authenticator is all about step 1. And you can think of that step as logging in to your GMail account.

What problem are we solving?

Authentication is typically performed using one (or more) of the following approaches:

  1. What you know? – The server “tests you” by asking about something only you are supposed to know (e.g. username and password). That’s the most common approach. You’re presented with a login form where you enter your credentials.
  2. What you have? – The server tests you by making sure you have something that you’re supposed to have (e.g. a secret of sorts embedded in the bowels of Google Authenticator installed on your smartphone). Only you are supposed to have physical access to your phone. If you don’t (in other words you can’t open the app and type the code), authentication fails.
  3. Who you are – The server tests your biometrics. This could be done using a fingerprint reader in your smartphone/laptop, face ID in your iPhone etc.

Multi-Factor Authentication (MFA) is about using 2 or more of the approaches described above. Typically, you’d set up username/password together with e.g. Google Authenticator (or other app). Of course for What you have approach you have more options. E.g. FIDO2, which I find much more convenient. But here we’re focusing on apps like Google Authenticator that use something called TOTP algorithm.

What we want to achieve is understand how apps like Google Authenticator (or any other TOTP app) are solving that?

How to make your smartphone necessary?

We have to come up with an algorithm that will prove that the user has physical access to their smartphone running the app at the time of being authenticated. How can we do that?

The 1st thing that comes to mind is, let’s embed a sort of secret – e.g. a password – in the app on the phone. Then, every time the user logs it, let’s ask the user to open the app and type the password that the app is showing.

For example our app has a password: $3cr3tP4$$ in it. When we are logging in, the server (GMail) is asking us to provide username and password. We enter it. Then, as a 2nd step, server is asking us to type the password that the app is displaying and type it into on the web page. If we entered correct username and password in the 1st step, and later we entered $3cr3tP4$$ – because this is what the app showed us – we’re successfully authenticated and, therefore, logged in.

Now, you’re probably immediately seeing a gaping hole in this line of thinking. If we’re just asked to type the same password, letter for letter, from the app than there is nothing stopping us from taking that password (writing it down on a piece of paper, memorizing it etc.) out of the app. From that point on, your phone is no longer necessary here. This 2nd factor becomes a mere extension to the 1st factor (additional static password to your existing one).

One way to exploit it would be the following scenario:

  1. Someone (a hacker) installs a key-logger on a machine you’d use for logging in.
  2. They capture your username/password from the 1st factor.
  3. The capture the static password ($3cr3tP4$$) from the 2nd factor.

Now they have all your credentials they’d need to login to your GMail account.

Well, so the question becomes: How do we prove the phone contains that static secret (e.g. that password: $3cr3tP4$$) without ever disclosing it?

Hashing

We could use one of the hashing functions like MD5 (don’t use MD5 btw. – it’s too vulnerable to collisions finding attacks), SHA-1, SHA-512 etc. Those functions transform input (any input of any size) into a standard, specific for each of those functions, string of bytes. Since Google Authenticator, and apps compatible with it, went with SHA-1, let’s go with it, too.

If you transform our secret ($3cr3tP4$$) using SHA-1, you’ll get the following sequence of bytes (in HEX notation):

e2 56 28 6a fb 11 fd b7 40 1f 9a f5 30 73 39 75 dc 7d c4 64.

Just run the following command:

$ echo -n '$3cr3tP4$$' | sha1sum 
e256286afb11fdb7401f9af530733975dc7dc464  -

Now – again – hash (any hash, not only SHA-1) will always be the same for $3cr3tP4$$. If we’re expected to always input just that hash, or part of it since it’s quite long, we’d always have to enter the exact same thing. That’s not taking us any further than expecting us to type the password every time. We’d replace typing in the same password every time with typing in the same hash every time. We’d still be able to easily take out what’s expected of us (the hash) and write it down on a piece of paper.

Using hash function sounds like a good start, but we’d have to somehow make it change every time, but change in a way that’s predictable only to those who know the secret. And the only ones, who do know the secret here are: GMail server and an app (e.g. Google Authenticator) on your phone.

So how do we make it change every time in a way predictable only to those who actually know the secret?

Sprinkle a bit of salt

If we can add something to the password every time and calculate hash off of that, the hash would be different every time. In addition to that if both, server and the app, can sync on what is being added to the password, we’d be able to achieve having hash value being different every time and in a predictable way. For example, we could go with the following algorithm (application side 1st):

  1. Initiate a counter at 0 (this step is performed only once initially).
  2. Append an ASCII representation of the counter (at the 1st pass the result is: 0$3cr3tP4$$).
  3. Calculate the hash off of the above (initially 0$3cr3tP4$$, result is – in ASCII – 41bb78b5b4963c5c78c79702b47d414d17d9127e).
  4. Increase the counter by 1.
  5. Present the last couple of bytes – let’s say 3 (ASCII representation ) – to the user (d9127e).
  6. User enters d9127e in the web form.

On the server side:

  1. Initiate a counter at 0 (this step is performed only once initially).
  2. Append an ASCII representation of the counter (at the 1st pass the result is: 0$3cr3tP4$$).
  3. Calculate the hash off of the above (initially 0$3cr3tP4$$, result is – in ASCII – 41bb78b5b4963c5c78c79702b47d414d17d9127e).
  4. Increase the counter by 1.
  5. Get the last 3 bytes (ASCII representation ) of the above SHA-1 sum (d9127e).
  6. Compare those 3 bytes against what user entered into the WEB form. If the two match, authentication is successful. If not, authentication failed.

2nd time around the counter would’ve been 1, so both sides would be calculating hash off of 1$3cr3tP4$$. The result would be efbffb4295680f674d5c061cb60103bbc7d6fca2 (last 3 bytes d6fca2). So now the code is changing every time in a predictable way. But it’s predictable only if you know the secret ($3cr3tP4$$). If you don’t you wouldn’t know what to calculate the hash off of. Even if you’d know the value of the counter you wouldn’t be able to replicate the value of the hash.

The method of adding little extra something to whatever’s being hashed is called salting, and it’s pretty popular technique For example, if you’re running GNU/Linux it is being used also in your /etc/shadow. Check out how the hashed value of your password looks like. Then use passwd command to modify the password, but don’t really change it – enter the same password you’ve been using. Then check the hashed representation of your (not so) new password in /etc/shadow again. Even though your password hasn’t really changed, the hash will be different. That’s because your system is salting passwords before counting hash and storing it in /etc/shadow.

The crux of apps like Google Authenticator is using exactly this idea. Take 2 components: secret (known only to server and app) and a counter in some form (not really confidential), combine them in some way, and calculate a hash. Next we’ll look at all the technical details around that process. Or should I say processes – there are actually 2 standards we’ll describe: HOTP and TOTP.

Meet HMAC

The idea of combining the secret with a counter to produce each time unique – yet predictable only to server and app – hash is tricky to get correctly in a way that is resistant to various attack vectors that would somehow make it easier to predict the next hash. The standard that addresses security around that concept is called HMAC (Hash-based Message Authentication Code). This algorithm describes a process of combining a key (in our example $3cr3tP4$$) with a message (in our case the counter) and calculating HASH in a safe and hard to predict way if you don’t know the key.

Another way to think of HMAC is what the name actually stands for – signing a message (the counter in our case) cryptographically using a pre-shared secret (the super-secret password we’ve been using here: $3cr3tP4$$). So your app is signing the counter with that password. And only the server is able to verify the signature, because the only other entity that knows the password is the server.

Google Authenticator and other authenticator apps compatible with it actually use HMAC-SHA-1 – an HMAC that’s using SHA-1 for calculating hashes. So let’s stick to SHA-1 throughout the rest of this article.

To learn more and better understand HMAC, please, take a look at the Definition of the Wikipedia article I linked above. Especially at the diagram:

Wikipedia HMAC diagram

  • K – key
  • K' – key processed for the needs of the HMAC algorithm
  • m – message (the counter in our case)
  • H(…) – this is a hash function (e.g. SHA-1)
  • + (the + in a circle) – an XOR operation
  • opad – Outer padded key (see the details below)
  • ipad – Inner padded key (see the details below)
  • || – Concatenation

In a nutshell first we’re preprocessing the key. HMAC requires the key to be of a specific length. Its size must be equal to the hashing algorithm’s block size or its result size. For example for SHA-1 that’s 64 bytes or 20 bytes. Well, most of the time the key you’d choose would be either shorter or longer. So preprocessing is about making the key exactly 64, or 20 bytes long. If your password is shorter ($3cr3tP4$$ very much is), you just add zeros at the end to make it 64 bytes long. If the key is exactly 64 bytes long, you don’t do anything – it’s already good. If it’s longer, you calculate SHA-1 and use that as the key (here it’s size is going to be only 20 bytes).

Next you do some XORing when you calculate opad and ipad and in the process you’re XORing the processed key with 0x5c/0x36. This is the critical part. if you don’t know the secret, you don’t know what to XOR against the above bytes, and because of that you wouldn’t be able to achieve the final result – the final hash.

Then you hash the result of XORing key with ipad concatenated with the message (the counter). Next you concatenate the result of opad XORing with the above and hash that. The end – this is the final result of HMAC.

Off-topic remark regarding long keys

Notice that when the key is, say, 40 bytes long, it’s padded with zeros to make it 64 bytes long. So the actual entropy of that key is equivalent to the 40-byte long key. But if the key is longer than 64 bytes, it’s effectively being shortened to 20 bytes. So its entropy is lower. If you want to maximize the complexity of the key, and you’re the one choosing the key, go for something around, but never more than, 64 bytes.

Quick code example

First off, never implement cryptographic operations if you’re not an expert in this area. Crypto is very critical in terms of security, and it’s really easy to get wrong in unexpected ways! Use a renown crypto library to do the job for you.

If you’re a coder like me, you’d probably prefer to look at the code to help you wrap your head around it. I concocted a super-short Python script to illustrate the elegance and simplicity of HMAC (HMAC-SHA-1 flavour in particular). I haven’t implemented SHA-1 – had to draw the line somewhere. I decided to draw it here not to make this article too long. But trust me, it’s not a rocket science, and you’d be able to wrap your head around SHA-1 and implement it, too. Don’t do it for anything other than self-education, though. Use a crypto library, instead.

In the code below you can see, I’m signing initial counter value (still an ASCII representation of 0) with our password ($3cr3tP4$$). You can google-out an online HMAC calculator to verify it actually does the job.

#! /usr/bin/env python3

import hashlib


# Our secret known only to the server and the app
secret = "$3cr3tP4$$"

# Counter. Here it's simply an ASCII string representation of the counter
# value. Initially it's "0"
initial_counter = "0"

# In HMAC-SHA-1 we're using SHA-1 hash function. In SHA-1 specifically
# the block size is 64 bytes. And the result is 20 bytes long.
sha_1_block_size_bytes = 64
sha_1_result_size_bytes = 20


def prepare_key(key):
    if len(key) > sha_1_block_size_bytes:
        # When the key is longer than 64 bytes, just hash it (in
        # HMAC-SHA-1 using SHA-1). The hash is 20 bytes long, so
        # later, after if else, pad it with zeroes to a total of
        # 64 bytes.
        result = hashlib.sha1(key).digest()
    else:
        # When the key is either exactly 64 bytes long, or shorter,
        # no processing needed. Just make sure that after if else
        # it's being padded with zeroes to a total of 64 bytes.
        result = key
    while len(result) < sha_1_block_size_bytes:
        result += b"\0"
    return result


def pad(processed_key, padding_byte):
    # Processed key is already exactly 64 bytes long.
    result = list()
    for ch in processed_key:
        # Every byte of the key is being XORed with padding_byte (either
        # 0x5c for the outer pad or 0x36 for the inner pad).
        b = ch ^ padding_byte
        result.append(b)

    return bytes(result)


def hmac_sha_1(key, counter):
    key = prepare_key(bytearray(key, "utf-8"))
    # outer_key_pad is basically an opad from HMAC's definition on
    # the Wikipedia:
    # https://en.wikipedia.org/wiki/HMAC#Definition
    outer_key_pad = pad(key, 0x5c)
    # Similarly, inner_key_pad is ipad (same link as above)
    inner_key_pad = pad(key, 0x36)

    inner = inner_key_pad + bytearray(counter, "utf-8")
    inner_hashed = hashlib.sha1(inner).digest()

    return hashlib.sha1(outer_key_pad + inner_hashed)


print(f'Secret: "{secret}"')
print(f'Initial counter: "{initial_counter}"')
# hexdigest() returns an ASCII representation of the hash in HEX
# notation (each byte is a hexadecimal number).
print(f'HMAC-SHA-1: "{hmac_sha_1(secret, initial_counter).hexdigest()}"')

Output of this script:

Secret: "$3cr3tP4$$"
Initial counter: "0"
HMAC-SHA-1: "5d1014482edb0afb42101d8d4b5ff9bb5340a683"

EDIT: Originally I made a mistake in the prepare_key() function above for a case where the key was longer than 64 bytes. In that case I was computing SHA1 hash – as I should – but I was not padding with zeroes to a total of 64 bytes. That resulted in a processed key being 20, instead of 64 bytes long. Thanks a lot Warren James for spotting the mistake and letting me know!

Now, there you have it. We’re pretty close to understanding the details of how authenticator apps work. We need to figure out a couple of details, like how to represent the counter in an HMAC. They don’t use ASCII representation. Another problem is keeping counter in sync between server and the app. We’ll tackle those in the 2nd part of this write-up. It’ll focus on details specifically around the 2 algorithms that are used by those authenticator apps: HOTP, and TOTP.