When you’re accessing services over the WEB – let’s pick GMail as an example – a couple of things have to happen upfront:
- The server you’re connecting to (GMail in our example) has to get to know who you are.
- Only after getting to know who you are it’s able to decide what resources you are allowed to access (e.g. your own email inbox, your Calendar, Drive etc.).
Step 1 above is called authentication. Step 2 is authorization (server can authorize only after successful authentication).
Using apps like Google Authenticator is all about step 1. And you can think of that step as logging in to your GMail account.
What problem are we solving?
Authentication is typically performed using one (or more) of the following approaches:
- What you know? – The server “tests you” by asking about something only you are supposed to know (e.g. username and password). That’s the most common approach. You’re presented with a login form where you enter your credentials.
- What you have? – The server tests you by making sure you have something that you’re supposed to have (e.g. a secret of sorts embedded in the bowels of Google Authenticator installed on your smartphone). Only you are supposed to have physical access to your phone. If you don’t (in other words you can’t open the app and type the code), authentication fails.
- Who you are – The server tests your biometrics. This could be done using a fingerprint reader in your smartphone/laptop, face ID in your iPhone, etc.
Multi-Factor Authentication (MFA) is about using 2 or more of the approaches described above. Typically, you’d set up username/password together with e.g. Google Authenticator (or other app). Of course for What you have approach you have more options. E.g. FIDO2, which I find much more convenient. But here we’re focusing on apps like Google Authenticator that use something called TOTP algorithm.
What we want to achieve is understand how apps like Google Authenticator (or any other TOTP app) are solving that?
How to make your smartphone necessary?
We have to come up with an algorithm that will prove that the user has physical access to their smartphone running the app at the time of being authenticated. How can we do that?
The 1st thing that comes to mind is, let’s embed a sort of secret – e.g. a password – in the app on the phone. Then, every time the user logs it, let’s ask the user to open the app and type the password that the app is showing.
For example our app has a password: $3cr3tP4$$
in it. When we are logging in, the server
(GMail) is asking us to provide username and password. We enter it. Then, as a 2nd step,
server is asking us to type the password that the app is displaying and type it into on the
web page. If we entered correct username and password in the 1st step, and later we entered
$3cr3tP4$$
– because this is what the app showed us – we’re successfully authenticated and,
therefore, logged in.
Now, you’re probably immediately seeing a gaping hole in this line of thinking. If we’re just asked to type the same password, letter for letter, from the app than there is nothing stopping us from taking that password (writing it down on a piece of paper, memorizing it etc.) out of the app. From that point on, your phone is no longer necessary here. This 2nd factor becomes a mere extension to the 1st factor (additional static password to your existing one).
One way to exploit it would be the following scenario:
- Someone (a hacker) installs a key-logger on a machine you’d use for logging in.
- They capture your username/password from the 1st factor.
- The capture the static password (
$3cr3tP4$$
) from the 2nd factor.
Now they have all your credentials they’d need to login to your GMail account.
Well, so the question becomes: How do we prove the phone contains that static secret
(e.g. that password: $3cr3tP4$$
) without ever disclosing it?
Hashing
We could use one of the hashing functions like MD5
(don’t use MD5
btw. – it’s too
vulnerable to collisions finding attacks), SHA-1
,
SHA-512
etc. Those functions transform input (any input of any size) into a standard,
specific for each of those functions, string of bytes. Since Google Authenticator, and
apps compatible with it, went with SHA-1
, let’s go with it, too.
If you transform our secret ($3cr3tP4$$
) using SHA-1
, you’ll get the following sequence
of bytes (in HEX notation):
e2 56 28 6a fb 11 fd b7 40 1f 9a f5 30 73 39 75 dc 7d c4 64
.
Just run the following command:
$ echo -n '$3cr3tP4$$' | sha1sum
e256286afb11fdb7401f9af530733975dc7dc464 -
Now – again – hash (any hash, not only SHA-1
) will always be the same for $3cr3tP4$$
.
If we’re expected to always input just that hash, or part of it since it’s quite long, we’d
always have
to enter the exact same thing. That’s not taking us any further than expecting us to type
the password every time. We’d replace typing in the same password every time with typing in
the same hash every time. We’d still be able to easily take out what’s expected of us
(the hash) and write it down on a piece of paper.
Using hash function sounds like a good start, but we’d have to somehow make it change every time, but change in a way that’s predictable only to those who know the secret. And the only ones, who do know the secret here are: GMail server and an app (e.g. Google Authenticator) on your phone.
So how do we make it change every time in a way predictable only to those who actually know the secret?
Sprinkle a bit of salt
If we can add something to the password every time and calculate hash off of that, the hash would be different every time. In addition to that if both, server and the app, can sync on what is being added to the password, we’d be able to achieve having hash value being different every time and in a predictable way. For example, we could go with the following algorithm (application side 1st):
- Initiate a counter at 0 (this step is performed only once initially).
- Append an ASCII representation of the counter (at the 1st pass the result is:
0$3cr3tP4$$
). - Calculate the hash off of the above (initially
0$3cr3tP4$$
, result is – in ASCII –41bb78b5b4963c5c78c79702b47d414d17d9127e
). - Increase the counter by 1.
- Present the last couple of bytes – let’s say 3 (ASCII representation ) – to the user (
d9127e
). - User enters
d9127e
in the web form.
On the server side:
- Initiate a counter at 0 (this step is performed only once initially).
- Append an ASCII representation of the counter (at the 1st pass the result is:
0$3cr3tP4$$
). - Calculate the hash off of the above (initially
0$3cr3tP4$$
, result is – in ASCII –41bb78b5b4963c5c78c79702b47d414d17d9127e
). - Increase the counter by 1.
- Get the last 3 bytes (ASCII representation ) of the above SHA-1 sum (
d9127e
). - Compare those 3 bytes against what user entered into the WEB form. If the two match, authentication is successful. If not, authentication failed.
2nd time around the counter would’ve been 1, so both sides would be calculating hash off of 1$3cr3tP4$$
.
The result would be efbffb4295680f674d5c061cb60103bbc7d6fca2
(last 3 bytes d6fca2
). So now the code
is changing every time in a predictable way. But it’s predictable only if you know the secret ($3cr3tP4$$
).
If you don’t you wouldn’t know what to calculate the hash off of. Even if you’d know the value of the counter
you wouldn’t be able to replicate the value of the hash.
The method of adding little extra something to whatever’s being hashed is called salting, and
it’s pretty popular technique For example, if you’re running GNU/Linux it is being used also in
your /etc/shadow. Check out how the hashed value of your password looks like. Then use passwd
command to modify the password, but don’t really change it – enter the same password
you’ve been using. Then check the hashed representation of your (not so) new password in /etc/shadow
again. Even though your password hasn’t really changed, the hash will be different. That’s because
your system is salting passwords before counting hash and storing it in /etc/shadow.
The crux of apps like Google Authenticator is using exactly this idea. Take 2 components: secret (known only to server and app) and a counter in some form (not really confidential), combine them in some way, and calculate a hash. Next we’ll look at all the technical details around that process. Or should I say processes – there are actually 2 standards we’ll describe: HOTP and TOTP.
Meet HMAC
The idea of combining the secret with a counter to produce each time unique – yet predictable only
to server and app – hash is tricky to get correctly in a way that is resistant to various attack vectors
that would somehow make it easier to predict the next hash. The standard that addresses security around
that concept is called HMAC (Hash-based Message
Authentication Code). This algorithm describes a process of combining a key (in our example
$3cr3tP4$$
) with a message (in our case the counter) and calculating HASH in a safe and hard to
predict way if you don’t know the key.
Another way to think of HMAC is what the name actually stands for – signing a message (the counter
in our case) cryptographically using a pre-shared secret (the super-secret password we’ve been
using here: $3cr3tP4$$
). So your app is signing the counter with that password. And only the
server is able to verify the signature, because the only other entity that knows the password
is the server.
Google Authenticator and other authenticator apps compatible with it actually use HMAC-SHA-1 – an HMAC that’s using SHA-1 for calculating hashes. So let’s stick to SHA-1 throughout the rest of this article.
To learn more and better understand HMAC, please, take a look at the Definition of the Wikipedia article I linked above. Especially at the diagram:
- K – key
- K’ – key processed for the needs of the HMAC algorithm
- m – message (the counter in our case)
- H(…) – this is a hash function (e.g. SHA-1)
- + (the + in a circle) – an XOR operation
- opad – Outer padded key (see the details below)
- ipad – Inner padded key (see the details below)
- || – Concatenation
In a nutshell first we’re preprocessing the key. HMAC requires the key to be of a specific length. Its
size must be equal to the hashing algorithm’s block size or its result size. For example for SHA-1 that’s
64 bytes or 20 bytes. Well, most of the time the key you’d choose would be either shorter or longer. So
preprocessing is about making the key exactly 64, or 20 bytes long. If your password is shorter
($3cr3tP4$$
very much is), you just add zeros at the end to make it 64 bytes long. If the key is
exactly 64 bytes long, you don’t do anything – it’s already good. If it’s longer, you calculate
SHA-1 and use that as the key (here it’s size is going to be only 20 bytes).
Next you do some XORing when you calculate opad and ipad and in the process you’re XORing the
processed key with 0x5c
/0x36
. This is the critical part. if
you don’t know the secret, you don’t know what to XOR against the above bytes, and because of that
you wouldn’t be able to achieve the final result – the final hash.
Then you hash the result of XORing key with ipad concatenated with the message (the counter). Next you concatenate the result of opad XORing with the above and hash that. The end – this is the final result of HMAC.
Off-topic remark regarding long keys
Notice that when the key is, say, 40 bytes long, it’s padded with zeros to make it 64 bytes long. So the actual entropy of that key is equivalent to the 40-byte long key. But if the key is longer than 64 bytes, it’s effectively being shortened to 20 bytes. So its entropy is lower. If you want to maximize the complexity of the key, and you’re the one choosing the key, go for something around, but never more than, 64 bytes.
Quick code example
First off, never implement cryptographic operations if you’re not an expert in this area. Crypto is very critical in terms of security, and it’s really easy to get wrong in unexpected ways! Use a renown crypto library to do the job for you.
If you’re a coder like me, you’d probably prefer to look at the code to help you wrap your head around it. I concocted a super-short Python script to illustrate the elegance and simplicity of HMAC (HMAC-SHA-1 flavour in particular). I haven’t implemented SHA-1 – had to draw the line somewhere. I decided to draw it here not to make this article too long. But trust me, it’s not a rocket science, and you’d be able to wrap your head around SHA-1 and implement it, too. Don’t do it for anything other than self-education, though. Use a crypto library, instead.
In the code below you can see, I’m signing initial counter value (still an ASCII representation
of 0) with our password ($3cr3tP4$$
). You can google-out an online HMAC calculator to verify
it actually does the job.
#! /usr/bin/env python3
import hashlib
# Our secret known only to the server and the app
secret = "$3cr3tP4$$"
# Counter. Here it's simply an ASCII string representation of the counter
# value. Initially it's "0"
initial_counter = "0"
# In HMAC-SHA-1 we're using SHA-1 hash function. In SHA-1 specifically
# the block size is 64 bytes. And the result is 20 bytes long.
sha_1_block_size_bytes = 64
sha_1_result_size_bytes = 20
def prepare_key(key):
if len(key) > sha_1_block_size_bytes:
# When the key is longer than 64 bytes, just hash it (in
# HMAC-SHA-1 using SHA-1). The hash is 20 bytes long, so
# later, after if else, pad it with zeroes to a total of
# 64 bytes.
result = hashlib.sha1(key).digest()
else:
# When the key is either exactly 64 bytes long, or shorter,
# no processing needed. Just make sure that after if else
# it's being padded with zeroes to a total of 64 bytes.
result = key
while len(result) < sha_1_block_size_bytes:
result += b"\0"
return result
def pad(processed_key, padding_byte):
# Processed key is already exactly 64 bytes long.
result = list()
for ch in processed_key:
# Every byte of the key is being XORed with padding_byte (either
# 0x5c for the outer pad or 0x36 for the inner pad).
b = ch ^ padding_byte
result.append(b)
return bytes(result)
def hmac_sha_1(key, counter):
key = prepare_key(bytearray(key, "utf-8"))
# outer_key_pad is basically an opad from HMAC's definition on
# the Wikipedia:
# https://en.wikipedia.org/wiki/HMAC#Definition
outer_key_pad = pad(key, 0x5c)
# Similarly, inner_key_pad is ipad (same link as above)
inner_key_pad = pad(key, 0x36)
inner = inner_key_pad + bytearray(counter, "utf-8")
inner_hashed = hashlib.sha1(inner).digest()
return hashlib.sha1(outer_key_pad + inner_hashed)
print(f'Secret: "{secret}"')
print(f'Initial counter: "{initial_counter}"')
# hexdigest() returns an ASCII representation of the hash in HEX
# notation (each byte is a hexadecimal number).
print(f'HMAC-SHA-1: "{hmac_sha_1(secret, initial_counter).hexdigest()}"')
Output of this script:
Secret: "$3cr3tP4$$"
Initial counter: "0"
HMAC-SHA-1: "5d1014482edb0afb42101d8d4b5ff9bb5340a683"
EDIT: Originally I made a mistake in the prepare_key()
function above for a case where
the key was longer than 64 bytes. In that case I was computing SHA1
hash – as I should –
but I was not padding with zeroes to a total of 64 bytes. That resulted in a processed
key being 20, instead of 64 bytes long. Thanks a lot
Warren James for spotting the mistake and letting me
know!
Now, there you have it. We’re pretty close to understanding the details of how authenticator apps work. We need to figure out a couple of details, like how to represent the counter in an HMAC. They don’t use ASCII representation. Another problem is keeping counter in sync between server and the app. We’ll tackle those in the 2nd part of this write-up. It’ll focus on details specifically around the 2 algorithms that are used by those authenticator apps: HOTP, and TOTP.