Cryptopals set 1
These are my solutions in python3 for the first set of the cryptopals challenge, for each exercise I'll show the code and give an explation. At the end of each challenge I'll add the solution of the example given in the challenge.
I'll paste only the important portion of code for each exercise, you can find the complete solutions on this repo with some tests.
List of challenges:
01. Convert hex to base64
This one is quite straightforward using python. The function hex2b64
will
take a bytes object and encode it using base64.
import base64
def hex2b64(hex_val: bytes) -> bytes:
"""Encode a bytes object using base64 encoding.
:param hex_val: this is the bytes object to be converted
:type hex_val: bytes
:rtype: bytes
"""
return base64.b64encode(hex_val)
The result using the hex value provided in the example is: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t.
PS: the hex string is "I'm killing your brain like a poisonous mushroom"
from
"Vanilla Ice - Ice Ice Baby".
02. Fixed XOR
The second exercise asks to write a function that takes two equal-lenght buffers and returns their XOR combination. Again we use bytes-like object as arguments of our functions.
def fixed_xor(buf1: bytes, buf2: bytes) -> bytes:
"""Takes two equal length buffers and produces their XOR combination.
:param buf1: first buffer
:type buf1: bytes
:param buf2: second buffer
:type buf2: bytes
:rtype: bytes
"""
if len(buf1) != len(buf2):
raise ValueError
xored = bytearray()
for x,y in zip(buf1, buf2):
xored.append(x^y)
return bytes(xored)
The result obtained using the strings in the example is: 746865206b696420646f6e277420706c6179.
PS: str2
and the resulting XORed string from the example are taken from the
lyrics of "vanilla ice - too cold": "hit the bull's eye", "the kid don't play".
03. Single-byte XOR cipher
We are now given a string encrypted using a single byte key and we have to find the key in order to decrypt the message. We can adopt a few different approach for this problem, but the key point is that, if we use the same key to encrypt each byte, equal bytes in the plaintext will produce equal bytes in the ciphertext. When encrypting a string in such manner we are actually substituting each plaintext character with another one: the problem with substitiution ciphers is that they are susceptible of frequency analysis, meaning that it's possible to deduce the plaintext by the frequency distribution of the ciphertext. Assuming that the plaintext is english text, we can either:
-
Look at the frequency distribution of the ciphertext and compare it to to the english letter frequency in order to guess the correspondence between ciphertext and plaintext characters
-
Since the possible keys are a small number (assuming it's an ascii character there are 128 possibility) we can do an exhaustive search, trying every possible key and scoring the obtained plaintexts to spot the most english-like. Even in this case we use the character frequency as a way to score an english plaintext.
Let's go for the second method, which is the suggested one:
def single_xor_guess(xord_buf: bytearray) -> list:
"""Guess the plaintext P from the ciphertext C (xord_buf), obtained by
XORing every character of P with the single character K.
Both the key K and the plaintext P are encoded using ASCII and the
plaintext is in english.
:param xord_buf: the XORed ciphertext
:type xord_buf: bytearray
:return: an ordered list of tuples, each one containing the guessed plaintext as
bytes object, the likelihood of being english text as a float number and
key used to decrypt the ciphertext.
:rtype: list
"""
# Load letter frequency from file.
freq_format = re.compile('(.)\s+(\S+)')
freq_table = {}
with open('eng_freq.txt', 'r') as f:
for l in f.read().splitlines():
score = freq_format.search(l)
freq_table[score.group(1)] = float(score.group(2))
def score_plaintext(plain: str, freq_table: dict) -> float:
"""Score the likelihood of a string being a plaintext of the language
described by the frequency table.
:param plain: the plaintext string
:type plain: str
:param freq_table: the character frequency table for a language
:type freq_table: dict
:rtype: float
"""
score = 0
for c in plain:
if c in freq_table.keys():
score += freq_table[c]
return score
# Get a list of guessed plaintext, in descending order from most to less
# probable
guessed_plaintext = []
for i in range(128):
key_buf = bytes(chr(i)*len(xord_buf), 'utf-8')
plaintext = fixed_xor(xord_buf, key_buf)
try:
score = score_plaintext(plaintext.decode('utf-8'), freq_table)
except:
score = 0
guessed_plaintext.append((plaintext, score, chr(i)))
# Sort by score in descending order
guessed_plaintext.sort(key=lambda x: x[1], reverse=True)
return guessed_plaintext
For the example proposed in the exercise we find that the key was: 'X'
and the
plaintext: Cooking MC's like a pound of bacon.
04. Detect single-character XOR
We now have a list of strings and we have to find which one has been encrypted using a single-character XOR. We can use the same strategy used in the challenge 03, so we can reuse the same code to score the possible plaintexts obtained from all the ciphertexts.
with open('4.txt', 'r') as f:
guessed_plaintext = []
for encryptd in f.read().splitlines():
for i in single_xor_guess(bytearray.fromhex(encryptd)):
guessed_plaintext.append(i+(encryptd,))
# Sort by score in descending order
guessed_plaintext.sort(key=lambda x: x[1], reverse=True)
# Plaintext
print(guessed_plaintext[0][0].decode('utf-8'))
# Ciphertext
print(guessed_plaintext[0][2])
# Key
print(guessed_plaintext[0][3])
We find that the ciphertext is: 7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f, encrypted with the key 5 and the corresponding plaintext is: Now that the party is jumping.
05. Implement repeating-key XOR
For this challenge we have to implement a repeating-key XOR. So, given the key \(K=[k_1, k_2, .., k_n]\) with length \(n\) and a plaintext \(P=[p_1, p_2, .., p_m]\) with length \(m\) we have to XOR the plaintext repeating the key K in such a way that it matches the length of the plaintext.
def rep_xor(plaintext: bytearray, key: bytearray) -> bytearray:
"""Encrypt plaintext using repeating-key XOR.
:param plaintext:
:type plaintext: bytearray
:param key:
:type key: bytearray
:rtype: bytearray
"""
ciphertext = bytearray(plaintext)
for i in range(len(ciphertext)):
ciphertext[i] = ciphertext[i]^key[i%len(key)]
return ciphertext
06. Break repeating-key XOR
This challenge requires us to recover the plaintext from a ciphertext obtained using a repeating-key XOR encryption. The method to crack this kind of encryption is explained in the exercise, let's briefly go through it here:
Given the plaintext \(P=[p_0, p_1, .., p_n]\) the key \(K=[k_0, k_1, .., k_m]\) and assuming that the plaintext is much longer than the key \(m \lt n\), the resulting ciphertext \(C=[c_0, c_1, .., c_n]\) is computed as:
Just to picture it:
p0-p1-p2-p3-p4-p5-p6-p7-p8-p9
k0-k1-k2-k0-k1-k2-k0-k1-k2-k0
-----------------------------
c0-c1-c2-c3-c4-c5-c6-c7-c8-c9
If we find the keylength \(m\) we can treat this problem to the single-byte XOR exercise, since every \(m\) characters we encrypt the plaintext with the same value. To find the keylenght it is suggested to try different keysize and pick the ones that will lead to a lower hamming distance between the blocks in the ciphertext.
def hamming_distance(buf01: bytearray, buf02: bytearray) -> int:
"""hamming_distance: compute the hamming distance between two strings of
bytes returning the number of different bits.
:param buf01:
:type buf01: bytearray
:param buf02:
:type buf02: bytearray
:rtype: int
"""
# Check if the buffers are the same length
if len(buf01) != len(buf02):
raise ValueError
# XOR the two buffers, the result will have the bits set in the position
# where the two buffers are different.
xored = bytearray([buf01[i]^buf02[i] for i in range(len(buf01))])
xored = int.from_bytes(xored, 'big', signed=False)
# Count the set bits.
count = 0
while xored:
xored &= xored-1
count += 1
return count
Once we guessed the keylenght we can proceed to work on the ciphertext to treat the problem as a single-byte XOR encryption:
# Get the most likely KEYLENGHTs
score = find_keylenght(rxor_enc)
# Try the first three more likely keysizes
for keysize in [score[i][0] for i in range(3)]:
blocks[keysize] = {}
# For each keysize get all the ciphertext value that are encrypted with
# the same value and store them in different sets
for i in range(len(rxor_enc)):
# initialise the list if it doesn't exist
blocks[keysize][i%keysize] = blocks[keysize].get(i%keysize, list())
# store the ciphertext value
blocks[keysize][i%keysize].append(rxor_enc[i])
guessed_keys = []
# Solve for each keylenght
for k,v in blocks.items():
print('[+] KeyLength {0} - guessed key: '.format(k))
key = bytearray()
# Treat each sets as a single xor problem
for e,a in v.items():
key.append(ord(single_xor_guess(bytearray(a))[0][2]))
print(key.decode())
guessed_keys.append(key)
# Print the plaintext
print("\n[+] Using the most likely key to decrypt:\n")
key = guessed_keys[0]*(len(rxor_enc))
plaintext = fixed_xor(rxor_enc, key[:len(rxor_enc)])
print(plaintext.decode())
We find that the key used was: Terminator X: Bring the noise
and the
plaintext are the lyrics of Vanilla Ice - Play That Funky Music.
07. AES in ECB mode
For this challenge we are asked to decrypt a file that has been encrypted using the block-cipher AES-128 in ECB (Electronic CodeBook). For this task I chose to use the pyca/cryptography module but there are other options available. The code is pretty straightforward:
from cryptography.hazmat.primitives.ciphers import Cipher
from cryptography.hazmat.primitives.ciphers.algorithms import AES
from cryptography.hazmat.primitives.ciphers.modes import ECB
from cryptography.hazmat.backends import default_backend
def decode_AES_128_ECB(ciphertext: bytes, key: bytes) -> bytes:
"""decode_AES_128_ECB
:param ciphertext:
:type ciphertext: bytes
:param key:
:type key: bytes
:rtype: bytes
"""
backend = default_backend()
cipher = Cipher(AES(key), ECB(), backend=backend)
decryptor = cipher.decryptor()
plaintext = decryptor.update(ciphertext) + decryptor.finalize()
return plaintext
The plaintext are the lyrics of "Vanilla Ice - Play that funky music", you can verify your result with OpenSSL with this command:
$ openssl aes-128-ecb -K $(echo -n "YELLOW SUBMARINE" | xxd -p | head -c -1) -nosalt -d -in 7.txt -a -nopad
08. Detect AES in ECB mode
For the last challenge of the first set we are given a file containing quite a few hex-encoded ciphertexts and we have to find out which one has been encrypted using ECB. The problem with ECB is that, as it happened in the repeating XOR, if same plaintexts align with the key, the resulting ciphertext will be the same. This property allows us to guess wich one of the ciphertexts is encrypted using ECB by looking for the ones containing repeating blocks.
def find_repeating_blocks(buff: bytes ,bsize: int=16) -> list:
blen = len(buff)
bnum = blen//bsize
blocks = {}
for i in range(0, bnum*bsize, bsize):
blocks[buff[i:i+bsize]] = blocks.get(buff[i:i+bsize], 0) + 1
return blocks
We discover that the only ciphertext containing repeating block is:
d880619740a8a19b7840a8[...]933f2c123c58386b06fba186a
found at line 133.