Cryptopals set 1 // Unstable // Just some noise

These are my solutions in python3 for the first set of the cryptopals challenge, for each exercise I'll show the code and give an explation. At the end of each challenge I'll add the solution of the example given in the challenge.

I'll paste only the important portion of code for each exercise, you can find the complete solutions on this repo with some tests.

List of challenges:

01. Convert hex to base64
02. Fixed XOR
03. Single-byte XOR cipher
04. Detect single-character XOR
05. Implement repeating-key XOR
06. Break repeating-key XOR
07. AES in ECB mode
08. Detect AES in ECB mode

01. Convert hex to base64

This one is quite straightforward using python. The function hex2b64 will take a bytes object and encode it using base64.

import base64

def hex2b64(hex_val: bytes) -> bytes:
    """Encode a bytes object using base64 encoding.

    :param hex_val: this is the bytes object to be converted
    :type hex_val: bytes

    :rtype: bytes
    """

    return base64.b64encode(hex_val)

The result using the hex value provided in the example is: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t.

PS: the hex string is "I'm killing your brain like a poisonous mushroom" from "Vanilla Ice - Ice Ice Baby".

02. Fixed XOR

The second exercise asks to write a function that takes two equal-lenght buffers and returns their XOR combination. Again we use bytes-like object as arguments of our functions.

def fixed_xor(buf1: bytes, buf2: bytes) -> bytes:
    """Takes two equal length buffers and produces their XOR combination.

    :param buf1: first buffer
    :type buf1: bytes
    :param buf2: second buffer
    :type buf2: bytes


    :rtype: bytes
    """
    if len(buf1) != len(buf2):
        raise ValueError

    xored = bytearray()
    for x,y in zip(buf1, buf2):
        xored.append(x^y)

    return bytes(xored)

The result obtained using the strings in the example is: 746865206b696420646f6e277420706c6179.

PS: str2 and the resulting XORed string from the example are taken from the lyrics of "vanilla ice - too cold": "hit the bull's eye", "the kid don't play".

03. Single-byte XOR cipher

We are now given a string encrypted using a single byte key and we have to find the key in order to decrypt the message. We can adopt a few different approach for this problem, but the key point is that, if we use the same key to encrypt each byte, equal bytes in the plaintext will produce equal bytes in the ciphertext. When encrypting a string in such manner we are actually substituting each plaintext character with another one: the problem with substitiution ciphers is that they are susceptible of frequency analysis, meaning that it's possible to deduce the plaintext by the frequency distribution of the ciphertext. Assuming that the plaintext is english text, we can either:

Look at the frequency distribution of the ciphertext and compare it to to the english letter frequency in order to guess the correspondence between ciphertext and plaintext characters
Since the possible keys are a small number (assuming it's an ascii character there are 128 possibility) we can do an exhaustive search, trying every possible key and scoring the obtained plaintexts to spot the most english-like. Even in this case we use the character frequency as a way to score an english plaintext.

Let's go for the second method, which is the suggested one:

def single_xor_guess(xord_buf: bytearray) -> list:
    """Guess the plaintext P from the ciphertext C (xord_buf), obtained by
    XORing every character of P with the single character K.

    Both the key K and the plaintext P are encoded using ASCII and the
    plaintext is in english.

    :param xord_buf: the XORed ciphertext
    :type xord_buf: bytearray

    :return: an ordered list of tuples, each one containing the guessed plaintext as
    bytes object, the likelihood of being english text as a float number and
    key used to decrypt the ciphertext.
    :rtype: list
    """
    # Load letter frequency from file.
    freq_format = re.compile('(.)\s+(\S+)')
    freq_table = {}
    with open('eng_freq.txt', 'r') as f:
        for l in f.read().splitlines():
            score = freq_format.search(l)
            freq_table[score.group(1)] = float(score.group(2))

    def score_plaintext(plain: str, freq_table: dict) -> float:
        """Score the likelihood of a string being a plaintext of the language
        described by the frequency table.

        :param plain: the plaintext string
        :type plain: str
        :param freq_table: the character frequency table for a language
        :type freq_table: dict

        :rtype: float
        """
        score = 0
        for c in plain:
            if c in freq_table.keys():
                score += freq_table[c]
        return score

    # Get a list of guessed plaintext, in descending order from most to less
    # probable
    guessed_plaintext = []
    for i in range(128):
        key_buf = bytes(chr(i)*len(xord_buf), 'utf-8')
        plaintext = fixed_xor(xord_buf, key_buf)
        try:
            score = score_plaintext(plaintext.decode('utf-8'), freq_table)
        except:
            score = 0
        guessed_plaintext.append((plaintext, score, chr(i)))

    # Sort by score in descending order
    guessed_plaintext.sort(key=lambda x: x[1], reverse=True)

    return guessed_plaintext

For the example proposed in the exercise we find that the key was: 'X' and the plaintext: Cooking MC's like a pound of bacon.

04. Detect single-character XOR

We now have a list of strings and we have to find which one has been encrypted using a single-character XOR. We can use the same strategy used in the challenge 03, so we can reuse the same code to score the possible plaintexts obtained from all the ciphertexts.

with open('4.txt', 'r') as f:
    guessed_plaintext = []
    for encryptd in f.read().splitlines():
        for i in single_xor_guess(bytearray.fromhex(encryptd)):
            guessed_plaintext.append(i+(encryptd,))

    # Sort by score in descending order
    guessed_plaintext.sort(key=lambda x: x[1], reverse=True)

    # Plaintext
    print(guessed_plaintext[0][0].decode('utf-8'))
    # Ciphertext
    print(guessed_plaintext[0][2])
    # Key
    print(guessed_plaintext[0][3])

We find that the ciphertext is: 7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f, encrypted with the key 5 and the corresponding plaintext is: Now that the party is jumping.

05. Implement repeating-key XOR

For this challenge we have to implement a repeating-key XOR. So, given the key $K=[k_1, k_2, .., k_n]$ with length $n$ and a plaintext $P=[p_1, p_2, .., p_m]$ with length $m$ we have to XOR the plaintext repeating the key K in such a way that it matches the length of the plaintext.

def rep_xor(plaintext: bytearray, key: bytearray) -> bytearray:
    """Encrypt plaintext using repeating-key XOR.

    :param plaintext:
    :type plaintext: bytearray
    :param key:
    :type key: bytearray

    :rtype: bytearray
    """
    ciphertext = bytearray(plaintext)
    for i in range(len(ciphertext)):
        ciphertext[i] = ciphertext[i]^key[i%len(key)]
    return ciphertext

06. Break repeating-key XOR

This challenge requires us to recover the plaintext from a ciphertext obtained using a repeating-key XOR encryption. The method to crack this kind of encryption is explained in the exercise, let's briefly go through it here:

Given the plaintext $P=[p_0, p_1, .., p_n]$ the key $K=[k_0, k_1, .., k_m]$ and assuming that the plaintext is much longer than the key $m \lt n$, the resulting ciphertext $C=[c_0, c_1, .., c_n]$ is computed as:

$$ C_i = E_k(P_i) = (P_i \oplus K_{(i \mod m+1)}) $$

Just to picture it:

p0-p1-p2-p3-p4-p5-p6-p7-p8-p9
k0-k1-k2-k0-k1-k2-k0-k1-k2-k0
-----------------------------
c0-c1-c2-c3-c4-c5-c6-c7-c8-c9

If we find the keylength $m$ we can treat this problem to the single-byte XOR exercise, since every $m$ characters we encrypt the plaintext with the same value. To find the keylenght it is suggested to try different keysize and pick the ones that will lead to a lower hamming distance between the blocks in the ciphertext.

def hamming_distance(buf01: bytearray, buf02: bytearray) -> int:
    """hamming_distance: compute the hamming distance between two strings of
    bytes returning the number of different bits.

    :param buf01:
    :type buf01: bytearray
    :param buf02:
    :type buf02: bytearray

    :rtype: int
    """
    # Check if the buffers are the same length
    if len(buf01) != len(buf02):
        raise ValueError
    # XOR the two buffers, the result will have the bits set in the position
    # where the two buffers are different.
    xored = bytearray([buf01[i]^buf02[i] for i in range(len(buf01))])
    xored = int.from_bytes(xored, 'big', signed=False)

    # Count the set bits.
    count = 0
    while xored:
        xored &= xored-1
        count += 1

    return count

Once we guessed the keylenght we can proceed to work on the ciphertext to treat the problem as a single-byte XOR encryption:

    # Get the most likely KEYLENGHTs
    score = find_keylenght(rxor_enc)
    # Try the first three more likely keysizes
    for keysize in [score[i][0] for i in range(3)]:
        blocks[keysize] = {}
        # For each keysize get all the ciphertext value that are encrypted with
        # the same value and store them in different sets
        for i in range(len(rxor_enc)):
            # initialise the list if it doesn't exist
            blocks[keysize][i%keysize] = blocks[keysize].get(i%keysize, list())
            # store the ciphertext value
            blocks[keysize][i%keysize].append(rxor_enc[i])

    guessed_keys = []
    # Solve for each keylenght
    for k,v in blocks.items():
        print('[+] KeyLength {0} - guessed key: '.format(k))
        key = bytearray()
        # Treat each sets as a single xor problem
        for e,a in v.items():
            key.append(ord(single_xor_guess(bytearray(a))[0][2]))
        print(key.decode())
        guessed_keys.append(key)

    # Print the plaintext
    print("\n[+] Using the most likely key to decrypt:\n")
    key = guessed_keys[0]*(len(rxor_enc))
    plaintext = fixed_xor(rxor_enc, key[:len(rxor_enc)])
    print(plaintext.decode())

We find that the key used was: Terminator X: Bring the noise and the plaintext are the lyrics of Vanilla Ice - Play That Funky Music.

07. AES in ECB mode

For this challenge we are asked to decrypt a file that has been encrypted using the block-cipher AES-128 in ECB (Electronic CodeBook). For this task I chose to use the pyca/cryptography module but there are other options available. The code is pretty straightforward:

from cryptography.hazmat.primitives.ciphers import Cipher
from cryptography.hazmat.primitives.ciphers.algorithms import AES
from cryptography.hazmat.primitives.ciphers.modes import ECB
from cryptography.hazmat.backends import default_backend


def decode_AES_128_ECB(ciphertext: bytes, key: bytes) -> bytes:
    """decode_AES_128_ECB

    :param ciphertext:
    :type ciphertext: bytes
    :param key:
    :type key: bytes

    :rtype: bytes
    """
    backend = default_backend()
    cipher = Cipher(AES(key), ECB(), backend=backend)

    decryptor = cipher.decryptor()
    plaintext = decryptor.update(ciphertext) + decryptor.finalize()
    return plaintext

The plaintext are the lyrics of "Vanilla Ice - Play that funky music", you can verify your result with OpenSSL with this command:

$ openssl aes-128-ecb -K $(echo -n "YELLOW SUBMARINE" | xxd -p | head -c -1) -nosalt -d -in 7.txt -a -nopad

08. Detect AES in ECB mode

For the last challenge of the first set we are given a file containing quite a few hex-encoded ciphertexts and we have to find out which one has been encrypted using ECB. The problem with ECB is that, as it happened in the repeating XOR, if same plaintexts align with the key, the resulting ciphertext will be the same. This property allows us to guess wich one of the ciphertexts is encrypted using ECB by looking for the ones containing repeating blocks.

def find_repeating_blocks(buff: bytes ,bsize: int=16) -> list:
    blen = len(buff)
    bnum = blen//bsize
    blocks = {}
    for i in range(0, bnum*bsize, bsize):
        blocks[buff[i:i+bsize]] = blocks.get(buff[i:i+bsize], 0) + 1
    return blocks

We discover that the only ciphertext containing repeating block is: d880619740a8a19b7840a8[...]933f2c123c58386b06fba186a found at line 133.