CODEGATE 2022 Preliminary : Dark Arts

2022. 2. 28. 19:00

2 Solves in General Division. Author : rkm0959

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193

import os
import hashlib 
import signal 
 
signal.alarm(300)
 
def inner_product(u, v):
    assert len(u) == len(v)
    res = 0
    for a, b in zip(u, v):
        res += a * b
    return res
 
def guess_mode(G):
    while True:
        idx = int(input())
        if idx == 0:
            x = int(input())
            print(G.calc(x))
        elif idx == 1:
            mode = int(input())
            if mode != G.mode:
                exit()
            else:
                break
        else:
            exit()
 
def guess_key(G, l):
    while True:
        idx = int(input())
        if idx == 0:
            x = int(input())
            print(G.func_gen(x))
        elif idx == 1:
            for i in range(l):
                x = int(input())
                if x != G.key[i]:
                    exit()
            break
        else:
            exit()
 
class Generator1:
    def __init__(self):
        seed = int.from_bytes(os.urandom(32), "big")
        self.key = [0] * 256
        for i in range(256):
            self.key[i] = (seed >> i) & 1
        self.mode = os.urandom(1)[0] & 1
        
        self.p = 2
        self.q = 3
 
        self.cache0 = {}
        self.cache1 = {}
        
    def func_gen(self, x):
        assert 0 <= x < (1 << 256)
        if x in self.cache0.keys():
            return self.cache0[x]
        arr = [0] * 256
        for i in range(256):
            arr[i] = (x >> i) & 1
        prod = inner_product(self.key, arr)
        self.cache0[x] = (prod % self.p + prod % self.q) % self.p
        return self.cache0[x]
    
    def func_random(self, x):
        assert 0 <= x < (1 << 256)
        if x in self.cache1.keys():
            return self.cache1[x]
        self.cache1[x] = os.urandom(1)[0] & 1
        return self.cache1[x]
    
    def calc(self, x):
        ret0 = self.func_gen(x)
        ret1 = self.func_random(x)
        if self.mode == 0:
            return ret0
        else:
            return ret1
 
def challenge_generator_1():
    print("Challenge 1")
    for _ in range(64):
        G = Generator1()
        guess_mode(G)
 
class Generator2:
    def __init__(self):
        seed = int.from_bytes(os.urandom(32), "big")
        self.key = [0] * 256
        for i in range(256):
            self.key[i] = (seed >> i) & 1
        self.mode = os.urandom(1)[0] & 1
        
        self.p = 5
        self.q = 7
 
        self.cache0 = {}
        self.cache1 = {}
    
    def func_gen(self, x):
        x = int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")
        if x in self.cache0.keys():
            return self.cache0[x]
        hashed = [0] * 256
        for i in range(256):
            hashed[i] = (x >> i) & 1
        prod = inner_product(self.key, hashed)
        self.cache0[x] = (prod % self.p + prod % self.q) % self.p
        return self.cache0[x]
    
    def func_random(self, x):
        x = int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")
        if x in self.cache1.keys():
            return self.cache1[x]
        self.cache1[x] = int.from_bytes(os.urandom(32), "big") % self.p
        return self.cache1[x]
    
    def calc(self, x):
        ret0 = self.func_gen(x)
        ret1 = self.func_random(x)
        if self.mode == 0:
            return ret0
        else:
            return ret1
 
def challenge_generator_2():
    print("Challenge 2")
    for _ in range(64):
        G = Generator2()
        guess_mode(G)
 
class Generator3:
    def __init__(self):
        seed = int.from_bytes(os.urandom(16), "big")
        self.key = [0] * 64
        for i in range(64):
            self.key[i] = seed & 3
            seed = seed >> 2
 
        self.p = 2
        self.q = 5
    
    def func_gen(self, x):
        x = int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")
        hashed = [0] * 64
        for i in range(64):
            hashed[i] = x % self.q
            x = x // self.q
        prod = inner_product(self.key, hashed)
        return (prod % self.q) % self.p
 
def challenge_generator_3():
    print("Challenge 3")
    G = Generator3()
    guess_key(G, 64)
 
class Generator4:
    def __init__(self):
        self.key = [0] * 16
        for i in range(16):
            self.key[i] = int.from_bytes(os.urandom(32), "big")
 
        self.p = int.from_bytes(os.urandom(32), "big") + (1 << 256)
        self.q = int.from_bytes(os.urandom(16), "big") + (1 << 128)
        
        print(self.p)
        print(self.q)
    
    def func_gen(self, x):
        x = hashlib.sha256(str(x).encode()).digest()
        hashed = []
        for _ in range(16):
            hashed.append(int.from_bytes(x, "big"))
            x = hashlib.sha256(x).digest()
        prod = inner_product(self.key, hashed)
        return (prod % self.p + prod % self.q) % self.p
 
def challenge_generator_4():
    print("Challenge 4")
    G = Generator4()
    guess_key(G, 16)
 
challenge_generator_1()
challenge_generator_2()
challenge_generator_3()
challenge_generator_4()
 
flag = open("flag", "r").read()
print(flag)
Colored by Color Scripter

cs

Solution

There are four independent challenges that needs to be solved within 300 seconds.

The flavor of the first two challenges and the last two challenges are a bit different.

For the first two challenges, there is fixed a secret key $k$ and a pseudorandom function $F_k(x)$ which outputs a "random" value in $\mathbb{F}_p$.

We have to distinguish this pseudorandom function with an actual random function. To do this, we can choose our inputs $x$ and then the server will give either $F_k(x)$ or a random value. We have to pass this distinguish test 64 times for each of the two challenges.

For the last two challenges, there is a fixed secret key $k$ and a pseudorandom function $F_k(x)$ which outputs a "random" value in $\mathbb{F}_p$.

We have to recover the secret key $k$. To do this, we can choose our inputs $x$ and the server will give $F_k(x)$.

Generator 1

The pseudorandom function is $$F_k(x) = \left( \langle k, x \rangle \pmod{2} + \langle k, x \rangle \pmod{3} \right) \pmod{2}$$ with $k \in \{0, 1\}^{256}$ and $x \in \{0, 1\}^{256}$. We can choose whatever $x$ we want and get either $F_k(x)$ or some random $\mathbb{F}_2$ element.

The most natural distinguishment we get is that $F_k(0) = 0$ regardless of $k$. If we query the output at $x = 0$ and get $1$ as the result, we can immediately conclude that our function is an actual random function. Are there any more inputs like this?

It turns out there are. If $x$ is a unit vector $e_i$ with $i$th coordinate $1$ and others $0$, then we can easily show that $F_k(x)$ must be $0$ regardless of $k$. Now the solution is straightforward - take $60$ of these $x$ values and send them to the server. If all return values are $0$, the function is extremely likely to be a pseudorandom function $F_k(x)$. If at least one return value is $1$, the function must be an actual random function.

Generator 2

This time, the pseudorandom function is $$F_k(x) = \left( \langle k, x \rangle \pmod{5} + \langle k, x \rangle \pmod{7} \right) \pmod{5}$$ with $k \in \{0, 1\}^{256}$ and $x \in \{0, 1\}^{256}$. The problem is, now the input $x$ is SHA256 hashed first then encoded into $\{0, 1\}^{256}$.

Therefore, we can't really choose what value of $x$ to use - practically, we can only use random inputs to our mysterious function.

The key idea is rather hard to find because it is surprisingly simple. The idea is that the output distribution of $F_k(x)$ is not uniform over $\mathbb{F}_5$. This can be either tested with some random trials with code or computed with some basic combinatorics, also with code.

The details of both methods are not hard, so I'll not go into the details here. In practice, if you send $6000$ random inputs, we can determine whether the function is $F_k(x)$ or actually random function by checking if the most frequent number appeared more than $1310$ times.

Computing the success probability relatively precisely left to the readers as exercise - this should not be difficult albeit a bit tedious.

Generator 3

We now have to find the secret key $k$. The pseudorandom function is $$F_k(x) = \left( \langle k, x \rangle \pmod{5} \right) \pmod{2}$$ with $k \in \{0, 1, 2, 3\}^{64}$ and $x \in \{0, 1, 2, 3, 4\}^{64}$. The input $x$ is also SHA256 hashed.

The key idea is linearization. If $F_k(x) = 1$, this implies that $\langle k, x \rangle \pmod{5}$ is either $1$ or $3$ - so $$ \left( \langle k, x \rangle - 1 \right) \left( \langle k, x \rangle - 3 \right) \equiv 0 \pmod{5}$$ Since we know the value of $x$, this gives a quadratic equation on $k$'s each coordinate values. We collect a few thousand such equations.

Now, this can be solved by considering each monomials of degree at most 2 as independent variables, and solving the linear equation.

Generator 4

This time, $p$ and $q$ are very large - $p$ is 256 bits and $q$ is 128 bits. The pseudorandom function is $$F_k(x) = \left( \langle k, H(x) \rangle \pmod{p} + \langle k, H(x) \rangle \pmod{q} \right) \pmod{p}$$ where $k \in \{0, 1, \cdots , 2^{256}-1 \}^{16}$ and $H(x) = (h^1(x), h^2(x), \cdots, h^{16}(x))$ where $h^n(x)$ is $x$ after SHA256 applied $n$ times then converted into an integer. Note that we once again have no essental control over $H(x)$, so it's practically a random vector.

The idea is that $p$ is much larger than $q$, and $\langle k, H(x) \rangle \pmod{q}$ is between $0$ and $q$. Therefore, the result $F_k(x)$ gives us a range of length $q$ which the value $\langle k, H(x) \rangle \pmod{p}$ must lie. Since we can compute $H(x)$, each input $x$ gives us a precise range for a random linear combination of $k$'s values in mod $p$. Now this reduces to a simple lattice attack with CVP, and my repository is strong enough to solve this challenge. For details, check out https://github.com/rkm0959/Inequality_Solving_with_CVP, or Samsung Software Membership blog.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226

import os
import hashlib 
import time
from tqdm import tqdm
from sage.all import *
from sage.modules.free_module_integer import IntegerLattice
from pwn import *
 
conn = remote("127.0.0.1", 9003)
 
def Babai_CVP(mat, target):
    M = IntegerLattice(mat, lll_reduce=True).reduced_basis
    G = M.gram_schmidt()[0]
    diff = target
    for i in reversed(range(G.nrows())):
        diff -=  M[i] * ((diff * G[i]) / (G[i] * G[i])).round()
    return target - diff
 
def solve(mat, lb, ub, weight = None):
    num_var  = mat.nrows()
    num_ineq = mat.ncols()
 
    max_element = 0 
    for i in range(num_var):
        for j in range(num_ineq):
            max_element = max(max_element, abs(mat[i, j]))
 
    if weight == None:
        weight = num_ineq * max_element
 
    # sanity checker
    if len(lb) != num_ineq:
        print("Fail: len(lb) != num_ineq")
        return
 
    if len(ub) != num_ineq:
        print("Fail: len(ub) != num_ineq")
        return
 
    for i in range(num_ineq):
        if lb[i] > ub[i]:
            print("Fail: lb[i] > ub[i] at index", i)
            return
 
    # heuristic for number of solutions
    DET = 0
 
    if num_var == num_ineq:
        DET = abs(mat.det())
        num_sol = 1
        for i in range(num_ineq):
            num_sol *= (ub[i] - lb[i])
        if DET == 0:
            print("Zero Determinant")
        else:
            num_sol //= DET
            # + 1 added in for the sake of not making it zero...
            print("Expected Number of Solutions : ", num_sol + 1)
 
    # scaling process begins
    max_diff = max([ub[i] - lb[i] for i in range(num_ineq)])
    applied_weights = []
 
    for i in range(num_ineq):
        ineq_weight = weight if lb[i] == ub[i] else max_diff // (ub[i] - lb[i])
        applied_weights.append(ineq_weight)
        for j in range(num_var):
            mat[j, i] *= ineq_weight
        lb[i] *= ineq_weight
        ub[i] *= ineq_weight
 
    # Solve CVP
    target = vector([(lb[i] + ub[i]) // 2 for i in range(num_ineq)])
    result = Babai_CVP(mat, target)
 
    for i in range(num_ineq):
        if (lb[i] <= result[i] <= ub[i]) == False:
            print("Fail : inequality does not hold after solving")
            break
    
    # recover x
    fin = None
 
    if DET != 0:
        mat = mat.transpose()
        fin = mat.solve_right(result)
    
    ## recover your result
    return result, applied_weights, fin
 
def inner_product(u, v):
    assert len(u) == len(v)
    res = 0
    for a, b in zip(u, v):
        res += a * b
    return res
 
def solve_generator_1():
    for _ in tqdm(range(64)):
        lines = []
        for i in range(20):
            lines.append(b"0")
            lines.append(str(1 << i).encode())
        conn.sendlines(lines)
        res = conn.recvlines(20)
        conn.sendline(b"1")
        if b"1" in res:
            conn.sendline(b"1")
        else:
            conn.sendline(b"0")
 
def solve_generator_2():
    for _ in tqdm(range(64)):
        cnt = [0] * 5        
        lines = []
        for i in range(0, 6000):
            lines.append(b"0")
            lines.append(str(i).encode())
        conn.sendlines(lines)
        res = conn.recvlines(6000)
        for i in range(0, 6000):
            cnt[int(res[i])] += 1
        conn.sendline(b"1")
        if max(cnt) > 1310:
            conn.sendline(b"0")
        else:
            conn.sendline(b"1")
  
def solve_generator_3():
    idx = []
    for i in range(64):
        idx.append([0] * 64)
    cur = 64
    for i in range(64):
        for j in range(i, 64):
            idx[i][j] = cur
            cur += 1
    M = []
    target = []
    lines = []
    for i in range(6000):
        lines.append(b"0")
        lines.append(str(i).encode())
    conn.sendlines(lines)
    res = conn.recvlines(6000)
    for i in tqdm(range(6000)):
        x = int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big")
        hashed = [0] * 64
        for j in range(64):
            hashed[j] = x % 5
            x = x // 5
        if int(res[i]) == 0:
            continue
        vec = [0] * 2144
        for j in range(64):
            vec[j] += hashed[j]
            vec[idx[j][j]] += hashed[j] ** 2
            for k in range(j+1, 64):
                vec[idx[j][k]] += 2 * hashed[j] * hashed[k]
        M.append(vec)
        target.append(2)
    M = Matrix(GF(5), M)
    target = vector(GF(5), target)
    res = M.solve_right(target)
 
    conn.sendline(b"1")
    for i in range(64):
        conn.sendline(str(res[i]).encode())   
 
def solve_generator_4():
    p = int(conn.recvline())
    q = int(conn.recvline())
    DATA = 33
    LEN = 16
    M = Matrix(ZZ, DATA + LEN, DATA + LEN)
    lb = [0] * (DATA + LEN)
    ub = [0] * (DATA + LEN)
    lines = []
    for i in range(DATA):
        lines.append(b"0")
        lines.append(str(i).encode())
    conn.sendlines(lines)
    res = conn.recvlines(DATA)
    for i in range(DATA):
        x = hashlib.sha256(str(i).encode()).digest()
        hashed = []
        for _ in range(LEN):
            hashed.append(int.from_bytes(x, "big"))
            x = hashlib.sha256(x).digest()
        result = int(res[i])
        lb[i] = result - q
        ub[i] = result
        for j in range(LEN):
            M[j, i] = hashed[j]
        M[i + LEN, i] = p
    for i in range(LEN):
        M[i, i + DATA] = 1
        lb[i + DATA] = 0
        ub[i + DATA] = p
    _, _, fin = solve(M, lb, ub)
    conn.sendline(b"1")
    for i in range(LEN):
        conn.sendline(str(fin[i]).encode())
    
st = time.time()
 
print(conn.recvline())
solve_generator_1()
print("check1")
 
print(conn.recvline())
solve_generator_2()
print("check2")
 
print(conn.recvline())
solve_generator_3()
print("check3")
 
print(conn.recvline())
solve_generator_4()
print("check4")
 
en = time.time()
 
print("time :", en - st)
print(conn.recvline())
Colored by Color Scripter

cs

Background

A PRF takes a key $k$ and an input $x$ and deterministically outputs $y = F(k, x)$.

A PRF is secure if no efficient adversaries can distinguish (with non-negligible probability) between $F$ and a truly random function between the input/output space. Here, the adversary may query any $x$ and read the output which is either $F(k, x)$ or a random value.

If an adversary who can only query random $x$ in the domain cannot distinguish (with non-negligible probability) between $F$ and a truly random function, the PRF $f$ is called weakly secure PRF, or weak PRF. So Generator 2, 3, 4 are all about (something similar to) weak PRFs.

PRFs and weak PRFs are building blocks of cryptography - Boneh & Shoup's book has an interesting exercise regarding them (18.16), and there are plenty of papers on them as well. The PRFs of the form given in the challenge, i.e. $$F(k, x) = \left( \langle k, x \rangle \pmod{p} + \langle k, x \rangle \pmod{q} \right) \pmod{p}$$ comes from the paper "Exploring Crypto Dark Matter: New Simple PRF Candidates and Their Applications" by Boneh, Ishai, and Passel`egue. This paper was in TCC 2018. The main idea was that these functions are relatively simple to calculate via secure multiparty computation - but since I'm not very knowledgeable in this area yet I cannot explain more details.

Some attacks on this PRF is noted in the TCC 2018 paper - in page 7, where they mention the PRF with $p=2$ and $q=3$, they also note that in a constant-modulus regime the fact that the modulus is composite is important to prevent a direct linearization attack. Generator 3 covers this idea. They also note that lattice style attacks are possible (BKW) which is done in Generator 4.

More details are found on page 36, where they outline more attacks. Remark 6.4 explains once again about possible lattice attacks (Generator 4) and Remark 6.5 is once again on linearization attacks (Generator 3). At the start of page 37, there is a note that there is construction is not a PRF - that vectors with Hamming weight 1 is enough for distinguishment. Finding this idea was given as a challenge via Generator 1.

After a few years, Cheon et al (who taught me cryptography in univ, thanks) wrote about an statistical attack on this PRF in their paper "Adventures in Crypto Dark Matter: Attacks and Fixes for Weak Pseudorandom Functions". This paper was in PKC 2021. Their main idea was simple but very effective - the output distribution of the said PRF was surprisingly far from the uniform distribution. Their calculations are outlined on Section 4, starting at page 8 of the paper. Their calculations are on the PRF with the setting $p=2$ and $q=3$, but the result for arbitrary primes $p, q$ are on Remark 4.4, at the top of page 12. Reading this paper gave me the motivation for setting up this challenge.

Comments

Note the cache is there to prevent same inputs giving different outputs in the case an actual random function is used.
The server computes both $F_k(x)$ and the actual random function then returns the output to prevent timing attacks.
Generator 1 is intended to be a soft introduction to the challenge, letting participants get a sense of what's going on.
Generator 3's main idea (linearization) have appeared in multiple CTFs before. I can recall three of them.
When writing the exploit, sending queries in batches is extremely important to fit the time requirements.

저작자표시 비영리

'CTF' 카테고리의 다른 글

WACon 2022 Quals: RSA Secret Sharing (0)	2022.06.27
CODEGATE 2022 Preliminary : Prime-Generator (0)	2022.02.28
SECCON CTF 2021 Writeups (0)	2021.12.14
N1CTF 2021 Writeups (1)	2021.11.22
PBCTF 2021 Writeups (0)	2021.10.13

rkm0959