*This is the third article in Operation Bellflower’s weekly tech posts series. Check out the previous ones here!*

Howdy and welcome to another installment of Adventures in the Ethornell Engine! Today’s article will be a direct continuation of last week’s, where we did the first half of the work for extracting a scenario file. If you followed the Python approach and everything went well, you should now have at hand an extracted scene file (if not, here’s a copy). It’s still not quite in a usable state though, there’s one last layer of abstraction called the DSC format that we must decipher before we can see the actual contents. Once we take care of that we’ll finally be able to view and edit the game’s text, but that’s spoilers for next week.

A heads up that this article will be quite meaty, so I hope you’re not allergic to Python. Otherwise, feel free to follow the GARbro approach outlined in the previous article to extract your file and skip this. Again, what I will explain here is nothing that GARbro doesn’t already do, but where’s the fun in letting a program do the work for us? So let’s see how to decrypt the DSC format using a Python script.

# What is the DSC format?

The format is used for all text-based assets in the ARC files, including system scripts and scenario files. We’ll be dealing with a compression technique called Huffman coding. It’s quite a clever encoding, and I assure you we’re going to have a lot of fun implementing its decompression algorithm. The key to reading Huffman encoded data is a binary tree (which I’ll call the Huffman tree) that you will find at the beginning of your DSC files after a couple of metadata entries. To make things more ~~painful~~ interesting, this tree is encrypted. Thankfully this encryption has been cracked by many open-source BGI extractors, including GARbro, which we will use as an inspiration for this part. After that tree you will find the data proper.

To summarize, the program we’re going to write today will have these four main steps:

- Reading the file’s metadata
- Decrypting the Huffman tree
- Building the Huffman tree from its decrypted binary representation
- Using the Huffman tree to decompress the file’s data

# Scaffolding the program

If you already followed last week’s ARC decompression tutorial in Python, your setup should be good to go. The only extra thing you’ll need is the numpy library, so make sure to install it with a `pip install numpy`

in your terminal if you don’t have it already.

Let’s start by creating a Python class. As the algorithm is a tad complex this will make things easier to reason with. Don’t worry just yet about these values defined in the constructor, we’ll need them only for later.

```
import io
import numpy # if missing: run "pip install numpy" in a terminal
numpy.seterr(all="ignore") # silence overflow warnings
class DscDecrypter():
def __init__(self, data: io.BufferedReader) -> None:
self.data = data
# will be needed later to read the data bit by bit
self.next_bits: list[int] = []
# will be needed for the decryption part
self.hash = 0
def decrypt(self, output_path: str) -> None:
print("Not implemented yet!")
# be sure to change "hat010010" if you're using a different file
with open("hat010010", "rb") as input_file:
decripter = DscDecrypter(input_file)
# this will be the name of our output file
decripter.decrypt("hat010010_output")
```

# Reading the metadata

Similarly to last week’s ARC format, all DSC files start with the text `DSC FORMAT 1.00`

. We want to first make sure that this is the case. Next up in the file is a 4 bytes integer called the hash. We’ll need it in just a bit for the decryption, but for now let’s store it in our class. Then you’ll find a number that indicates how big the final file should be. The last 8 bytes of the metadata section won’t be needed for the rest, so we’re going to just discard them. Here’s what the beginning of your `decrypt`

function should look like (if you prefer to see the full program, I’ll put it right at the bottom).

```
# ensure that the input has the correct header
header = self.data.read(0x10)
if header.decode() != "DSC FORMAT 1.00\x00":
print("File is not a valid DSC file, aborting")
return
# read metadata at the beginning of the file
self.hash = int.from_bytes(self.data.read(4), "little")
decrypted_size = int.from_bytes(self.data.read(4), "little")
self.data.read(8) # discard value
```

# Decrypting the Huffman tree

We are now getting to the fun part: the next 512 bytes in our file are encrypted. “Encryption” is one of those complicated words that actually have a very simple meaning: it’s just the process of transforming data according to predefined rules. Have you played this game as a kid where you would write secret messages by replacing each letter with the one following it in the alphabet? If so congratulations, you’re already familiar with encryption. Obviously, computers use slightly more complex encryption algorithms. They often involve multiplying the data to encode by very large numbers, preferably prime ones (the reason for this is that prime numbers are hard to find, making it unreasonably complex for would-be attackers to try a brute-force approach to find the original data).

DSC decryption works as follows: you take the hash found in the file’s metadata, cut it in two and multiply each half by the prime number 20021. Take the left half, add 346 times the original hash, then add back the right half of the hash you got from the first step. You obtain a number that when subtracted from your encrypted byte gives you back its unencrypted value.

If this sounds overly complicated, that’s because it’s mean to be. I’m not exactly sure how this algorithm was found, I can only assume it was a feat of reverse-engineering from real h4x0rs who, unlike me, actually knew what they were doing, and without their work we would have been stuck right here. So I do recommend checking out some open-source implementation of this decryption algorithm, for instance here’s the one in GARbro’s C# source. And here it is in Python, to be added as a member function of our `DscDecrypter`

class:

```
def decrypt_next(self, encrypted: int) -> int:
# We use numpy integers to simulate the overflow behavior of C integers,
# which is needed for the algorithm to work properly
new_hash = numpy.uint32(self.hash)
edx = numpy.uint32(20021) * numpy.uint16(new_hash)
eax = numpy.uint32(20021) * numpy.uint16(new_hash >> 16) + \
numpy.uint32(346) * new_hash + numpy.uint16(edx >> 16)
new_hash = numpy.uint32(numpy.uint16(
eax) << 16) + numpy.uint16(edx) + 1
self.hash = int(new_hash)
return int(numpy.uint8(encrypted) - numpy.uint8(eax))
```

With that, we can finally start reading the Huffman tree. It contains 512 leaf nodes, and is encoded as follows: the i-th byte you read corresponds (after decryption) to the depth of leaf node number i in the tree. For now, let’s just store that data in a dictionary. Go back to the decrypt function and add the following snippet:

```
# indices are depths, values are the list of leaf nodes (by their values) at that depth
leaf_nodes_by_depth: dict[int, list[int]] = {}
# read encrypted Huffman tree
for i in range(512):
# "i" is a character code, "depth" is the depth of the character's corresponding
# leaf node in the Huffman tree
encrypted = int.from_bytes(self.data.read(1), "little")
depth = self.decrypt_next(encrypted)
# a depth of 0 means that the character doesn't appear in the decrypted file
if depth != 0:
print("Node {} is at depth {}".format(i, depth))
if depth not in leaf_nodes_by_depth:
leaf_nodes_by_depth[depth] = []
leaf_nodes_by_depth[depth].append(i)
```

# Building the Huffman tree

We have successfully retrieved the tree’s raw data, but now we need to convert that to a better data structure so we can more easily traverse it in the next section. Let’s define a Node class at the top of the file to represent each node in our binary tree:

```
class Node():
def __init__(self) -> None:
self.value: int = None
self.children: list[Node] = None
```

From the data we’ve decrypted and stored in the previous section, we know which leaf nodes the tree should contain, at which depth and in which order. That’s all we need to build our binary tree. Here we’re going for a depth-first type of algorithm, where we keep building down the tree and stop when we find leaf nodes to be put at that specific depth. Let’s add this to the `decrypt`

function:

```
max_depth = max(leaf_nodes_by_depth.keys())
def build_children(node: Node, depth: int):
# if there is no more leaf to build in the deepest layer, we're done
if len(leaf_nodes_by_depth[max_depth]) == 0:
return
node.children = [Node(), Node()]
for child in node.children:
# make the child a leaf node, if possible
if depth + 1 in leaf_nodes_by_depth and len(leaf_nodes_by_depth[depth + 1]) > 0:
# pop(0) takes and removes the first element in a list
child.value = leaf_nodes_by_depth[depth + 1].pop(0)
else: # otherwise, keep building down
build_children(child, depth + 1)
tree_root = Node()
build_children(tree_root, 0)
```

# Decompressing the data

All the bytes left represent the final file’s data in its compressed form. We’ve got our Huffman tree ready to go, and so we now have all we need to start creating the output file. So sit tight while I attempt to explain how to decode Huffman-compressed data (though I’m sure the fancy drawings on the Wikipedia page will do a better job than me).

Here’s how it works: start at the root of your Huffman tree. Then take your compressed data, its raw zeroes and one, and read it bit by bit. If you find a 0, go left in the tree, if it’s a 1, go right. Keep going until you reach a leaf node: read its value, and that’s the very first byte of your final, uncompressed data. Start again from the top of the tree, and keep going until you’re out of bits to read.

There’s one slight complication, though. The above is true only if the decompressed value you obtain is 255 or below. Numbers between 256 and 512 have a special meaning: you read the next 12 bits as a number, then look back through the data decrypted so far by that amount of bytes. Copy the bytes you find there a number of times equal to the original value you just read minus 254. The purpose of this weird behavior is to make compression a bit more efficient, especially when there are regions where the same sequences of bytes are repeated multiple times.

So let’s get back to our Python code. First we’ll need a way to read bits one by one, as Python can only read them by batches of eight. Here’s a function that can do that:

```
# read the next "count" bits as an unsigned integer
def read_bit(self, count: int) -> int:
while len(self.next_bits) < count:
# not enough bits to read: load one byte from the input and add its 8 bits to next_bits
current_byte = int.from_bytes(self.data.read(1), "little")
self.next_bits += [(current_byte >> i) &
1 for i in range(7, -1, -1)]
bits = self.next_bits[:count] # bits as an array of 0s and 1s
# keep the bits we didn't need for next time
self.next_bits = self.next_bits[count:]
# converts the list of 0 and 1 to the integer that it represents
return sum(bit << i for (i, bit) in enumerate(reversed(bits)))
```

And finally we can finish our `decrypt`

function with the decompression proper and create the output file:

```
output: list[int] = []
while len(output) < decrypted_size:
node = tree_root
while node.children is not None:
# picks the first child if 0, second one if 1
node = node.children[self.read_bit(1)]
if node.value >= 0x100: # "region copy"
count = (node.value & 0xFF) + 2
offset = self.read_bit(12) + 2
for _ in range(count):
output.append(output[-offset])
else: # simple copy
output.append(node.value)
with open(output_path, "wb") as output_file:
output_file.write(bytes(output))
```

# Final program

And we are done! The complete program is included below. If you run it and nothing explodes, congratulations! You have just fully extracted and decrypted a game file.

```
import io
import numpy # if missing: pip install numpy
numpy.seterr(all="ignore") # silence overflow warnings
class Node():
def __init__(self) -> None:
self.value: int = None
self.children: list[Node] = None
class DscDecrypter():
def __init__(self, data: io.BufferedReader) -> None:
self.data = data
self.next_bits: list[int] = []
self.hash = 0
def decrypt(self, output_path: str) -> None:
# ensure that the input has the correct header
header = self.data.read(0x10)
if header.decode() != "DSC FORMAT 1.00\x00":
print("File is not a valid DSC file, aborting")
return
# read metadata at the beginning of the file
self.hash = int.from_bytes(self.data.read(4), "little")
decrypted_size = int.from_bytes(self.data.read(4), "little")
self.data.read(8) # discard value
# indices are depths, values are the list of leaf nodes (by their values) at that depth
leaf_nodes_by_depth: dict[int, list[int]] = {}
# read encrypted Huffman tree
for i in range(512):
# "i" is a character code, "depth" is the depth of the character's corresponding
# leaf node in the Huffman tree
encrypted = int.from_bytes(self.data.read(1), "little")
depth = self.decrypt_next(encrypted)
# a depth of 0 means that the character doesn't appear in the final decrypted file
if depth != 0:
print(f"Node {i} is at depth {depth}")
if depth not in leaf_nodes_by_depth:
leaf_nodes_by_depth[depth] = []
leaf_nodes_by_depth[depth].append(i)
# build Huffman tree
max_depth = max(leaf_nodes_by_depth.keys())
def build_children(node: Node, depth: int):
# if there is no more leaf to build in the deepest layer, we're done
if len(leaf_nodes_by_depth[max_depth]) == 0:
return
node.children = [Node(), Node()]
for child in node.children:
# make the child a leaf node, if possible
if depth + 1 in leaf_nodes_by_depth and len(leaf_nodes_by_depth[depth + 1]) > 0:
# pop(0) takes and removes the first element in a list
child.value = leaf_nodes_by_depth[depth + 1].pop(0)
else: # otherwise, keep building down
build_children(child, depth + 1)
tree_root = Node()
build_children(tree_root, 0)
# read file data and decompress using the Huffman tree
output: list[int] = []
while len(output) < decrypted_size:
node = tree_root
while node.children is not None:
# picks the first child if 0, second one if 1
node = node.children[self.read_bit(1)]
if node.value >= 0x100: # "region copy"
count = (node.value & 0xFF) + 2
offset = self.read_bit(12) + 2
for _ in range(count):
output.append(output[-offset])
else: # simple copy
output.append(node.value)
with open(output_path, "wb") as output_file:
output_file.write(bytes(output))
def decrypt_next(self, encrypted: int) -> int:
# we use numpy integers to simulate the overflow behavior of C integers,
# which is needed for the algorithm to work properly
new_hash = numpy.uint32(self.hash)
edx = numpy.uint32(20021) * numpy.uint16(new_hash)
eax = numpy.uint32(20021) * numpy.uint16(new_hash >> 16) + \
numpy.uint32(346) * new_hash + numpy.uint16(edx >> 16)
new_hash = numpy.uint32(numpy.uint16(
eax) << 16) + numpy.uint16(edx) + 1
self.hash = int(new_hash)
return int(numpy.uint8(encrypted) - numpy.uint8(eax))
# read the next "count" bits as an unsigned integer
def read_bit(self, count: int) -> int:
while len(self.next_bits) < count:
# not enough bits to read: load one byte from the input and add its 8 bits to next_bits
current_byte = int.from_bytes(self.data.read(1), "little")
self.next_bits += [(current_byte >> i) &
1 for i in range(7, -1, -1)]
bits = self.next_bits[:count] # bits as an array of 0s and 1s
# keep the bits we didn't need for next time
self.next_bits = self.next_bits[count:]
# converts the list of 0 and 1 to the integer that it represents
return sum(bit << i for (i, bit) in enumerate(reversed(bits)))
with open("hat010010", "rb") as input_file:
decripter = DscDecrypter(input_file)
decripter.decrypt("hat010010_output")
```

# How about compressing back to DSC?

Technically, it’s feasible. You could build a Huffman tree yourself and use it to compress back the data as easily as it can be decompressed (you likely can’t reuse the one in the original file, as you might need to encode more characters in it). You’d have to encrypt it, but thankfully the decryption algorithm can be easily reversed (just replace the minus by a plus at the end of `decrypt_next`

and you’re good to go).

In practice, though? I have tried and miserably failed. Maybe the game expects the file size to be within a specific margin. Or maybe the initial hash can’t be picked at random. Or maybe those eight bits we discarded in the header actually have a very important meaning. Who knows, in any case I didn’t spend too much time on this because again, the engine has no trouble using decompressed files directly. But if you try it and succeed, do let me know!

If you are really hellbent on compressing back your data, check out this link. The reason why I’m sharing this tool from a 2013 thread in some obscure Chinese forum is that it’s the only thing I could find in the whole world wide web that is capable of successfully encrypting files back to the DSC (and even ARC) format. Unfortunately the source code isn’t included, so we might never know how this was done.

# What’s next?

It’s been a doozy, but we are now finished with all the extraction, decompression and decryption work. If you examine the file that the Python code spits out, it should be the same one, bit for bit, as the one you get from GARbro. We’ll finally be able to put our grubby hands on the game’s original text and start modifying it, but that’ll be a story for the next article. There’s been a lot of programming going on, in fact this might have been the most Python-heavy article the series will ever see, but rest assured that starting next week things will calm down on the programming side, and we’ll start doing actual patching for real. So stay tuned, and have fun!

*Any question, comment, concern, or have an open source DSC compression algorithm to share (please I’m desperate), hit me up on Reddit or Discord at Perturbed_pangolin#3792, or shoot me an email.*