MayADevBe Blog

A Blog about Computer Science

ZW Steg - Zero-Width Characters Steganography Tool

ZW Steg is a Python command-line tool that hides secret messages using zero-width Unicode characters. It was inspired by Steganographr, 330k and the TryHackMe Room The Impossible Challenge.

The idea behind zero-width character steganography is to convert a private message into binary data. The three symbols of the resulting string (space, zero and one) are then mapped to three different zero-width characters, and the mapped string is appended to a public message. As a result, only the public message is visible.

This method is relatively hard to detect because most text editors render these Unicode characters with zero width, as they are supposed to. However, there are browser add-ons that replace these characters with emojis (e.g. ZeroWidth Detection by Mikkel D.).
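As a rough illustration of how such detection could work (this is not how the browser add-on is implemented; the helper name `find_zero_width` and the small candidate set are assumptions made for this sketch), scanning a string for known zero-width code points is enough to reveal them:

```python
# Hypothetical helper: report every known zero-width character in a text.
# The candidate set is an assumption and deliberately small.
ZERO_WIDTH = {"\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF"}

def find_zero_width(text):
    """Return (position, code point) pairs for each zero-width character."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in ZERO_WIDTH]

sample = "Pub\u200Blic\u200C"
print(find_zero_width(sample))  # [(3, 'U+200B'), (7, 'U+200C')]
```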

Now let’s take a look at the code:

The Code

Now that we understand what the tool does, we can take a look at how I coded the tool. You can check out the GitHub repository to get the full code.

Zero-Width Characters

First, I collected a list of zero-width characters. I mainly used the list from 330k for this.

possible_zero_width_chars = [
	"\u2060", #WORD JOINER
	"\u200B", #ZERO WIDTH SPACE
	"\u200C", #ZERO WIDTH NON-JOINER
	"\u180E", #MONGOLIAN VOWEL SEPARATOR
	"\u200D", #ZERO WIDTH JOINER
	"\u200E", #LEFT-TO-RIGHT MARK
	"\u200F", #RIGHT-TO-LEFT MARK
	"\uFEFF", #ZERO WIDTH NO-BREAK SPACE
	"\u202A", #LEFT-TO-RIGHT EMBEDDING
	"\u202C", #POP DIRECTIONAL FORMATTING
	"\u202D", #LEFT-TO-RIGHT OVERRIDE
	"\u2062", #INVISIBLE TIMES
	"\u2063"  #INVISIBLE SEPARATOR
]
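To double-check a list like this, Python's standard `unicodedata` module can look up each character's official name and general category. A small sketch (the variable name `candidates` is only for illustration, and the results depend on the Unicode version bundled with your Python):

```python
import unicodedata

# A few entries from the list above; names come from the Unicode database.
candidates = ["\u2060", "\u200B", "\u200C", "\u200D", "\uFEFF"]

for ch in candidates:
    # Category "Cf" means "format character": no visible glyph of its own.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```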

Encoding

Next, I created the encoding function. This is the function that takes the secret and public message and hides the secret in the public message through these zero-width characters.

For the encoding, I needed a way to map the binary characters to the zero-width characters. I decided to create two lists and map the positions. This could have also been done with dictionaries.

bin_list = [" ","0","1"]
char_list = ["\u2060", "\u200B", "\u200C"]
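The dictionary alternative mentioned above could look roughly like this. It is only a sketch, not the code used in ZW Steg; the names `bin_to_zw`, `zw_to_bin`, `to_zero_width` and `from_zero_width` are hypothetical:

```python
# Sketch of the dictionary alternative; all names here are hypothetical.
bin_to_zw = {" ": "\u2060", "0": "\u200B", "1": "\u200C"}
# Invert the mapping for decoding.
zw_to_bin = {zw: b for b, zw in bin_to_zw.items()}

def to_zero_width(bin_text):
    """Translate a string of ' ', '0' and '1' into zero-width characters."""
    return "".join(bin_to_zw[b] for b in bin_text)

def from_zero_width(text):
    """Recover the binary string, ignoring every other character."""
    return "".join(zw_to_bin[ch] for ch in text if ch in zw_to_bin)

hidden = to_zero_width("101 11")
print(len(hidden))              # 6
print(from_zero_width(hidden))  # 101 11
```

A dictionary lookup avoids the repeated `list.index()` search, although with only three entries the difference is mostly one of readability.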

The first step is to transform the secret message into binary. In Python, the ord() function returns the Unicode code point of a character as an integer (for ASCII characters, this is the ASCII value). The integer can then be converted into a binary string with format(x, 'b'). The binary representations of the individual characters are joined with spaces.

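def encode(secret_text, open_text):
	# Binary form of every character's code point, separated by spaces
	bin_text = ' '.join(format(ord(x), 'b') for x in secret_text)
	encoded_text = open_text
	# Append the zero-width character mapped to each " ", "0" and "1"
	for b in bin_text:
		encoded_text += char_list[bin_list.index(b)]
	return encoded_text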
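To see what ord() and format(x, 'b') produce for individual characters, here is a short sketch. The helper name `char_to_bin` is made up for this illustration:

```python
def char_to_bin(ch):
    """Hypothetical helper: binary string of a character's code point."""
    return format(ord(ch), "b")

for ch in "Se":
    print(ch, ord(ch), char_to_bin(ch))
# S 83 1010011
# e 101 1100101

# No zero padding: a space (code point 32) needs only six digits.
print(char_to_bin(" "))       # 100000

# ord() is not limited to ASCII; it accepts any Unicode character.
print(char_to_bin("\u00e9"))  # 11101001
```

Because the binary strings have different lengths, the space separator is what lets the decoder tell where one character ends and the next begins.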

Finally, we have a binary string that only contains spaces, zeros and ones. These characters can then be mapped to the zero-width characters. bin_list.index(b) returns the position of the current character. The equivalent position in the char_list is the zero-width character that represents the character from the binary string.

The zero-width characters are appended to the end of the public message. This could be varied. The characters could be written before or in-between as well.
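As a sketch of the in-between placement, the variant below puts the payload right after the first visible character. This is not part of ZW Steg; `encode_inside` and the `ZW` dictionary are hypothetical names, and the mapping is assumed to be the same as in bin_list and char_list:

```python
# Assumed mapping, equivalent to bin_list/char_list in the post.
ZW = {" ": "\u2060", "0": "\u200B", "1": "\u200C"}

def encode_inside(secret_text, open_text):
    """Hide the payload after the first visible character (hypothetical variant)."""
    bin_text = " ".join(format(ord(x), "b") for x in secret_text)
    payload = "".join(ZW[b] for b in bin_text)
    # Slicing keeps this safe even when open_text is empty.
    return open_text[:1] + payload + open_text[1:]

encoded = encode_inside("Hi", "Public Message")
visible = "".join(ch for ch in encoded if ch not in ZW.values())
print(visible == "Public Message")  # True
print(encoded[0], encoded[-1])      # P e
```

A decoder that only looks at the zero-width characters does not care where in the public message they sit, so the placement can change without touching the decoding side.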

Example

Here are the steps shown for an example:

  1. Secret message: Secret Message!
  2. Public message: Public Message
  3. Secret message to integer: 83 101 99 114 101 116 32 77 101 115 115 97 103 101 33
  4. Integer to binary: 1010011 1100101 1100011 1110010 1100101 1110100 100000 1001101 1100101 1110011 1110011 1100001 1100111 1100101 100001
  5. Binary to zero-width characters: \xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8c\xe2\x81\xa0\xe2\x80\x8c\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8c
  6. Zero-width characters appended to public message: Public Message‌​‌​​‌‌⁠‌‌​​‌​‌⁠‌‌​​​‌‌⁠‌‌‌​​‌​⁠‌‌​​‌​‌⁠‌‌‌​‌​​⁠‌​​​​​⁠‌​​‌‌​‌⁠‌‌​​‌​‌⁠‌‌‌​​‌‌⁠‌‌‌​​‌‌⁠‌‌​​​​‌⁠‌‌​​‌‌‌⁠‌‌​​‌​‌⁠‌​​​​‌
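The steps above can be reproduced with a few lines. This sketch reuses the bin_list and char_list values from the post; the other variable names are chosen only for illustration:

```python
bin_list = [" ", "0", "1"]
char_list = ["\u2060", "\u200B", "\u200C"]

secret = "Secret Message!"
public = "Public Message"

codes = [ord(x) for x in secret]                                   # step 3
bin_text = " ".join(format(c, "b") for c in codes)                 # step 4
payload = "".join(char_list[bin_list.index(b)] for b in bin_text)  # step 5
encoded = public + payload                                         # step 6

print(codes[:3])                    # [83, 101, 99]
print(bin_text[:15])                # 1010011 1100101
print(payload.encode("utf-8")[:6])  # b'\xe2\x80\x8c\xe2\x80\x8b'
print(len(public), len(encoded))    # 14 131
```

The last line shows the only easy giveaway: the encoded string looks like the 14-character public message but is actually 131 characters long.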

Decoding

The decoding function reverses the encoding function. First, we only look at the zero-width characters; the characters of the public message are discarded. Then we use the two lists in exactly the opposite direction.

def decode(open_text):
	bin_text = ""
	for w in open_text:
		if w in char_list:
			bin_text += bin_list[char_list.index(w)]
	bin_val = bin_text.split()
	secret_text = ""
	for b in bin_val:
		secret_text += chr(int(b, 2))
	return secret_text

The result is the binary string, with the binary value of each character separated by a space. To turn the binary back into characters, we convert each binary value into an integer with int(b, 2) and then convert that integer into the corresponding character with the chr() function. This results in the secret message.
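A quick round trip shows both directions working together. The functions below are a condensed restatement of encode() and decode() written for this sketch, not the exact code from the repository:

```python
bin_list = [" ", "0", "1"]
char_list = ["\u2060", "\u200B", "\u200C"]

def encode(secret_text, open_text):
    bin_text = " ".join(format(ord(x), "b") for x in secret_text)
    return open_text + "".join(char_list[bin_list.index(b)] for b in bin_text)

def decode(open_text):
    # Keep only the zero-width characters and map them back to " ", "0", "1".
    bin_text = "".join(bin_list[char_list.index(w)] for w in open_text if w in char_list)
    return "".join(chr(int(b, 2)) for b in bin_text.split())

print(decode(encode("Secret Message!", "Public Message")))  # Secret Message!
print(decode(encode("caf\u00e9", "hello")) == "caf\u00e9")  # True
print(repr(decode("no hidden text here")))                  # ''
```

Because chr() is the inverse of ord() for every Unicode code point, the round trip also works for secrets that contain non-ASCII characters.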

Brute Force Decoding

Since there is a whole list of zero-width characters and the mapping to the binary characters can be chosen arbitrarily, I decided to implement a simple 'brute force' decoding.

First, the three used zero-width characters have to be determined. It has to be exactly three because there are three binary characters that need to have a corresponding zero-width character.

import itertools

def brute_decode(open_text):
	used_chars = []
	for p in possible_zero_width_chars:
		if p in open_text:
			used_chars.append(p)
	if len(used_chars) != 3:
		return "Cannot decode!"

Next, we need to decode the string for each possible order/permutation of the three zero-width characters. There are six options. I used the itertools library to generate all possible permutations. Some of these permutations can produce binary numbers that are too big to be a valid character code for chr(). In this case, the particular permutation cannot be the original mapping and is simply skipped.

The result of the function is a maximum of six different strings. It is up to the user to figure out which one is the original secret message. Generally, this should be obvious.

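	possible_secret_texts = []
	# Try every assignment of the three characters to " ", "0" and "1"
	for perm in itertools.permutations(used_chars):
		try:
			bin_text = ""
			for w in open_text:
				# Check against the detected characters, not the default char_list
				if w in perm:
					bin_text += bin_list[perm.index(w)]
			bin_val = bin_text.split()
			secret_text = ""
			for b in bin_val:
				secret_text += chr(int(b, 2))
			possible_secret_texts.append(secret_text)
		except (ValueError, OverflowError):
			# chr() rejects values outside the Unicode range; skip this permutation
			pass
	return possible_secret_texts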
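Picking the right candidate by eye works well, but a simple filter can narrow the list first. The sketch below is self-contained and not part of ZW Steg: `brute_candidates` is a compact stand-in for the brute-force loop, and `likely_candidates` is a hypothetical heuristic that keeps only non-empty, fully printable strings:

```python
import itertools

bin_list = [" ", "0", "1"]

def brute_candidates(text, used_chars):
    """Decode text under every assignment of used_chars to ' ', '0', '1'."""
    results = []
    for perm in itertools.permutations(used_chars):
        bin_text = "".join(bin_list[perm.index(w)] for w in text if w in perm)
        try:
            results.append("".join(chr(int(b, 2)) for b in bin_text.split()))
        except (ValueError, OverflowError):
            # chr() rejects numbers outside the Unicode range.
            continue
    return results

def likely_candidates(candidates):
    """Hypothetical heuristic: keep non-empty, fully printable strings."""
    return [c for c in candidates if c and c.isprintable()]

# "Hi" hidden with the default mapping from the post.
mapping = {" ": "\u2060", "0": "\u200B", "1": "\u200C"}
hidden = "".join(mapping[b] for b in "1001000 1101001")
found = brute_candidates("Public" + hidden, ["\u2060", "\u200B", "\u200C"])
print(len(found))                # 6
print(likely_candidates(found))  # ['Hi']
```

Wrong permutations tend to produce control characters, which is why a printable-only filter removes most of them. It is still a heuristic: a secret containing newlines, for example, would be filtered out as well.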

List Strings

I implemented a helper function to display the lists with zero-width characters. It is necessary to use .encode() with these characters to get a visible representation (with the default UTF-8 encoding, "\u2060" is shown as b'\xe2\x81\xa0'), since the whole point of these characters is that they are not visible.

def string_list(the_list):
	the_string = ""
	i = 1
	for l in the_list:
		the_string += str(i) + ": " + str(l.encode()) + ", "
		i += 1
	return the_string[:-2]
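For comparison, here is what the different ways of making a zero-width character visible print in Python. This is a side note with illustrative values, not code from the tool:

```python
zw = "\u2060"

# Plain .encode() defaults to UTF-8 and shows the raw bytes.
print(zw.encode())                  # b'\xe2\x81\xa0'

# The "unicode_escape" codec yields the \uXXXX form instead.
print(zw.encode("unicode_escape"))  # b'\\u2060'

# ascii() escapes every non-ASCII character in the string's repr.
print(ascii("A" + zw))              # 'A\u2060'
```

Any of these would work inside a helper like string_list(); the choice only affects whether the user sees UTF-8 bytes or code point escapes.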

Combine Everything

Finally, to make a functional tool, I combined all of the functions above. For this, I decided to create a simple command-line tool. The tool first asks which option to run and then, depending on the choice, asks for any additional input it needs.

For the encoding option, the encoded result is also saved in a file. This makes it easier to copy the message without losing the zero-width characters. This could potentially have been avoided if I had placed the characters inside the public message instead of behind it.

I also implemented the possibility to change what zero-width characters are used for the encoding and decoding. This allows for more flexibility.

def main():
	...
	choice = 0
	while choice != 5:
		print("Options:")
		print("1. Encode/Hide Text")
		print("2. Decode/Reveal Text")
		print("3. Replace Zero Width Character")
		print("4. 'Brute-Force' Decoding")
		print("5. Exit")
		print()
		choice = int(input("Choice: "))
		print()
		match choice:
			case 1:
				secret_text = input("Text to be encoded: ")
				open_text = input("Text to be shown: ")
				print()
				encoded_text = encode(secret_text, open_text)
				print('"' + encoded_text + '"')
				file = input("Save in file: ")
				with open(file, 'w', encoding="utf-8") as f:
					f.write(encoded_text)
				print("Saved in file: " + file)
			case 2:
				...
			case 3:
				...
				i = int(input("What position you want to change: "))
				j = int(input("Character you choose: "))
				print()
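				# Use "and" (not bitwise "&") and keep both indices inside their lists
				if 0 <= i < len(char_list) and 0 <= j < len(possible_zero_width_chars):
					# Replace the zero-width character; bin_list must stay " ", "0", "1"
					char_list[i] = possible_zero_width_chars[j]
				print("Updated: " + string_list(char_list))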
			case 4:
				...
			case 5:
				print("Exiting")
			case _:
				print("Wrong Input!")
		print()
		print()
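One thing to keep in mind with this menu: int(input(...)) raises a ValueError when the user types something that is not a number, which ends the loop with a traceback. A possible guard is sketched below; `parse_choice` is a hypothetical helper and not part of the repository:

```python
def parse_choice(raw, valid=range(1, 6)):
    """Return the menu choice as an int, or None if the input is unusable."""
    try:
        choice = int(raw.strip())
    except ValueError:
        return None
    return choice if choice in valid else None

print(parse_choice("3"))    # 3
print(parse_choice("abc"))  # None
print(parse_choice("9"))    # None
```

In the loop, a None result could simply print "Wrong Input!" and ask again.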

Conclusion

This was a very fun afternoon project. It made me brush up on my Python skills and I learned something new about steganography, all while creating a tool that could be helpful in future CTF challenges. I hope you enjoyed my breakdown of the project.

I have hidden a secret message in the GitHub repository of ZW Steg. Can you find and decode it?

