Sunday, March 1, 2015

The Inconvenient Message Detection (IMD) prototype is done

GitHub URL (usage instructions are in the ReadMe): https://github.com/DickingAround/InconvenientMessageDetection

The problem:

Who talks to who and who uses encryption is obvious to a mass-observer (e.g. the NSA). Large encryption keys and encrypted data are hard to hide. This leaves the meta-data of communication open for observation. It also leaves people using encryption vulnerable to coercion; an individual in power could demand they decrypt their communications.

The goals:

  • Obscure who talks to who without requiring broad adoption of a new technology.
  • Increase the cost of mass-observation.

Two communicating parties have to transmit something. The smallest message that can be transmitted is a single bit; 'there is a secret'. If it is reasonable to transmit this datum through insinuation, pre-determination, or other meat-space methods, then the meta-data of the communication is hidden.

The solution we've made:

Steganography that requires computing effort. Steganography is the idea of hiding data in the unimportant bits of an image. This is an old idea. The new twist is that with 'Inconvenient Message Detection' (IMD), the data can only be found if the decoder does an amount of computational work that's decided by the encoder.

Let's look at how this solves our problem: If some people use IMD, it makes every image suspect. When a mass-observer wants to see what communication is going on, they must use compute power to check every image. Furthermore, because the compute effort needed to find a message is set by the encoder, it is uncertain to the mass-observer; they never know for certain if they've worked hard enough. By contrast, the intended recipient of a message presumably got the single datum that a message exists in some public image and will put in as much compute effort as is needed to find the data in that single image. Having every image on the internet be a potential carrier of secrets makes the mass observation of communication meta-data expensive and uncertain.
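To make the asymmetry concrete, here is a rough back-of-the-envelope sketch. All of the numbers are hypothetical and the function names are invented for illustration; cost is counted simply as the number of hash rounds tried.

```python
def search_cost(num_images: int, rounds_per_image: int) -> int:
    """Total hash rounds spent when trying `rounds_per_image` rounds on each image."""
    return num_images * rounds_per_image


def would_find(encoder_rounds: int, rounds_per_image: int) -> bool:
    """A message is found only if the searcher tries at least as many rounds as the encoder used."""
    return rounds_per_image >= encoder_rounds


# Hypothetical scenario: the encoder chose one million rounds.
ENCODER_ROUNDS = 1_000_000

# The recipient knows which image to check, so they search a single image.
recipient_cost = search_cost(1, ENCODER_ROUNDS)

# A mass-observer has to apply the same budget to every image they see
# (ten million images here, again a made-up figure).
observer_cost = search_cost(10_000_000, ENCODER_ROUNDS)

print("recipient:", recipient_cost)
print("observer: ", observer_cost)

# If the observer guesses too small a budget, the work is wasted.
print("found with 100,000 rounds per image?", would_find(ENCODER_ROUNDS, 100_000))
```

The observer's cost scales with the number of images, and even then a budget that is too small finds nothing.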

Furthermore, even an individual under direct observation can increase their protection with IMD. An individual may own thousands of images of which only one contains a secret. Until the secret is found by an observer, the individual has plausible deniability of the secret's existence. The observer may even give up before spending the necessary compute effort to find it. This increases their resistance to coercion.

The current program is only a prototype. I am a competent engineer, but not a professional in security, encryption, or steganography. I am more than open to advice and help. (And here's thanks to all those who have already helped craft and refine this idea and implementation so far.)

How does it work, at a high level:

Obfuscation:

  1. The image is hashed to create a number. That number is used to encrypt the file to be obfuscated. This encryption is only done to ensure the bits of the file are evenly distributed between 0 and 1; it provides no other protection.
  2. The image is hashed to create another number. That number is hashed a certain number of times as specified by the user. The result is used to seed a pseudo-random number generator.
  3. The pseudo-random number generator is then used to decide where to put the 1s and 0s of the file into the image.
  4. For each bit, there is a check to see if the pixel being changed will give away that the file exists (e.g. a '1' in a field of '0' saturated color). This check also makes sure we haven't changed the results of previous checks or written to this pixel twice.
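Steps 2 and 3 above can be sketched roughly as follows. This is not the code from the repository: the function names, the use of SHA-256, the flat list of 0-255 pixel values, and the choice to hash only the high seven bits of each pixel (so that writing to the low bits does not change the seed) are all assumptions made for illustration. The encryption of step 1 and the safety checks of step 4 are left out.

```python
import hashlib
import random


def iterated_hash(data: bytes, rounds: int) -> bytes:
    """Hash `data` once, then re-hash the digest `rounds` more times."""
    digest = hashlib.sha256(data).digest()
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def choose_positions(pixels, count, rounds):
    """Pick `count` distinct pixel indexes from a PRNG seeded by the iterated hash."""
    # Assumption: only the high 7 bits feed the hash, so embedding never alters the seed.
    stable = bytes(p & 0xFE for p in pixels)
    seed = int.from_bytes(iterated_hash(stable, rounds), "big")
    rng = random.Random(seed)
    return rng.sample(range(len(pixels)), count)


def embed_bits(pixels, bits, rounds):
    """Return a copy of `pixels` with `bits` written into the low bit of the chosen pixels."""
    out = list(pixels)
    for pos, bit in zip(choose_positions(pixels, len(bits), rounds), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return out


cover = list(range(100, 200))          # stand-in for real image data
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_bits(cover, message, rounds=5)

# Anyone who repeats the same number of hash rounds recovers the same positions.
recovered = [stego[p] & 1 for p in choose_positions(stego, len(message), rounds=5)]
print(recovered == message)
```

Because `rng.sample` returns distinct indexes, no pixel is written twice in this sketch.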

De-obfuscation:

  1. Same as obfuscation step 1.
  2. Similar to obfuscation step 2, except that after each hash the image is checked to see if there is a file in it. This continues until it finds a file or the user stops it.
  3. Same as obfuscation step 3, except that the bits are read from the image rather than written to it.
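The search loop in step 2 might look something like the sketch below. How the real program recognizes that "there is a file" is not described here, so this sketch assumes a hypothetical 32-bit marker (`MAGIC`) written in front of the payload and a payload length known in advance. SHA-256, the function names, and hashing only the high seven bits of each pixel are likewise assumptions.

```python
import hashlib
import random

# Hypothetical marker: the 32 bits of b"IMD!", most significant bit first.
MAGIC = [(byte >> shift) & 1 for byte in b"IMD!" for shift in range(7, -1, -1)]


def positions_for(digest, n_pixels, count):
    """Seed a PRNG from the digest and pick `count` distinct pixel indexes."""
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.sample(range(n_pixels), count)


def hide(pixels, payload_bits, rounds):
    """Encoder side, included only so the search below has something to find."""
    digest = hashlib.sha256(bytes(p & 0xFE for p in pixels)).digest()
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    bits = MAGIC + payload_bits
    out = list(pixels)
    for pos, bit in zip(positions_for(digest, len(pixels), len(bits)), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return out


def search(pixels, payload_len, max_rounds):
    """Hash repeatedly, checking for the marker after every round.

    Returns (rounds, payload_bits), or None if the budget runs out first.
    """
    digest = hashlib.sha256(bytes(p & 0xFE for p in pixels)).digest()
    total = len(MAGIC) + payload_len
    for rounds in range(1, max_rounds + 1):
        digest = hashlib.sha256(digest).digest()
        positions = positions_for(digest, len(pixels), total)
        bits = [pixels[p] & 1 for p in positions]
        if bits[:len(MAGIC)] == MAGIC:
            return rounds, bits[len(MAGIC):]
    return None


cover = [(i * 37) % 256 for i in range(500)]   # stand-in for real image data
secret = [1, 0, 0, 1, 1, 0, 1, 0]
stego = hide(cover, secret, rounds=20)

print(search(stego, len(secret), max_rounds=50))   # enough effort: message found
print(search(stego, len(secret), max_rounds=10))   # too little effort: nothing found
```

A searcher who stops early cannot tell an empty image from one that simply needed more rounds.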

FAQ:

Q: Can anyone read a message hidden with IMD?

A: Yes. IMD only obscures the existence of the message. Feel free to also use encryption to make sure only the intended recipient can read the message if it is discovered.

Q: How are you making it difficult to detect changes in the image?

A: I have a basic algorithm to avoid changes that would be blatant giveaways. This algorithm also serves to show where/how a better algorithm would be implemented (it's tricky to check a pixel and then conditionally change it). The basic algorithm checks for pixels which already are or are near saturated so as to avoid them. Also, pixels that are near others with the same value are not used. I assume there are better algorithms here, but I have not had time to research it.
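As an illustration of that kind of check, here is a minimal sketch for a single row of 0-255 channel values. It is not the repository's algorithm. The function name and the `margin` parameter are hypothetical, and comparing only the high seven bits of each value is an assumption, chosen so that the check gives the same answer before and after a low bit is changed (one way around the "check, then conditionally change" problem).

```python
def is_safe_to_modify(row, x, margin=1):
    """Return True if the pixel at index `x` looks like a reasonable place to hide a bit."""
    high = [value >> 1 for value in row]   # ignore the low bit, which embedding may flip
    v = high[x]

    # Skip pixels that are saturated or nearly saturated (very dark or very bright).
    if v <= margin or v >= 127 - margin:
        return False

    # Skip pixels sitting in a flat run of identical values.
    if x > 0 and high[x - 1] == v:
        return False
    if x + 1 < len(row) and high[x + 1] == v:
        return False

    return True


row = [0, 1, 128, 130, 131, 255, 252, 77]
print([is_safe_to_modify(row, x) for x in range(len(row))])
```

In this example only the pixels with values 128 and 77 are accepted; the rest are either near saturation or next to a neighbor with the same high bits.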

Q: What about image meta-data? Often images have meta-data about the camera that took them. Is this disturbed? Does it give away the existence of a message?

A: No idea. This is the #1 feature on the list for a V2 of this program.

Q: What about lossy image formats like JPEG? Isn't that the most common image on the internet?

A: Yes, but it's harder to work with. In talking to a steganography expert, I gather it should be possible to store this data in the way JPEGs or other lossy formats are encoded since they are not deterministic. But it's harder so I haven't done it yet. That work would be very valuable. If you would be willing to do that work, please don't hesitate to contact me.

Q: Why are you using python's 'random' library? It's well known that this should not be used for encryption.

A: Most encryption wants/needs a random number generator, not a pseudo-random number generator. For example, os.urandom is a source of data that is as random as possible. We don't want real randomness. We want a pseudo-random number generator: a random number generator that can be seeded, so that the decoder can reproduce the same sequence as the encoder.

In practice the 'random' library in python is an MT19937 (Mersenne Twister) generator with a period of 2^19937 - 1, which is to say it is impractical to parallel-test every one of its possible starting positions. Thus, the library works fine for how IMD uses it.

That said, feel free to swap it out for another for your own use.
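A small sketch of the property that matters here: two generators given the same seed produce the same sequence, which is what lets a decoder retrace the encoder's choices. The seed value and function name are arbitrary, chosen only for illustration.

```python
import os
import random


def pick_positions(seed, n_pixels, count):
    """Return `count` distinct indexes chosen by a PRNG seeded with `seed`."""
    rng = random.Random(seed)          # a private generator, independent of global state
    return rng.sample(range(n_pixels), count)


encoder_view = pick_positions(123456789, n_pixels=10_000, count=5)
decoder_view = pick_positions(123456789, n_pixels=10_000, count=5)
print(encoder_view == decoder_view)    # same seed, same positions

# os.urandom cannot be seeded, so there is no way to ask it for the same bytes again.
print(os.urandom(8).hex())
print(os.urandom(8).hex())
```

Swapping in a different seedable generator would only require replacing `random.Random` in a helper like this one.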

Q: Why use AES to encrypt the message before encoding it?

A: I picked a symmetric encryption algorithm at random. Its only real job is to even the distribution of bits in the secret file such that there's no pattern of more '1's or '0's in the image.

Feel free to swap it out for another for your own use.
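To show the effect being relied on, the sketch below measures the fraction of '1' bits before and after scrambling. To stay within the standard library it does not use AES; it swaps in a simple XOR against a SHA-256 counter keystream as a stand-in. That stand-in, the key, and the function names are assumptions for illustration only and are not meant as real protection.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 of the key plus a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def whiten(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the input."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def ones_fraction(data: bytes) -> float:
    """Fraction of bits in `data` that are 1."""
    return sum(bin(b).count("1") for b in data) / (8 * len(data))


plain = b"A" * 4096                      # a very lopsided file: only 2 of every 8 bits are 1
scrambled = whiten(plain, b"hypothetical key")

print(ones_fraction(plain))              # 0.25
print(round(ones_fraction(scrambled), 3))
print(whiten(scrambled, b"hypothetical key") == plain)
```

After scrambling, the fraction of '1' bits should land close to one half regardless of how lopsided the input was.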

