
Plain Sight: Images for Intended Audiences Only

1.0 Introduction
Securing digital images is a long-standing challenge. Of the many formats of digital media that exist, images are arguably the hardest to secure. One screenshot, one picture of another screen, one person glancing over your shoulder at your phone for a split second, and the content has been consumed - the privacy compromised immediately.

However, in trying to secure digital images, there is a key concept that is often overlooked: the image audience. What do I mean by image audience? I mean the intended consumers of the photograph in question. Rarely is the image audience the entire public internet, yet it’s now easier than ever to find pictures of someone by simply googling their name.

The crux of the matter is clear. People need a universal way to limit their image audiences. Plain Sight is a proposed solution to this need.

1.1 What is Plain Sight?
Plain Sight is a tool and eventual mobile application that enables end-to-end photo confidentiality through the use of cryptographic scrambling and computer vision. Using Plain Sight, you can scramble any photo by simply providing or generating an image password for it. The resulting image will be obfuscated well beyond recognition. This scrambled image can then be posted on social media, printed out, put on a billboard, whatever you like. At this point, you may choose to share the image password with people who you wish to be able to view the photo (this could also be automatic and key-based in future releases).

Similar to how QR codes are detected with phone cameras, you can scan and detect a scrambled image within the Plain Sight application. Upon detecting a scrambled image, the application prompts you for the image password. If the password you provide is correct, the scrambled image is pulled out of the photo and unscrambled, yielding the original image to be viewed right on your phone screen with (potentially) very little data loss.

Below is an example flow of how Plain Sight works. Note that this is a proof of concept, and that the quality of recovered photos will improve significantly with more advanced techniques. The functions referenced in the flow diagram (bolded in green) are explained in depth in the next section.
[Figure: Plain Sight's use flow]

1.2 Technical Design
Plain Sight’s functionality is broken up into three overarching methods, each with their own set of sub-functions.
  • Scramble - given an input image and a key to use as the initialization vector, apply transformations and output the obfuscated image.
  • Scan - given a photograph containing a scrambled image (say, a photo of your computer screen where you have a scrambled image pulled up), locate the scrambled image in the picture, extract its bounds, flatten it, and return it as output.
  • Unscramble - given a scrambled image, as well as a key to use as the initialization vector, unscramble the image and output the original image.
1.2.1 Scramble
The scramble algorithm takes two arguments: a 3-dimensional pixel matrix that represents the input image (x, y, RGB color), and an image password that acts as an initialization vector.

[Figure: pseudocode for the scramble algorithm]

Explanation: In essence, the algorithm performs “roll” operations on rows and columns of pixels over many iterations. In this context, a “roll” is a wrap-around (circular) shift: pixels that are shifted out of the image bounds are reintroduced at the opposite side of the image. Several important pseudo-random variables are set in each iteration of the loop. None of these values is truly random, as they are all derived from a random seed (the hash of the image password). These variables include:
  • Degree to which the next row/column will be rolled (shift_amount)
  • Position of next row/column to be rolled (pos_of_section_to_shift)
  • Whether the next roll will be on a column or row - determined by the parity (odd or even) of the ASCII integer value of the current character in the SHA-256 hash of the image password (ascii_vals[i])
While not a random variable, it is also important to understand the variable section_width and how it is calculated with each iteration through the loop. It represents the width in number of pixels of the next row/column to be rolled. The variable section_width is calculated as the greatest power of 2 that is less than the position of the next row/column to be shifted (pos_of_section_to_shift). In other words, the section width of a row/column will always be a power of 2 that is smaller than the pixel dimensions of the image. A step-by-step visualization of the scramble algorithm in action is shown below (t = number of transformations applied):
As you can see from this example, the obfuscation caused by rolling rows and columns increases dramatically over many iterations as the transformations continue to compound on previous obfuscations. You may also notice that the colors are being scrambled. This is a side-effect of using a 3-dimensional matrix to represent the input image (where the third axis of the matrix represents RGB color). When a row or column gets rolled, all axes are affected. While you can restrict rolling to only specific axes, rolling all axes results in increased obfuscation, which is beneficial in the case of image confidentiality.
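The loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code, and several details are assumptions: the function names (plan_moves, scramble), the default of 256 iterations, the use of the standard library's random module as the seeded generator, the choice that a section covers the pixels from pos_of_section_to_shift - section_width up to pos_of_section_to_shift, and the reading of “rolling all axes” as numpy's roll applied to every axis of the section.

```python
import hashlib
import random

import numpy as np


def largest_power_of_two_below(n: int) -> int:
    """Greatest power of 2 strictly less than n (assumes n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def plan_moves(password, height, width, iterations=256):
    """Derive every roll from the password hash without touching any pixels."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    ascii_vals = [ord(ch) for ch in digest]   # 64 ASCII codes of the hex digest
    rng = random.Random(int(digest, 16))      # the hash is the random seed
    moves = []
    for i in range(iterations):
        # Parity of the current ASCII value picks rows (even) or columns (odd).
        roll_rows = ascii_vals[i % len(ascii_vals)] % 2 == 0
        limit = height if roll_rows else width
        pos_of_section_to_shift = rng.randrange(2, limit + 1)
        section_width = largest_power_of_two_below(pos_of_section_to_shift)
        shift_amount = rng.randrange(1, limit)
        moves.append((roll_rows, pos_of_section_to_shift, section_width, shift_amount))
    return moves


def scramble(image, password, iterations=256):
    """Return a scrambled copy of a 3-dimensional (rows, columns, RGB) array."""
    out = image.copy()
    height, width = out.shape[:2]
    for roll_rows, pos, section_width, shift in plan_moves(password, height, width, iterations):
        start = pos - section_width  # assumption: the section ends at pos
        if roll_rows:
            # Circular shift of a band of rows along all three axes.
            out[start:pos] = np.roll(out[start:pos], shift, axis=(0, 1, 2))
        else:
            # Circular shift of a band of columns along all three axes.
            out[:, start:pos] = np.roll(out[:, start:pos], shift, axis=(0, 1, 2))
    return out
```

Because every roll is a permutation of the values already in the array, the scrambled image contains exactly the same pixel values as the input, only in different places. Rolling the color axis as well is what moves values between the R, G, and B channels.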

1.2.2 Unscramble
Much like the scramble algorithm, the unscramble algorithm takes two arguments: an image pixel matrix and an image password. The unscramble function uses the Fisher-Yates Shuffle algorithm to determine the original shuffle transformations that occurred. It then applies the inverse of these operations in reverse order, yielding the original, unscrambled image. In our context, the Fisher-Yates Shuffling algorithm works as follows:
  • Call the scramble function on the already scrambled image, but instead of applying these transformations and scrambling it again, simply keep track of the transformations that it would apply. Return this list of transformations as an array of tuples, each containing the pseudo-random variables discussed earlier (shift_amount, pos_of_section_to_shift, ascii_vals[i]).
  • Apply all the transformations in the list of moves in reverse order, with the inverse shift amount for each roll operation (i.e., shift_amount = 33 -> shift_amount = -33)
In order for this method to succeed, it must be passed the correct image password. Since the SHA-256 hash of the image password (256 bits, written as 64 hexadecimal characters) is the seed for all pseudo-random operations that occur, if the image password is incorrect (regardless of how close it is to the correct password), attempting to unscramble the image will only result in further obfuscation.
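The record-then-invert idea can be illustrated with a short, self-contained round trip. The sketch below is not the project's code: planned_moves is a compact, hypothetical stand-in for the move generator (it stores each move as axis, start, stop, shift rather than the tuple described above), and the 128 iterations and the use of Python's random module are assumptions made for the example.

```python
import hashlib
import random

import numpy as np


def planned_moves(password, shape, iterations=128):
    """Replay the seeded generator and record each roll without applying it."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))
    moves = []
    for i in range(iterations):
        axis = ord(digest[i % len(digest)]) % 2      # 0 = rows, 1 = columns
        pos = rng.randrange(2, shape[axis] + 1)
        width = 1 << ((pos - 1).bit_length() - 1)    # greatest power of 2 below pos
        shift = rng.randrange(1, shape[axis])
        moves.append((axis, pos - width, pos, shift))
    return moves


def roll_section(image, axis, start, stop, shift):
    """Circularly shift one band of rows or columns in place."""
    index = (slice(start, stop),) if axis == 0 else (slice(None), slice(start, stop))
    image[index] = np.roll(image[index], shift, axis=(0, 1, 2))


def scramble(image, password):
    out = image.copy()
    for axis, start, stop, shift in planned_moves(password, image.shape):
        roll_section(out, axis, start, stop, shift)
    return out


def unscramble(image, password):
    out = image.copy()
    # Same moves, last one first, each with the opposite shift.
    for axis, start, stop, shift in reversed(planned_moves(password, image.shape)):
        roll_section(out, axis, start, stop, -shift)
    return out


original = np.arange(12 * 10 * 3).reshape(12, 10, 3)
locked = scramble(original, "correct horse")
restored = unscramble(locked, "correct horse")   # identical to original
garbled = unscramble(locked, "correct horsf")    # one character off: still scrambled
```

The order matters: the rolls overlap, so undoing them in any order other than last-to-first would not restore the image. A password that differs by a single character produces an unrelated hash, and therefore an unrelated list of moves.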

1.2.3 Scan
The scan function is perhaps the most exciting and technically challenging feature in Plain Sight. As an overview, the scan function encompasses the following tasks:
  • Locating the bounds (corners and shape) of a scrambled image within the context of a photograph of a flat surface with a scrambled image on it.
  • Extracting the scrambled image from the photograph, then mapping it such that the scrambled image appears to be flat against the screen (unskew the camera perspective).
Bounds Detection: For the purpose of the proof-of-concept, it is assumed that there is in fact a scrambled image somewhere in the photograph/frame being scanned. With this assumption in place, the next step is to locate the four corners of the scrambled image within the larger photo. To accomplish this, a feature-detection algorithm is used that finds pixels of extreme contrast relative to their neighbors (within a set threshold). The four most extreme points, one for each corner, are determined and passed to the second part of the scan function.

Perspective Flattening: With the pixel coordinates of the four corners of the scrambled image known, we can use a technique called perspective warping to map all pixels within the found quadrilateral to a new image matrix. For this project, this is accomplished using the built-in perspective warp functions available in cv2, the Python module for OpenCV.
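Both scan steps can be sketched with numpy alone, which makes the underlying arithmetic visible. The snippet is an illustration under stated assumptions, not the project's implementation: it assumes the feature detector has already produced candidate corner points as (x, y) pairs, it picks the four extremes with a simple sum/difference heuristic, and it resamples with nearest-neighbor lookup. The helper names (order_corners, homography, flatten) are hypothetical. In cv2, the equivalent work is done by cv2.getPerspectiveTransform and cv2.warpPerspective.

```python
import numpy as np


def order_corners(points):
    """Pick the top-left, top-right, bottom-right, bottom-left extremes."""
    pts = np.asarray(points, dtype=float)
    sums = pts[:, 0] + pts[:, 1]    # x + y
    diffs = pts[:, 0] - pts[:, 1]   # x - y (y grows downward in images)
    return np.array([
        pts[np.argmin(sums)],   # top-left: small x, small y
        pts[np.argmax(diffs)],  # top-right: large x, small y
        pts[np.argmax(sums)],   # bottom-right: large x, large y
        pts[np.argmin(diffs)],  # bottom-left: small x, large y
    ])


def homography(src, dst):
    """Solve for the 3x3 matrix H that maps each src corner onto its dst corner."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def flatten(photo, corners, size):
    """Resample the quadrilateral given by `corners` into a size x size square."""
    square = np.array(
        [[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]], dtype=float
    )
    # Map each output pixel back into the photo, so the output has no holes.
    H = homography(square, order_corners(corners))
    u, v = np.meshgrid(np.arange(size), np.arange(size))
    x, y, w = np.tensordot(H, np.stack([u, v, np.ones_like(u)]), axes=1)
    cols = np.clip(np.rint(x / w).astype(int), 0, photo.shape[1] - 1)
    rows = np.clip(np.rint(y / w).astype(int), 0, photo.shape[0] - 1)
    return photo[rows, cols]
```

The square output mirrors the proof-of-concept's assumption that every scrambled image is a square. Since the scramble moves individual pixels, any resampling error here ends up as misplaced pixels after unscrambling, which is one reason the output size and interpolation method matter.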

1.3 Use Cases
Plain Sight has several current use cases:
  • Multi-factor Authentication: Plain Sight could be a compelling alternative to services such as Authy. Instead of sending a code to your phone to authenticate an online login, Plain Sight could be integrated such that an unauthenticated user sees a scrambled photo on their login screen. To authenticate, the user must pull out their mobile device, unscramble the photo, and enter the captcha-style words that the unscrambled image would depict. A huge benefit of this use case is that there can be a relatively large margin for error in the scanning process, as the quality of the unscrambled photo is largely unimportant. Additionally, unlike simple 6-digit code-based authentication applications, Plain Sight would be inherently robot-resistant given the captcha-esque content of the unscrambled images that needs to be determined.
  • Social Media: When it comes to sharing photos online, there is a clear need to be able to limit your viewing audience in a manner that is independent of what platform you are on. While private accounts and “who can see this photo” features exist on different platforms, these solutions are either inflexible or a pain to manage on a photo-to-photo basis. If data loss from scanning can be minimized, Plain Sight could be a very useful tool for protecting the viewership of your photos. For instance, there would be no need for people to create anonymous second accounts to share photos separately from their main accounts - they could instead upload everything to the same account and simply give the key for certain photo(s) to friends who they wish to be able to view it. This would eventually become automated.
  • Privileged Public Signage: There are instances where it would be useful for groups such as law enforcement to be able to post signs and messages in public areas that contain information meant exclusively for other law enforcement personnel. This same case applies to many other groups and organizations such as company employees, school faculty, and governments.
1.4 Challenges and Next Steps
While Plain Sight is a promising and compelling endeavour, it is still an early-stage project in every sense. With this in mind, there are several key aspects of Plain Sight that require attention going forward.
  • Screenshotting and Key Sharing: While not a technical challenge, it is important to recognize that Plain Sight operates on the assumption that the author of a photo trusts the people they give a key to not to compromise the content.
  • Output Quality: As alluded to earlier in this paper, the quality of the scanned-to-unscrambled images leaves something to be desired, especially for use cases where the payload is an actual photograph. While some amount of data loss is inevitable, I believe the margin of error for reassembly can be reduced a great deal through more advanced techniques that I have yet to attempt.
  • Dimension Inference: Inferring image dimensions is an important feature that will be explored going forward. For the sake of the POC, the program assumes that all scrambled photos are squares when it scans them in, something that will be addressed in future versions.
  • Language Switch: While Python is a fantastic language for prototyping, I plan to reimplement the core shuffling/unshuffling algorithm in Go for both safety and performance.
Plain Sight has been a rewarding project to work on thus far and I am very excited to see where it goes, especially with fields such as augmented reality picking up. Thank you to all who took the time to read this paper. If you’re interested in working on this project, feel free to shoot me an email (especially if you are interested in computer vision!).

1.5 Project Source & References
GitHub Repository: HERE
Python Libraries Used: cv2, numpy, hashlib
Fisher-Yates Shuffle: HERE
Feature Detection: HERE