Home About Me Projects ⇢ file-frag-proto

PuzzleCrypt: Secure, Fragmentation-Based File Security

A demonstration of PuzzleCrypt's GUI application in action. The test file is a short film about shoes.
Introduction
Before getting into this project, let me pose a question to you: What is your worst nightmare? Whether it involves spiders, clowns, or Steve Harvey's mustache, an all too real list-topper in society today is having your most private files leaked without your consent. With the ever-increasing prevalence of smartphones and social media, sharing sensitive files securely simply cannot rely on inherent trust between parties anymore. As you may have guessed, this is where passwords and encryption algorithms traditionally come into play. However, with computer hardware getting faster, cheaper, and more efficient, conventional encryption algorithms are becoming increasingly susceptible to compromisation. Even if a file is encrypted with a very strong cipher and protected with a very strong password, the possibility still exists for unauthorized access; be it a well-informed attacker, a brute-force attack from a quantum supercomputer 20 years down the road, or a nosy individual with an unprecedentedly lucky series of guesses.

Given this inconvenient fact, the only way to truly secure data is to prevent it from even being decryptable in its resting state. It is for this reason that file fragmentation is such a promising and powerful security solution. By breaking a file into fragments and storing these fragments at different physical locations (ie different computers over a network), obtaining the original file becomes a physically impossible task without having access to all fragments. While file fragmentation as an alternative to key-based encryption has been documented and explored in the past, I am interested in the application of file fragmentation for the purpose of multi-party ownership of sensitive files. This is an area that has been largely untapped, and something that I believe has incredible potential. After countless hours of procrastination from my physics class spent brainstorming and planning, I present to you PuzzleCrypt; the first demonstration of my fragmentation-based encryption algorithm.




PuzzleCrypt's Overarching Flow
Part 1: Fragmentation

    1.1) File Division: The first step of this algorithm involves the actual partitioning of the target file. In order for the file to be processed, the fragmentation function takes three main parameters:
    • 1. Path to the target file
    • 2. Number of desired fragments n
    • 3. Output location for file fragments
    Given that the intended objective of this program is to allow for multi-party ownership of a sensitive file, all fragments must be as similar as possible. For this reason, it is important that the data is equally distributed across all fragments of the file. This is accomplished by simply splitting the bytes of the target file into n sequential, equally sized pieces. If the operation original_file_size % n yields any remainder, this is tacked on to the last fragment that is produced. It should also be noted that each fragment is issued a sequence identifier (sequenceID) to denote the fragment's place in the file (ie the first fragment has sequenceID=0, second has sequenceID=1, sequenceID=2,...sequenceID=n-1).


    1.2) Payload Encryption: Once the target file has been split into n separate and sequential parts, it is necessary to encrypt all these pieces as to prevent even a single fragment from being compromised. For this reason, an AES256 symmetric encryption algorithm is employed across all files using a shared AES key. While traditionally AES keys are randomly generated 128-bit (16 byte) or 256-bit (32 byte) strings that are stored locally as key files, PuzzleCrypt cannot rely on locally stored keys for security sake. For this reason, The user is prompted for a 10+ character password to use as a proxy to generate an 128-bit encryption key. The function that takes the user-entered password as a parameter to create an AES key on the fly does the following:
    • 1. Passes the user-entered password through a SHA-256 hash function.
    • 2. Takes the last 16 bytes of the (statically sized) password hash and uses that as the AES key.
    By obtaining the AES key from a password prompt, there is no need to store a key file locally, thus allowing the key to only exist in memory, leaving authentication up to the exclusive knowledge of the file owners.



    1.3) HMAC: Segmenting and encrypting the target file's data is all fine and dandy, but we still need a way to denote the sequential order of these fragments for when we eventually want to decrypt and reassemble them. This is where the HMAC comes into play. An HMAC, or "Hash-based Message Authentication Code" is a form of cryptographic authentication that is comprised of a hash of two or more secret values; a hash that can only be produced by a program with the knowledge of these secret values. The presence of an HMAC proves that the fragment in question was produced by a trusted source. In our context, the HMAC looks like this:
    Put simply, the HMAC consists of a hashed representation of the encryption key concatenated with the fragment's sequence number. Or (to beat a dead horse) as a function, HMAC = HashFunction(secret_AES_key + sequenceID). By producing an HMAC for each fragment, we are able to provide authentication and maintain fragment order simultaneously, knocking out two birds with one stone. This will be important for reassembly.
    1.4) Format and Output: Now that an HMAC has been produced for each encrypted fragment, we need to format and write these new files to disk. The structure of fragment files are as follows:
    This is accomplished by simply appending each fragment's 32-byte HMAC to its encrypted payload. Each of these securely packaged fragments is saved under a random 8-character filename with a .frg file extension. (ex: SY8M2IKY.frg). These names are arbitrary and can be changed, but the .frg file extension is necessary for reassembly.
      Part 1 (Fragmentation) Recap

Part 2: Reassembly

    2.1) Locating Fragments: In order to reassemble a fragmented file, the first step is simply providing the program with all the fragments. In order for this to be accomplished, all fragments must exist/be placed in a single directory location. Using a regular expression, only files with a .frg file extension are considered for authentication. These potential fragments are temporarily stored and passed along to the authentication stage.

    2.2) Authentication: Now that we have obtained a list of all .frg files at the specified location as potential fragments, we must ensure that they are in fact the fragments we are looking for. This is where the HMACs come back into play. When a group of fragments are to be reassembled, the user is prompted for the file password (the password we specified during fragmentation). Upon entry of the file password, the HMAC for the first fragment is generated (HashFunction(key_obtained_from_file_password + '0') as sequenceID=0 for the first fragment). This generated HMAC is then compared against the HMAC of every potential fragment until a match is found. Once a match is found, the sequenceID is incremented, and the respective HMAC being looked for is generated and compared against the remaining fragments until a match is found, or until all fragments have been accounted for. In the case that no match is found, the program exits immediately and returns an authentication error.

    2.3) Decryption and Assembly: At this point, the fragments have already been authenticated and now just need to be decrypted and appended to one another in the correct order. By sequentially authenticating fragments, they are naturally stored in the correct order. For each authenticated fragment, we dispose of the used HMAC, leaving us with just the encrypted payload. The payload is decrypted using the AES key generated from the same user-entered file password. The unencrypted bytes of each fragment are sequentially appended to one another until the original file's data has been entirely restored.

    2.4) File Output: Now that the data of the original target file has been successfully reassembled, we simply need a location to save the file to, and a filename to save it under. This manually entered filename should include a file extension if applicable, as the bytes that compromise the file data need a way to be interpreted, so try to remember what kinds of files your keeping track of!

    Part 2 (Reassembly) Recap



Design Challenges + Looking Forward

As is always the case with software engineering, there were very few design decisions to be made that had a single, correct answer. However, given PuzzleCrypt's infancy, I plan to revisit several of these decisions for the second iteration. I'd like to address a couple of the more challenging decisions that I made for this prototype, as well as changes that I plan to implement for the next iteration.

Perhaps the most glaring design choice that I made was the choice to not persist the target file's filename through fragmentation. My thinking on this was that the relationship of a filename to its file data is that of a key:value pair. Thus, for maximum efficiency and minimum complexity, only dealing with the bytes that comprise the file data allowed for a faster algorithm and a less complex fragment structure. The question of where to store the filename without having either unnecessary redundancy, or lack of similarity amongst fragments was another question to consider. Alas, I do plan to find an efficient way to include the filename in the next iteration. Another improvement that I plan on implementing is byte-scrambling before the target file is actually partitioned into fragments. By scrambling the bytes before fragments are created, it ensures that no useful information can be recovered from an individual fragment, even if it is successfully decrypted. While Python has allowed me to rapidly develop my first prototype, I plan to implement the next version in Java for concerns of portability.


PuzzleCrypt's Linux GUI
Use Cases
While there are many, the two most compelling use cases for multi-party file ownership via fragmentation that I have found are the following:
  • Digital Contracts (Improved Integrity): Let's say that you win the lottery. Given this public information, you become a target for fraudulent lawsuits. Someone slips in front of your house, breaks their back, and threatens to sue you for injury liability. After much back and forth, you draw up an electronic contract to pay them a sum of $900 to leave you alone. The contract is signed by both parties. However, after a week, you revisit the contract to see that the sum has been changed to $9000. How do you prove the sum used to be $900 besides word of mouth? The integrity of digital contracts is inherently challenging. However, if the contract was produced, signed by both parties, and then fragmented, changing the contract after signature would be next to impossible.
  • Sex Tapes (Improved Confidentiality): If you look beyond governments, corporations, and paranoid computer security enthusiasts, there are very few partisan applications that require the level of security provided by multi-party file fragmentation. This use case falls into that small handful of applications. While it is both very taboo, and far from my original intention, there is no denying that file fragmentation makes a very compelling case in preventing revenge porn (an ex partner leaking explicit content of the other partner for the sake of humiliation). Bob and Alice have been dating for six months. It's Valentine's day, and they are feeling especially adventurous. They decide to make a sex tape. Using a special application with video recording and file-fragmentation built in, Alice records the video on her phone. Once the recording has finished, the couple decides on a file password, and the video is fragmented into two fragments, one of which is sent to Bob's phone, which has the same application installed. Bob wants to watch it one day, so Bob asks Alice for the other fragment. She sends it to him, he enters the file password, and the video is reassembled on his phone within the application. Still in the application, Bob watches the video a couple times. As a security feature of the application, the video will only be playable if internet connection is maintained, and the phone is not plugged into a computer. When either A) Bob is done viewing the video, or B) Alice wants to end the viewing session, this message is sent to the viewer's device and the video file is once again fragmented and distributed. After dating for 11 months, Alice decides that Bob is not treating her well. She tells Bob the truth about how she feels and ends things as nicely as she can. Bob is very angry and hurt, and decides to get her back by leaking videos of her. After realizing that he doesn't actually have any explicit videos of Alice on his phone, he realizes how irrational he is being and has time to calm down and move on with his life.

Conclusion
Designing, documenting, and implementing PuzzleCrypt has been a very enjoyable and rewarding experience for me thus far. However, what sets this project apart for me is its real-world potential. I am still very much in the process of brainstorming use cases, and would like to encourage you as a reader to do the same if this topic interests you! This is an open-source project after all, and I hope to collaborate on it with anyone who is interested, regardless of technical ability. You can find the GitHub repository linked below. Thanks for tuning in and look forward to more software projects in the near future.