
Consider improving transcrypt's handling of large files

Open • jmurty opened this issue 4 years ago • 4 comments

As @perost-l14 mentioned in these comments on #78, transcrypt currently does some things that hinder its use for encrypting many and/or large files.

This ticket is to draw out suggested improvements so they don't get lost in the broader discussion in #78.

In particular, as paraphrased by me (@jmurty):

  • Encrypted files are Base64-encoded with the -a flag to openssl enc, which takes up more space than binary data: Base64 turns every 3 bytes into 4 characters, plus line breaks. This textual encoding may not be necessary, since encrypted content isn't human-readable or diff-able as text anyway.

  • Adding -delta to the encrypted files' lines in .gitattributes disables delta compression for those files when Git packs objects, avoiding wasteful work when pushing encrypted files that don't compress well anyway. Example:

    *.jpg filter=crypt diff=crypt -delta
    
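Whether an attributes line like the one above takes effect can be checked with git check-attr, which reports the attributes Git would apply to a path (the file itself does not need to exist). A minimal sketch in a throwaway repository; photo.jpg and notes.txt are hypothetical paths, and the crypt filter is deliberately not configured, since only attribute lookup is being demonstrated.

```shell
#!/usr/bin/env bash
# Sketch: check which attributes Git applies to paths matched by a
# .gitattributes line that uses -delta. Runs in a throwaway repository.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"

# Same pattern as the example above. The crypt filter/diff drivers are NOT
# configured here, because only attribute lookup is being demonstrated.
printf '%s\n' '*.jpg filter=crypt diff=crypt -delta' > "$repo/.gitattributes"

# check-attr works on path names; these hypothetical files need not exist.
jpg_attrs=$(git -C "$repo" check-attr filter delta -- photo.jpg)
txt_attrs=$(git -C "$repo" check-attr delta -- notes.txt)

printf '%s\n' "$jpg_attrs" "$txt_attrs"
# Expected:
# photo.jpg: filter: crypt
# photo.jpg: delta: unset
# notes.txt: delta: unspecified
```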

Would it make sense to update transcrypt to store binary data instead of Base64, and to set or recommend -delta in .gitattributes by default?
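To put a rough number on the space cost, the sketch below encrypts the same 1 MiB of random data with openssl enc twice, once with -a (Base64) and once without, and compares the output sizes. The cipher, the password and the -pbkdf2 key-derivation flag are illustrative assumptions, not necessarily the options transcrypt passes to openssl; -pbkdf2 needs OpenSSL 1.1.1 or later.

```shell
#!/usr/bin/env bash
# Sketch: compare openssl enc output size with and without Base64 (-a).
# Assumption: these flags are illustrative, not transcrypt's exact invocation.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# 1 MiB of random (incompressible) input
head -c 1048576 /dev/urandom > "$tmp/plain.bin"

openssl enc -aes-256-cbc -pbkdf2 -pass pass:example -e    -in "$tmp/plain.bin" -out "$tmp/enc.bin"
openssl enc -aes-256-cbc -pbkdf2 -pass pass:example -e -a -in "$tmp/plain.bin" -out "$tmp/enc.b64"

# $(( ... )) strips the padding some wc implementations add
bin_size=$(( $(wc -c < "$tmp/enc.bin") ))
b64_size=$(( $(wc -c < "$tmp/enc.b64") ))

# Binary: 16-byte "Salted__"+salt header, 1 MiB of data, one 16-byte padding block
echo "binary: $bin_size bytes"
echo "base64: $b64_size bytes"
echo "overhead: $(( (b64_size - bin_size) * 100 / bin_size ))%"
```

The binary output is only 32 bytes larger than the input, while the Base64 output pays the 4/3 expansion plus a newline every 64 characters.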

What would be the implications of doing these things, for both new transcrypt'ed repos and existing ones?
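For existing repositories, one implication is that blobs already committed in Base64 form would still need to decrypt after any switch. A hypothetical way to accept both formats is to sniff the first bytes: salted openssl enc output starts with the 8 bytes Salted__, whose Base64 form starts with U2FsdGVk. The decrypt_any function, the password and the flags below are assumptions for illustration; in particular the -pbkdf2 key derivation is not necessarily what transcrypt uses, so this would not decrypt files from a real transcrypt repository as-is.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decrypt either legacy Base64 (-a) output or raw binary
# output of openssl enc, choosing the mode by inspecting the first 8 bytes.
# PASS, CIPHER and the -pbkdf2 flag are illustrative assumptions only.
set -euo pipefail

PASS='correct horse battery staple'
CIPHER='aes-256-cbc'

decrypt_any() {
  local tmp rc=0
  tmp=$(mktemp)
  cat > "$tmp"   # buffer stdin so the header can be inspected
  # Raw output starts with "Salted__"; its Base64 encoding starts with "U2FsdGVk"
  if [[ $(head -c 8 "$tmp") == "U2FsdGVk" ]]; then
    openssl enc -"$CIPHER" -pbkdf2 -pass pass:"$PASS" -d -a -in "$tmp" || rc=$?
  else
    openssl enc -"$CIPHER" -pbkdf2 -pass pass:"$PASS" -d -in "$tmp" || rc=$?
  fi
  rm -f "$tmp"
  return "$rc"
}

# Usage: encrypt the same text both ways, then decrypt each with decrypt_any
legacy=$(printf 'top secret\n' | openssl enc -"$CIPHER" -pbkdf2 -pass pass:"$PASS" -e -a | decrypt_any)
binary=$(printf 'top secret\n' | openssl enc -"$CIPHER" -pbkdf2 -pass pass:"$PASS" -e | decrypt_any)
printf 'legacy: %s\nbinary: %s\n' "$legacy" "$binary"
```

Writing would then only need to produce the new format, while reading keeps working for history committed before the change.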

jmurty • May 21 '20 14:05

What is the status of this improvement? I think it's a necessary feature even for small files. Using Base64 isn't necessary or required for Git repositories.

ZhymabekRoman • Jan 20 '23 07:01

I'll try to improve that, and I'll also try to optimise transcrypt more generally, because I have a large Git repository of over 500 MB and decryption is very slow.

ZhymabekRoman • Feb 14 '23 15:02

Work on improving the efficiency of transcrypt would be welcome, though be warned that using it to encrypt large amounts of data or files isn't really the expected use case: it's intended for a few small secret files that are part of a larger repo.

That said, there might be some easy wins that would improve things without requiring a major rewrite or breaking changes.

I'd encourage you to start by looking at the building-block git_clean (encrypt) and git_smudge (decrypt) functions in the script. You can run these separately to simulate the steps taken behind the scenes by Git, and testing the performance and correctness of these atomic pieces is likely to be much easier than working with a real repository.

Examples of this based on the current main branch, run within this project's repository:

    # Manual and minimal transcrypt config in repository
    git config --local transcrypt.cipher aes-256-cbc
    git config --local transcrypt.password 'correct horse battery staple'
    git config --local transcrypt.openssl-path openssl

    # Decrypt the encrypted sensitive_file
    cat sensitive_file | ./transcrypt smudge context=default sensitive_file

    # Encrypt the decrypted sensitive_file
    cat sensitive_file | ./transcrypt clean context=default sensitive_file 2>/dev/null
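Since these functions work by running data through openssl enc, it may help to time that step on its own before profiling the rest of the script. A rough sketch under stated assumptions: 20 MiB of random input, illustrative openssl flags standing in for transcrypt's real ones, and no measurement of any other work the script does around openssl.

```shell
#!/usr/bin/env bash
# Sketch: time openssl enc alone, with and without Base64 (-a), on 20 MiB of
# random data, and verify each round trip. Flags are illustrative assumptions,
# not transcrypt's exact invocation.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c $((20 * 1024 * 1024)) /dev/urandom > "$tmp/plain"

enc() { openssl enc -aes-256-cbc -pbkdf2 -pass pass:example "$@"; }

TIMEFORMAT='  %R s elapsed'   # format for bash's "time" keyword (printed to stderr)
for mode in binary base64; do
  a_flag=""
  if [[ $mode == base64 ]]; then a_flag="-a"; fi
  # ${a_flag:+"$a_flag"} expands to -a, or to nothing at all when empty
  echo "$mode encrypt:"
  time enc -e ${a_flag:+"$a_flag"} -in "$tmp/plain" -out "$tmp/enc.$mode"
  echo "$mode decrypt:"
  time enc -d ${a_flag:+"$a_flag"} -in "$tmp/enc.$mode" -out "$tmp/dec.$mode"
  if cmp -s "$tmp/plain" "$tmp/dec.$mode"; then
    echo "$mode round trip OK"
  else
    echo "$mode round trip FAILED" >&2
    exit 1
  fi
done
```

Comparing these timings with a timed run of the smudge command above on an encrypted file of similar size would give a rough idea of how much time is spent outside openssl.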

jmurty • Feb 16 '23 12:02