snowflake-connector-python icon indicating copy to clipboard operation
snowflake-connector-python copied to clipboard

SNOW-1296805 Remove option tmpdir from SnowflakeEncryptionUtil.decrypt_file

Open sfc-gh-stan opened this issue 1 year ago • 3 comments

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1296805

  2. Fill out the following pre-review checklist:

    • [x] I am adding a new automated test(s) to verify correctness of my new code
    • [ ] I am adding new logging messages
    • [ ] I am adding a new telemetry message
    • [ ] I am modifying authorization mechanisms
    • [ ] I am adding new credentials
    • [ ] I am modifying OCSP code
    • [ ] I am adding a new dependency
  3. Please describe how your code solves the related issue.

https://github.com/snowflakedb/snowflake-connector-python/blob/972accde29792fb06fbfd068b870022bc7682f82/src/snowflake/connector/storage_client.py#L382 is problematic where self.tmp_dir are on a different device than target directory (self.full_dst_file_name) in a network filesystem (e.g. persistent storage mounted to a docker container). Cross device file operations could cause the program to hang (especially on windows machines). We can eliminate this by making the decrypted intermediate file is written to the same directory as target directory (self.full_dst_file_name), where we know is writable.

sfc-gh-stan avatar Jun 29 '24 00:06 sfc-gh-stan

Ohh and unfortunately this is a breaking change that needs to be announced ahead of time!

sfc-gh-mkeller avatar Jul 02 '24 17:07 sfc-gh-mkeller

Great job. Same thing should be done to encrypt_file and forever put temp directory in the rear-view mirror. Also, be aware that we started down on the .part naming, maybe we should follow that.

encrypt_file should be immune from the cross-device file operation issue because it just writes the encrypted file to a temp dir and later deletes that file. It's also more likely that the we don't have write access to the incoming directory. I prefer keeping tmpdir in encrypt_file for that reason.

For .part naming, what decrypt_file returns should look like <filename>.part<random_suffix>. Immediately after decrypt_file returns, this file gets renamed to just the final filename and the part file is unlinked: https://github.com/snowflakedb/snowflake-connector-python/blob/706bc2f6ae58da25885064497c217cfedcc19020/src/snowflake/connector/storage_client.py#L389. I don't think we have to worry about the filenames here too much?

sfc-gh-stan avatar Aug 02 '24 18:08 sfc-gh-stan

For .part naming, what decrypt_file returns should look like .part<random_suffix>

This was done by design, I wanted to overwrite half-done download with new attempts, so a ctrl+c left file would be cleaned up on a second try

sfc-gh-mkeller avatar Aug 02 '24 22:08 sfc-gh-mkeller