snowflake-connector-python
snowflake-connector-python copied to clipboard
SNOW-1296805 Remove option tmpdir from SnowflakeEncryptionUtil.decrypt_file
Please answer these questions before submitting your pull requests. Thanks!
-
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1296805
-
Fill out the following pre-review checklist:
- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am modifying authorization mechanisms
- [ ] I am adding new credentials
- [ ] I am modifying OCSP code
- [ ] I am adding a new dependency
-
Please describe how your code solves the related issue.
https://github.com/snowflakedb/snowflake-connector-python/blob/972accde29792fb06fbfd068b870022bc7682f82/src/snowflake/connector/storage_client.py#L382 is problematic where self.tmp_dir are on a different device than target directory (self.full_dst_file_name) in a network filesystem (e.g. persistent storage mounted to a docker container). Cross device file operations could cause the program to hang (especially on windows machines). We can eliminate this by making the decrypted intermediate file is written to the same directory as target directory (self.full_dst_file_name), where we know is writable.
Ohh and unfortunately this is a breaking change that needs to be announced ahead of time!
Great job. Same thing should be done to
encrypt_fileand forever put temp directory in the rear-view mirror. Also, be aware that we started down on the.partnaming, maybe we should follow that.
encrypt_file should be immune from the cross-device file operation issue because it just writes the encrypted file to a temp dir and later deletes that file. It's also more likely that the we don't have write access to the incoming directory. I prefer keeping tmpdir in encrypt_file for that reason.
For .part naming, what decrypt_file returns should look like <filename>.part<random_suffix>. Immediately after decrypt_file returns, this file gets renamed to just the final filename and the part file is unlinked: https://github.com/snowflakedb/snowflake-connector-python/blob/706bc2f6ae58da25885064497c217cfedcc19020/src/snowflake/connector/storage_client.py#L389. I don't think we have to worry about the filenames here too much?
For .part naming, what decrypt_file returns should look like
.part<random_suffix>
This was done by design, I wanted to overwrite half-done download with new attempts, so a ctrl+c left file would be cleaned up on a second try