pyce Encryption key lookup depends on location from where code is executed

Hi,

thank you for releasing your work as open source!

I noticed that the dictionary key used to look up the encryption key for a source file changes depending on which filesystem path I execute the code from.

When I execute the demo code everything works fine:

$PYTHON -c "from pyce import PYCEPathFinder; \
            import sys; \
            PYCEPathFinder.KEYS=dict(${KEYS}); \
            sys.meta_path.insert(0, PYCEPathFinder); \
            from pyce import hello; \
            hello.hello()"
Hello World!

However, when I navigate one directory up and execute the same code, I get a KeyError:

cd .. # change execution directory
$PYTHON -c "from pyce import PYCEPathFinder; \
            import sys; \
            PYCEPathFinder.KEYS=dict(${KEYS}); \
            sys.meta_path.insert(0, PYCEPathFinder); \
            from pyce import hello; \
            hello.hello()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 724, in exec_module
  File "/pyce/pyce/_imports.py", line 73, in get_code
    data = decrypt(data, PYCEPathFinder.KEYS[normcase(relpath(path))])
KeyError: 'pyce/pyce/hello.pyce'

As the relative path changes with the working directory, it can no longer be found in the keys dictionary. You should be able to reproduce this by inserting the cd .. line into demo.sh as seen above.
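The effect can be shown in isolation. This is a minimal sketch, not pyce's actual code: `path_lookup_key` is a hypothetical helper that mirrors the `normcase(relpath(path))` expression from the traceback, with the working directory passed in explicitly instead of calling `cd`. It uses `posixpath` so the result is the same on every platform, and it assumes the repository sits at `/pyce` as in the traceback.

```python
import posixpath

# Key list as built from inside the repository root (value taken from this issue).
KEYS = {
    "pyce/hello.pyce": "43908f4464e86bfabaacbd1a6b5f0948f43e69ee1c050b2e131087733cd98707",
}

def path_lookup_key(path, cwd):
    """Hypothetical helper: the path-based lookup key as seen from `cwd`."""
    return posixpath.normcase(posixpath.relpath(path, start=cwd))

module = "/pyce/pyce/hello.pyce"

key_from_repo = path_lookup_key(module, cwd="/pyce")  # where demo.sh runs
key_from_parent = path_lookup_key(module, cwd="/")    # after `cd ..`

print(key_from_repo)            # pyce/hello.pyce
print(key_from_parent)          # pyce/pyce/hello.pyce
print(key_from_parent in KEYS)  # False, hence the KeyError
```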

How would you handle this? A naive solution would be to use absolute instead of relative paths, but then the code would have to be deployed in the exact same location where it was built.

So another idea I had was to use SHA-256 hashes of the contents of each encrypted source file as the lookup key instead of the filesystem path. This should make the key lookup location-independent, but would result in a slight increase in startup time as every file would need to be hashed once upon key lookup.

Instead of using filesystem paths in the key list as before...

[('pyce/hello.pyce', '43908f4464e86bfabaacbd1a6b5f0948f43e69ee1c050b2e131087733cd98707')]

... the keys would look something like this when using hashing:

[('ed968e840d10d2d313a870bc131a4e2c311d7ad09bdf32b3418147221f51a6e2', '43908f4464e86bfabaacbd1a6b5f0948f43e69ee1c050b2e131087733cd98707')]

... where ed968e840d10d2d313a870bc131a4e2c311d7ad09bdf32b3418147221f51a6e2 would be the SHA-256 hexdigest() of the contents of pyce/hello.pyce.
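A rough sketch of what the lookup could look like, assuming the key dictionary is indexed by the SHA-256 hexdigest of the encrypted file's bytes. The names `content_lookup_key` and `find_key` are made up for illustration, and the ciphertext is a placeholder rather than a real `.pyce` file:

```python
import hashlib

def content_lookup_key(data: bytes) -> str:
    """Hypothetical helper: SHA-256 hexdigest of the encrypted file contents."""
    return hashlib.sha256(data).hexdigest()

def find_key(keys: dict, encrypted: bytes) -> str:
    """Look up the decryption key by content hash instead of filesystem path."""
    return keys[content_lookup_key(encrypted)]

# Placeholder bytes standing in for the contents of pyce/hello.pyce.
encrypted = b"placeholder ciphertext bytes"

# At build time, the key list is indexed by digest rather than by path.
KEYS = {
    content_lookup_key(encrypted):
        "43908f4464e86bfabaacbd1a6b5f0948f43e69ee1c050b2e131087733cd98707",
}

# At import time, the same bytes give the same digest from any working directory.
decryption_key = find_key(KEYS, encrypted)
```

The extra startup cost is one pass of SHA-256 over each file at import time, and the loader has already read those bytes in order to decrypt them.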

Let me know what you think, I am happy to try and make a contribution.

Sep 22 '18 08:09 mkai

@mkai your content-based approach to looking up the key would be an excellent contribution. I can't imagine a scenario (right now) where we wouldn't want that.

It has the added benefit of deduplicating the key dictionary / serialized data structure you may need to send over the wire to a production site for execution.

It is also directly related to our main approach of using convergent encryption in the first place.

I'd welcome this change to the codebase. I am also hoping to get some unit tests in place and CI off the ground so we can regression test inbound PRs.

Sep 22 '18 13:09 theonewolf

Glad to hear! Working on it…

Sep 23 '18 13:09 mkai

There might be one issue with the content-based approach.

How would you distinguish between multiple `__init__.py` files when most of them are empty (i.e., have the same content)?
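To make the concern concrete: files with identical encrypted bytes hash to the same digest, so they would share one dictionary entry. That only matters if the two files need different decryption keys. Below is a small sketch of a build step that would surface such a conflict instead of silently overwriting an entry. `content_lookup_key` and `build_keys` are hypothetical names, and the byte strings and keys are placeholders:

```python
import hashlib

def content_lookup_key(data: bytes) -> str:
    """Hypothetical helper: SHA-256 hexdigest of the encrypted file contents."""
    return hashlib.sha256(data).hexdigest()

def build_keys(entries):
    """Build a digest -> key dict from (encrypted_bytes, key) pairs.

    Identical contents collapse into a single entry. If the same digest
    shows up with a different key, raise instead of overwriting.
    """
    keys = {}
    for data, key in entries:
        digest = content_lookup_key(data)
        if keys.setdefault(digest, key) != key:
            raise ValueError(f"conflicting keys for digest {digest}")
    return keys

# Two files with the same encrypted bytes and the same key: one shared entry.
shared = build_keys([(b"same bytes", "key-a"), (b"same bytes", "key-a")])

# Same bytes but different keys: the ambiguity is reported at build time.
try:
    build_keys([(b"same bytes", "key-a"), (b"same bytes", "key-b")])
    conflict_detected = False
except ValueError:
    conflict_detected = True
```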

Mar 05 '19 11:03 dileep1996

@dileep1996 I haven't deeply tested this or thought through all the corner cases yet. This summer I might have time to refresh the library though. I am slowly circling back on these issues and PRs.

Jun 27 '19 12:06 theonewolf
