flask-caching
                        Extensible serializers support
The pickle serializer is not secure and may lead to remote code execution or local privilege escalation. If an attacker gains access to the cache storage (filesystem, Memcached, Redis), they can craft a special payload, poison the cache, and execute Python code. To avoid the attack we can use a safer serializer (e.g. JSON). I’ve added the possibility to use any serializer. All changes are backward compatible and the default serializer is still pickle.
References:
- https://cwe.mitre.org/data/definitions/94.html
- https://cwe.mitre.org/data/definitions/502.html
- https://davidhamann.de/2020/04/05/exploiting-python-pickle/ (attack example)
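For context, a minimal sketch of the kind of payload the linked write-up describes; the command is a harmless placeholder and the class name is made up for illustration:
import pickle
import os

class Payload:
    # pickle calls __reduce__ while serializing; on deserialization the
    # returned callable is invoked with the given arguments.
    def __reduce__(self):
        return (os.system, ("echo arbitrary command executed",))

# An attacker with write access to the cache backend stores this blob...
poisoned = pickle.dumps(Payload())

# ...and the application runs the command the moment it loads the entry.
pickle.loads(poisoned)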
Coverage increased (+0.7%) to 79.805% when pulling 8d7b253821991e43f3b2dcb1d6b09c8e855ba476 on subnix:feature/extensible-serializer into 82fa1d5f5d5a461758ded99a678439256c9b26c2 on sh4nks:master.
Would it make sense to encode bytes in base64 (or something else) with JSON? Having bytes in an object is a relatively common case and JSON can't serialize bytes. If you look at the example below, cls=BytesEncoder and object_hook=as_bytes could be passed to json.dumps and json.loads respectively as defaults.
import json
from base64 import b64decode, b64encode

BYTES_HINT = "__bytes__"

class BytesEncoder(json.JSONEncoder):
    # Encode bytes/bytearray values as a tagged dict with a base64 payload.
    def default(self, obj):
        if isinstance(obj, (bytes, bytearray)):
            return {BYTES_HINT: True, "bytes": b64encode(obj).decode("ascii")}
        return json.JSONEncoder.default(self, obj)

def as_bytes(obj):
    # Reverse of BytesEncoder: turn tagged dicts back into bytes.
    if BYTES_HINT in obj:
        return b64decode(obj["bytes"])
    return obj

test_data = dict()
test_data["float"] = 1.5
test_data["bytes"] = b"bytes"
test_data["nested_dict"] = {"1": 1, "2": b"dict_bytes"}
test_data["list"] = [True, "string", b"list_bytes", None]
print(test_data)

jsonstring = json.dumps(test_data, cls=BytesEncoder)
assert test_data == json.loads(jsonstring, object_hook=as_bytes)
- It’s a “fake responsibility”: the library shouldn’t be responsible for serializing types unsupported by JSON. We can’t cover all the cases and shouldn’t try.
- Explicit is better than implicit. This may lead to unexpected behavior for the developer and requires strong input validation. For example, an application can cache user-provided (or third-party) data that may contain the special keys (“bytes”: True).
All in all, it’s the developer’s responsibility, not the library’s.
I see two options for supporting bytes (a sketch of the second follows this list):
- Low level: add a flag (e.g. raw=True) to the get/set methods that allows bypassing serialization.
- High level: add a decorator (e.g. memoize_bytes) with automatic base64 encoding/decoding.
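A minimal sketch of the high-level option, assuming a hypothetical memoize_bytes decorator on top of any object with get/set methods; the names and key scheme are illustrative, not part of flask-caching:
from base64 import b64decode, b64encode
from functools import wraps

def memoize_bytes(cache):
    # Hypothetical decorator: the wrapped function returns bytes, the wrapper
    # base64-encodes them before they reach a JSON-based serializer and
    # decodes them again on a cache hit.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = f"{func.__name__}:{args!r}:{kwargs!r}"
            cached = cache.get(key)
            if cached is not None:
                return b64decode(cached)
            value = func(*args, **kwargs)
            cache.set(key, b64encode(value).decode("ascii"))
            return value
        return wrapper
    return decorator

# Stand-in cache with get/set, just for the demonstration.
class DictCache(dict):
    get = dict.get
    set = dict.__setitem__

cache = DictCache()

@memoize_bytes(cache)
def render_report(name):
    return f"report for {name}".encode()

assert render_report("a") == render_report("a") == b"report for a"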
Explicit is better than implicit. This may lead to unexpected behavior for the developer and requires strong input validation. For example, an application can cache user-provided (or third-party) data that may contain the special keys (“bytes”: True).
Just to point out that bytes isn't really the special key in the example. I tried to show that with test_data["bytes"] = b"bytes".
The special key was __bytes__ and if needed you could use something more specific like __flask_caching_bytes_hint__.
Here's an even clearer example showing that having "bytes" as a key doesn't break things. That key could be almost anything and it shouldn't matter:
test_data = {'bytes': True}
jsonstring = json.dumps(test_data, cls=BytesEncoder)
assert test_data == json.loads(jsonstring, object_hook=as_bytes)
My bad, I meant __bytes__: True.
test_data = (
    {"__bytes__": True},
    {"__bytes__": True, "bytes": "AAA"},
    {"__bytes__": True, "bytes": "!"},
    {"__bytes__": True, "bytes": "Cg=="}
)
for case in test_data:
    jsonstring = json.dumps(case, cls=BytesEncoder)
    # Each case either raises inside as_bytes or fails the round-trip,
    # because the plain data already contains the "__bytes__" hint key.
    assert case == json.loads(jsonstring, object_hook=as_bytes)
@sh4nks are there any updates?
I’ve obtained a CVE ID (CVE-2021-33026) for this issue.
@subnix @sh4nks are there any updates on this issue?
We discovered the CVE through python-safety in our codebase.
@subnix Do you need help with this issue?
Hi, sorry for ignoring this issue for a while. Due to real life I don't have as much time as I'd like for maintenance work. I'll try to take a closer look today or tomorrow.
Pylibmc also uses pickle by default. @sh4nks maybe we should (de)serialize the data ourselves?
https://github.com/lericson/pylibmc/blob/8e783a69cc69fb04b9faad6c61ab43193569ab0f/src/_pylibmcmodule.c#L1245-L1284
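A minimal sketch of that idea, assuming a memcached-style client with get(key) and set(key, value, expiry) methods; the wrapper class is illustrative, not flask-caching's actual code:
import json

class JsonSerializingCache:
    # Values are turned into JSON strings before they reach the underlying
    # client, so the client's own pickle fallback is never exercised.
    def __init__(self, client):
        self._client = client

    def set(self, key, value, timeout=0):
        return self._client.set(key, json.dumps(value), timeout)

    def get(self, key):
        raw = self._client.get(key)
        return raw if raw is None else json.loads(raw)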
Hey @sh4nks - any news about this? We've been made aware of the issue at Apache Airflow. It would be great to understand what we can expect here. We might help with some testing etc. if needed.
The CVE is rather disruptive and should probably be marked as INVALID (but good luck doing so; when someone requested a bogus CVE for Flask a while ago, it took months until it was fully taken down), as there is no security vulnerability here...
Just like your database, the cache should be properly secured and behind authentication (or with proper file system ACLs if file-based). So the only way this is insecure is when you:
- expose your cache to untrusted users (e.g. an unauthenticated Redis exposed to the internet);
- have attackers that can write arbitrary files to your disk (and thus poison a file-based cache);
- have a security issue that lets users write arbitrary data to your cache storage backend without going through this library.
So when you have a properly secured system and application, it's a non-issue. Otherwise tools like Celery would need CVEs as well, because guess what, they can use pickle as well.
I'm not even sure how this CVE should be handled by people who are very strict about CVEs ("nothing with an open CVE" rules or similar), because any update containing this change will not be more secure out of the box. The default is still pickle, and changing that would require a major version bump and would probably break people's applications anyway (some applications are fine with JSON, but I'm sure quite a few make use of pickle's ability to serialize almost any Python object). So would it actually be OK to mark the next release as a "fixed version"? I'm not sure.
Also, a CVSS of 9.8 puts this on par with wormable vulnerabilities in widely used network-exposed applications or operating systems. This is simply not true. It doesn't "require no privileges" - you need write access to an internal part of an application. That's probably "requires high privileges".
Just like your database, the cache should be properly secured and behind authentication. [...] So when you have a properly secured system and application, it's a non-issue.
Security is not about "should be". It's about people who make mistakes. What do you think about an LPE in Linux? Is it not a vulnerability just because nobody should be able to gain access to your server?
Otherwise tools like Celery would need CVEs as well because guess what, they can use Pickle as well.
Please check the Celery documentation:
- The default serializer is JSON;
- There is a warning about pickle insecurity;
- It offers cryptographic signing;
- By default Kombu will only load JSON messages.
Do you see the difference between Celery and flask-caching? Celery offers native pickle for people who live in an ideally secure world, but for everyone else it offers a choice, with JSON by default (see the sketch below).
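For comparison, a minimal Celery configuration making that choice explicit; the broker URL is a placeholder, while accept_content, task_serializer and result_serializer are standard Celery settings:
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# JSON everywhere; pickle would have to be opted into explicitly.
app.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],   # Kombu will refuse to load anything else
)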
CVSS of 9.8
I have no idea why it’s so high.
I quite agree that 9.8 is wrong. I calculated it, and it looks like it should be 9.1: https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:N/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H
Is there a way we can appeal that assessment?
This is a very low risk vulnerability in the case of Apache Airflow - one that we can just ignore. In our case we are using "/tmp" as cache storage, and if someone can modify it, they can run ANYTHING; it's the same severity level as "has the ability to run any program". But we are pretty special in Apache Airflow in that we execute Python code managed outside of the main program (by design). So almost by definition we allow running arbitrary Python code provided by our users, and for us this is no increase whatsoever in security risk. I just closed an issue about it in Airflow, https://github.com/apache/airflow/issues/16541, mainly because we are using local storage with the "/tmp" folder in Airflow and this configuration is hard-coded. If you have access there, it's a done deal.
However, if you use a remote cache it's a rather different story, and there I agree the issue is critical. It's also rather different from an unprotected DB: the problem is that this vulnerability is remotely exploitable. Exploiting it, you can MOVE between machines once you gain access to the cache server. You can modify the stored cache and inject your code, which will then be executed on the server using that cache. Unlike DB access, you can inject CODE that is executed somewhere else - even if that remote system is not itself hacked. Code that was never supposed to be there. This is not possible by default with any DB I know. This is serious. There should be a way to disable it (and it should be the default).
any updates?
Hello @subnix @sh4nks are there any updates on this issue?
Hi, what is the next best alternative to work around this problem? Are there other libraries that could help with something similar? I am currently using flask-caching with Redis, together with Plotly, which (afaik) requires flask-caching for the memoize features. Is there a way to opt out of using pickle for serialization?
I wouldn't worry too much about this issue; as long as your redis remains secure, your pickles are also secure.
IMO people worry too much about pickle. It can be used responsibly, and in a situation where pickle's security issue can cause problems, you already have bigger problems anyway.
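If in doubt, keep the backend authenticated and off untrusted networks. A minimal sketch using flask-caching's documented CACHE_TYPE and CACHE_REDIS_URL options; the credentials and host are placeholders, and older releases use CACHE_TYPE="redis" instead of "RedisCache":
from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
app.config.update(
    CACHE_TYPE="RedisCache",
    # Placeholder URL: require a password and keep the port firewalled.
    CACHE_REDIS_URL="redis://:change-me@10.0.0.5:6379/0",
)
cache = Cache(app)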
any update on this? @subnix
Hi, any updates on this? We've got open security findings:
- CVE-2021-33026
- CVSS v2: 7.5 (CVSS:2.0/AV:N/AC:L/Au:N/C:P/I:P/A:P)
- CVSS v3: 9.8 (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)
Let me know if there is anything I'm able to do to help.
Any update?
Hi, has anybody found any alternative to this library that doesn't use pickle or doesn't have an open CVE? (We are using flask-caching with plotly/dash, which internally uses the flask-caching object)
As stated above, we consider the CVE invalid. You'll want to figure out how to mark those as invalid in whatever scanning tool is reporting them to you. Use of pickle on its own isn't a security issue, it would only be an issue if we were loading arbitrary data, which we're not.
This repo has recently changed maintainers, and the volunteer maintainers are still trying to find time to start working on various issues. There is a PR in cachelib to support serializer customization, and we will be transitioning flask-caching to use cachelib.
Is there any way to get the CVE closed? I use flask-caching in a library that I provide to others and I'm not thrilled with people being concerned about getting weekly dependabot notifications for this issue which boils down to "if you let people upload code to your server, they can run it."
Still not able to close the CVE? My GitLab vulnerability report is also complaining.
For my projects I ended up just telling dependabot to ignore the CVE on the basis of “risk is tolerable to this project.” That doesn’t help downstream users though. :(
First of all, thanks for the great work on flask-caching. Unfortunately I can't disable warnings in the scanning tool, as they are managed centrally at some institutions. Is there any workaround or fork in the meantime until cachelib is used?
Once #308 is merged, this will need some updates due to cachelib#63.