Fix: Mitigate DoS vulnerability by handling deeply nested JSONs
This commit introduces a fix for a potential denial-of-service (DoS) vulnerability that could be triggered by providing a deeply nested JSON payload. The serde_json::from_str function, which was used for parsing JSON, is susceptible to stack overflows when dealing with such payloads.
To mitigate this, the serde_stacker crate has been introduced. This crate provides a deserializer that uses a dynamically growing stack, thus preventing stack overflows when parsing deeply nested JSON structures.
A helper function, from_str_safe, has been created to encapsulate the usage of serde_stacker. All instances of serde_json::from_str have been replaced with this new safe function.
Hello @Mirza-Samad-Ahmed-Baig, thank you for contacting us!
I've studied the problem described, but I think that we can't merge this PR.
The problem described is a stack overflow when parsing a JSON value. In practice, however, if the nesting depth of an object exceeds 128, the default recursion limit defined in the serde_json parser, parsing simply fails because the depth limit is exceeded. This failure doesn't cause any kind of panic in the engine; instead, such objects are treated as incorrect, the same way one would treat malformed JSON. The error message provided by serde_json (and the one we surface to the end user) reflects the issue clearly. Therefore, one can't say that there's a DoS vulnerability: Pathway rejects such an object after having spent only a limited amount of resources.
The proposed solution avoids the stack overflow by dynamically growing the stack. While that allows larger objects to be parsed, it actually makes Pathway more susceptible to a DoS attack: suppose someone sends a JSON with one million levels of nesting. We would no longer reject it after processing the first 128 levels; instead, we would spend far more time and memory processing all one million.
So I would conclude that the current version provides a basic defense against this type of scenario, while the proposed fix unlocks the parsing of deeply nested JSONs at the cost of unbounded work per payload. It would be better to keep JSON parsing as it is, while perhaps giving the user an option to set a custom recursion depth.
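One way such a user-configurable depth limit could be enforced without changing the parser's stack behavior is a cheap linear pre-scan of the raw text before parsing. This is an illustrative sketch only; exceeds_depth is a hypothetical name, not Pathway's API:

```rust
// Hypothetical pre-check: report whether the bracket nesting of a JSON text
// exceeds `max_depth`, skipping brackets that appear inside string literals.
fn exceeds_depth(s: &str, max_depth: usize) -> bool {
    let mut depth = 0usize;
    let mut max_seen = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for c in s.chars() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '[' | '{' => {
                depth += 1;
                max_seen = max_seen.max(depth);
            }
            ']' | '}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max_seen > max_depth
}

fn main() {
    let deep = format!("{}{}", "[".repeat(129), "]".repeat(129));
    // Rejected up front against a 128-level limit, in O(n) time.
    assert!(exceeds_depth(&deep, 128));
    // Brackets inside string values don't count toward nesting depth.
    assert!(!exceeds_depth("{\"key\": \"[[[[\"}", 8));
    println!("depth pre-check ok");
}
```

The pre-scan bounds the cost of rejecting an over-deep payload to a single pass over the input, so raising the user-chosen limit never changes the worst-case cost of refusing a malicious document.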
Please feel free to contact us via Discord or any other preferred method if you believe that the standard 128-level restriction imposed by serde should still be addressed.