envoy
envoy copied to clipboard
json: new exception free sanitize method
Commit Message: json: new exception free sanitize method Additional Description:
part of #35545. Because the Json sanitizing is necessary at workers threads, even we can compile the new Formatter implementation out from E-M, we still need a exception free version of sanitize() method.
This PR provide one by suppressing exception of nlohmann/json which only bring only ~10 lines new code. It may not best solution but is the solution with highest RIO.
Risk Level: low. Testing: fuzz, benchmark. Docs Changes: n/a. Release Notes: n/a. Platform Specific Features: n/a.
cc @htuch
Interesting. I'm rusty on nlohmann -- what's the functional difference? You are goign to add unit tests for the new way of calling the sanitizer, right?
The behavior of nlohmann self is completely same exception we will suppress the exceptions with a parameter in new sanitizeExceptionFree().
But the final behavior of sanitize() and sanitizeExceptionFree is a little different. The sanitize() will serialize all utf8 to format \xxx if the input contain invalid uft8. The sanitizeExceptionFree() will only replace invalid utf8 to U+FFFD and the correct utf8 will kept as expected.
I think we have discussed this in the #35598.
We should call out in tests any differences between the exception and exception-free options.
Of course.
So if I have two different same-length strings that are completey invalid utf-8 then this impl will alias them together?
So if I have two different same-length strings that are completey invalid utf-8 then this impl will alias them together?
Yeah. You cannot expect anything from invalid utf-8.
Yes I can :)
I can expect that the serializer generates a byte sequence that's legal json, that deserializes into the original bytes. That's what the current one does.
However, I think in this case it's OK to document that the exception free one has a known limitation and will just drop invalid sequnces and replace them with something common.
Yes I can :)
I can expect that the serializer generates a byte sequence that's legal json, that deserializes into the original bytes. That's what the current one does.
However, I think in this case it's OK to document that the exception free one has a known limitation and will just drop invalid sequnces and replace them with something common.
Maybe we can expect the nlohmann provide more error handler in future.
I don't expect anything, but this is a regression to current behavior. I said it's OK we just have to decide we will lose info in that case, and document it at the API and in the doc for json.
/wait
/retest
I just found the json_loader use similar way to suppress exceptions. 😆
https://github.com/envoyproxy/envoy/blob/main/source%2Fcommon%2Fjson%2Fjson_internal.cc#L571-L576
/retest
FWIW, /wait-any label is more appropriate than /wait at the cases where no explicit change is required.
I have a new idea. See source/common/protobuf/yaml_utility.cc, function utf8CoerceToStructurallyValid.
Based on the logic in there we can write a scanner to see if utf8 is structurally valid. If it's not, just escape everything to unicode. If it is, then run the nlohmann sanitizer with the fffd replacement. The tests as is should pass.
Note that function actually does its own destructive replacement and so we need to make a parallel funciton that is just a predicate.
We will of course only run this logic when needSlowSanitize() is true.
WDYT?
I have a new idea. See source/common/protobuf/yaml_utility.cc, function utf8CoerceToStructurallyValid.
Based on the logic in there we can write a scanner to see if utf8 is structurally valid. If it's not, just escape everything to unicode. If it is, then run the nlohmann sanitizer with the fffd replacement. The tests as is should pass.
Note that function actually does its own destructive replacement and so we need to make a parallel funciton that is just a predicate.
We will of course only run this logic when needSlowSanitize() is true.
WDYT?
Let me check it. I am okay to do that if we have exist logic 🙂
I see, the IsStructurallyValid of protobuf could be used.
Yes I think so. I didn't see it has part of the protobuf library, but I do see it in ./external/com_google_protobuf/third_party/utf8_range/_virtual_includes/utf8_validity/utf8_validity.h. In source/common/protobuf/yaml_utility.cc it just includes "utf8_validity.h" so I think you can copy the dependency from the bazel build rule yaml_utility.cc. I don't see this as part of protobufs, but maybe protobufs also depends on it.
Use the https://github.com/envoyproxy/envoy/pull/35936