Extraneous log messages within validate function
Description
We are integrating protovalidate for Protobuf validation in a Python service and noticing thousands of strange log messages coming from within validate(request).
Steps to Reproduce
We have the following in requirements.txt:
protovalidate==0.3.1
And in our middleware.py:
from protovalidate import validate, ValidationError
...
try:
    parsed_request = ParseDict(json.loads(body), request_proto_class())
    validate(parsed_request)
except ValidationError as validation_exc:
    logger.exception(validation_exc)
    raise HTTPException(status_code=422, detail=str(validation_exc.violations))
Expected Behavior
The validate function should not log extraneous messages.
Actual Behavior
Tens of thousands of log messages such as:
message:| | | | | | | | | | | | | | | | | | | | | | | | | | Tree('unary', [Tree('member', [Tree('member_dot', [Tree('member', [Tree('primary', [Tree('ident', [Token('IDENT', 'rules')])])]), Token('IDENT', 'lte')])])])",
coming from within the validate function.
Additionally, we're finding that the validate function hangs for > 10 seconds.
Screenshots/Logs
Environment
- Operating System: Initially found the issue in one of our testing environments in Kubernetes, but the issue is reproducible on our local macOS machines.
- Protobuf Compiler & Version: Buf 1.29.0
- Protovalidate Version: 0.3.1
Possible Solution
Additional Context
Hey @carolyn-lyra, I suspect those logs are coming from protovalidate-python's usage of the cel-python library — there's a related issue on that repo about extraneous logging; can you try out the code snippet in the comment to see if it fixes the extra logging? And let us know if the hanging still occurs with that change in place.
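In case it helps while you track down that comment, the workaround generally looks something like the sketch below: raise the level on the loggers cel-python writes to before calling validate. The logger names here are assumptions (they differ between cel-python releases), so adjust the list to match the snippet in the linked issue.

import logging

# Assumption: the noisy Tree(...) records come from cel-python's module-level
# loggers; the exact names vary by cel-python version, so tweak this list to
# match the snippet in the linked issue.
for noisy_logger in ("celpy", "Evaluator", "evaluation", "celtypes", "NameContainer"):
    logging.getLogger(noisy_logger).setLevel(logging.WARNING)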
Thanks, this fixed the logging issue for us! We'll deploy to a test environment to see if the performance issue still persists.
Hey @stefanvanburen - unfortunately, we're still seeing the latency increase even after deploying your suggested logging fix. The validate step is taking over 5 seconds on average to complete, and sometimes spiking over 10 seconds. For additional context, our service's payloads are often quite large (a few thousand fields per payload), but this wasn't an issue when we were using protoc-gen-validate. Has your team seen latencies increase with this new library vs. the deprecated protoc-gen-validate library?
Hey @carolyn-lyra, I don't think we've seen similar increases so far. Just to confirm, you're re-using the same exported validate func across your middlewares, correct?
I suspect that most of the slowness can be traced back to cel-python (one related issue is https://github.com/cloud-custodian/cel-python/issues/68, and I've asked about a new release, since there have been some performance improvements since the last one). However, any additional details you can provide from your end to help us reproduce the slowness would be great!
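For reference, by "re-using the same exported validate func" I mean something like the sketch below. As far as I understand the library layout, the module-level validate is bound to a shared Validator, so compiled constraint programs can be cached across requests as long as a new Validator isn't constructed per request (the explicit Validator() construction here is just to make the reuse visible; treat the exact constructor signature as an assumption).

import protovalidate

# Build one Validator at import time and reuse it for every request so any
# compiled/cached constraint programs are shared across calls.
_validator = protovalidate.Validator()

def validate_request(msg):
    # Raises protovalidate.ValidationError on constraint violations.
    _validator.validate(msg)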
Hi @stefanvanburen -
Yes, we are using the validate func imported via from protovalidate import validate in our middleware.
Thanks for linking the issues for context. I've pasted a scrubbed version of the Protobuf message with the latency regression below. Typically, the repeated field repeated DataSchema data = 1; repeats 100-200 times per request. A few of the nested repeated fields in DataSchema can also have more than 30 items. One sample payload we pulled had over 7k fields in total. Let me know if you need any additional clarification on the message below!
syntax = "proto3";
import "buf/validate/validate.proto";
import "google/protobuf/wrappers.proto";
message Type1 {
string value = 1 [(buf.validate.field).string.pattern = "^[a-zA-Z0-9]{8}-(?:[a-zA-Z0-9]{4}-){3}[a-zA-Z0-9]{12}$"];
}
message Request {
message DataSchema {
double field_1 = 1 [
(buf.validate.field).ignore_empty = false,
(buf.validate.field).double = {
gte: 0,
lte: 100
}
];
google.protobuf.DoubleValue field_2 = 2;
double field_3 = 3 [
(buf.validate.field).ignore_empty = false,
(buf.validate.field).double = {
gte: -1000,
lte: 1000
}
];
Type1 field_4 = 4 [(buf.validate.field).required = true];
google.protobuf.Int32Value field_5 = 5 [(buf.validate.field).int32 = {
gte: 1,
lte: 10
}];
google.protobuf.Int32Value field_6 = 6 [(buf.validate.field).int32.gte = 0];
google.protobuf.BoolValue field_7 = 7;
google.protobuf.BoolValue field_8 = 8;
google.protobuf.Int32Value field_9 = 9 [
(buf.validate.field).ignore_empty = false,
(buf.validate.field).int32 = {
gte: 0,
lte: 730
}
];
google.protobuf.DoubleValue field_10 = 10;
google.protobuf.DoubleValue field_11 = 11;
string field_12 = 12;
Type2 field_13 = 13; // "Type2" is an enum
google.protobuf.Int32Value field_14 = 24 [(buf.validate.field).int32 = {
gte: 1,
lte: 120
}];
repeated string field_15 = 15;
google.protobuf.BoolValue field_16 = 16;
google.protobuf.BoolValue field_17 = 17;
repeated string field_18 = 18;
repeated string field_19 = 19;
google.protobuf.BoolValue field_20 = 20;
Type3 field_21 = 21; // "Type3" is an enum
google.protobuf.Int32Value field_22 = 25 [(buf.validate.field).int32 = {
gte: 1,
lte: 120
}];
google.protobuf.DoubleValue field_23 = 23;
Type4 field_24 = 26; // "Type4" is an enum
}
repeated DataSchema data = 1;
}
Hi @stefanvanburen, following up on this - any updates on the performance issue?
hey @carolyn-lyra, no current updates; I still suspect most of the perf gains we can expect will come from upstream improvements and a release. It'd be nice to convert your case above into a benchmark we can use to confirm that intuition, but I'm not sure when I'll have time to take a look at that. Open to a PR or a demo repo for further investigation, though!
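A rough benchmark along those lines could be as simple as the sketch below. It assumes generated code for the message above is importable (the request_pb2 module name and the field values are placeholders), builds a request at the reported scale, and times a single validate call.

import time

from protovalidate import validate
from request_pb2 import Request  # assumption: generated module for the message above


def build_request(rows: int = 150) -> Request:
    # Roughly the reported shape: 100-200 DataSchema entries, with one of the
    # nested repeated fields holding >30 items.
    req = Request()
    for _ in range(rows):
        row = req.data.add()
        row.field_1 = 50.0
        row.field_3 = 10.0
        row.field_4.value = "abcd1234-ab12-cd34-ef56-abcdef123456"  # matches the Type1 pattern
        row.field_15.extend(["item"] * 30)
    return req


if __name__ == "__main__":
    req = build_request()
    start = time.perf_counter()
    validate(req)
    print(f"validate() took {time.perf_counter() - start:.2f}s")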
Hi @carolyn-lyra — we just published v0.7.1, which bumps cel-python up to 0.2.0 and should hopefully remove all/most of the extraneous logging you're seeing. Let me know if you have a chance to take a look!
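If it's useful for anyone following along, picking that up should just be a matter of bumping the pin in requirements.txt from the original report (assuming the same pip-based setup):

protovalidate==0.7.1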
Hey @carolyn-lyra, going to close this issue since it seems like the original logging problem has been resolved and there have been a few updates to the libraries since the original report. If you continue to have performance problems, please open a new issue. Thanks!