protobuf
protobuf copied to clipboard
customizible encoder to hide/redact sensitive information
Is your feature request related to a problem? Please describe. We need ability to hide sensitive information during pb to json encoding.
Describe the solution you'd like Customizible encoder that will allow to work with options (see https://go.dev/blog/protobuf-apiv2).
Describe alternatives you've considered Make most of internal parts importable to minimize copy-paste.
Additional context None
Hi ofen,
Sorry for the late reply.
I think to design a generic solution for such a problem is a large task and requires a lot of design work to make sure it works properly for all cases. Unfortunately, I don't think we have the capacity to design and implement such a solution at the moment.
It might be possible for you to implement the Redact
function in the linked proposal yourself and use it in your code base in a wrapper around the protojson.Marshal
.
func marshalToJson(m proto.Message) ([]byte, err) {
// m = proto.Clone(m) // in case you cannot modify `m`
redact(m)
return protojson.Marshal(m)
}
If you do not want to modify the message, you might be able to proto.Clone()
it before redacting the information.
I'm not sure I understand what you mean by the alternatives you described. To which internal parts would you need access to implement this?
Hi, sorry for late reply.
Protobuf redaction with following marshaling performs poorly due to double iteration over nested elements.
Internal part that could be used for redaction functionality, but lacks public interface to modify it's behaviour: https://github.com/protocolbuffers/protobuf-go/blob/b8fc7706010499f46982c883add4351b12e30c0b/encoding/protojson/encode.go#L259
I agree that the double iteration approach is more expensive than if there were hooks directly in the encoder. If performance is critical to you, json might not be the best choice for you anyway (although I don't know your complete setup and the constraints you are working with).
Do you have an idea how large the performance overhead is/would be? Did you do an experiment to find out by how much the performance could be improved if hooks were available? You could do such an experiment by modifying the package locally and running some benchmarks.
While double iteration can be cause performance issues, it is generally fairly fast. Are the messages you are dealing with deeply nested and have many fields? Are there other reasons that make the iteration expensive?
All that said, I want to be honest and I don't think we have the capacity to design this API extension in the near future. However, you are welcome to contribute such a feature. Either post your ideas here or provide a proof of concept change with the extensions you would like to see and we can evaluate this during the review.
@lfolger, Hello, I've came across this issue with same questions as the @ofen. My proposal is to extract the visitFunction and make it default visitFunction for MarshalOptions. So it will look like this:
func WriteJsonField(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
// Current logic from line 228
}
type MarshalOptions struct {
pragma.NoUnkeyedLiterals
// all-the-current-options fileds
VisitFn VisitField
}
// Sample usage
func writeAndClear(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
clear(fd, v)
return WriteJsonField(fd, v)
}
opt := protojson.MarshalOptions{VisitFn: writeAndClear}
If it looks good enough I can make a PR with the implementation.
However if it is too open realisation or too customisable, I can figure out something else
Would writeAndClear
even work without mutating the original? Or otherwise, wouldn’t writeAndClear
clear the value and then output it? So “clearAndWrite`?
The clearAndWrite
sounds better, didn't think about it :)
About the mutating the original object, I think it is up to library user to mutate the values so general usage would look like
clonedMessage := myMessage.clone()
opt := protojson.MarshalOptions{VisitFn: clearAndWrite}
logger.logStructured(opt.Marshall(clonedMessage)
@Olex1313 this would require us to exposse the encoder because it is used in the code 228+.
This is a quite large change and I don't think we can do this right now without sufficient evidence that it has a significant impact.
Again, if you can share some pprof profiles or benchmarks that show the difference, it would help us to evaluate such a change.
@lfolger, you are right, I'm writing clone-based solution right now, I will provide benchmarks when I finish to see if it is really that necessary, but for repeated fields or maps I think it would be great to have a cheap masking option
@Olex1313 have you started the project? мне тоже нужно)
@ofen, sorry for late answer, been quiet busy. Sample gist with proto clearup: https://gist.github.com/Olex1313/88ff52b1f4cad1af15c6733b4553b4cb As for benchmarks I'll post it today evening, after switching from work to personal laptop, right now can say that clearing takes the same time as the marshalling, where for both the most time-consuming is the iterating over proto fields
Hi @Olex1313 do you mind to share the progress?
Recently i'm looking for a solution of this issue. I also try to do some research and update. If you guys don't mind i can try to update, but need your help to review it.
My solution is that:
- adding
HideSensitiveMessage bool
into MarshalOption struct - updating the RangeFields into:
order.RangeFields(fields, order.IndexNameFieldOrder, func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
name := fd.JSONName()
if e.opts.UseProtoNames {
name = fd.TextName()
}
if fd.IsSensitiveField() && e.opts.HideSensitiveMessage {
return true
}
if err = e.WriteName(name); err != nil {
return false
}
if err = e.marshalValue(v, fd); err != nil {
return false
}
return true
})
- this step in on_progress, adding new tag
is_sensitive_field
for the proto file
please let me know if this update is not suitable or can have issue. cc @puellanivis
@Mahes2, I can not share the benchmarks (did it for work), but as the @puellanivis said it's very complex design issue in protobuff serialization, and this doesn't go well with the purpose of Proto to JSON, because it works only in one way, but obviously you can't convert json->proto afterwards.
I've shared a gist https://gist.github.com/Olex1313/88ff52b1f4cad1af15c6733b4553b4cb, it contains sample approach to clearing the message for logs/etc.
For implementation in https://github.com/golang/protobuf I think it should be a third-party library, but not the part of it
I see. but is it ok if i give contribution to add new tag and using the approach i have given above?
For implementation in https://github.com/golang/protobuf I think it should be a third-party library, but not the part of it
why it should be a 3rd party library?
@Mahes2
I see. but is it ok if i give contribution to add new tag and using the approach i have given above?
It's up to maintainers, if they are ok with such approach I would like to help w code/review
why it should be a 3rd party library?
As @lfolger said upwards, designing custom deserialization api is out of context of encoder api, because it provides a simple way of ser/des of proto objects. Hiding sensible data is not the single feature that should be provided, and api should be extendable for more than that.
My personal opinion that if you really have to hide data in serialized protos you can even use a encoding/json package with json:"-"
annotation, or clear proto message and there are more ways how it can be implemented. The main question is should this really be part of golang/protobuf package, because it is kinda out of context and more specific usage.
I think implementing redaction logic inside protojson serialiser is bad idea, but it still good idea to have some internal parts accessible/public to let developers build their custom logic upon it.
I see. but is it ok if i give contribution to add new tag and using the approach i have given above?
Unfortunately, this is—as I said in your first issue—not going to be accepted. This package needs to stick fairly strictly to the protobuf standard, and this feature is definitely something that we don’t want to walk into. (Even adding the json
tag itself is a well-known mis-step of the package, because it was just so easy to add them, and get JSON, right? And once protobuf actually established a proper JSON mapping, it was entirely incompatible with the new JSON mapping.)
Hi @puellanivis can you share me any doc or blog that has explaination about the protobuf standard? because i still don't know which standard that you are talking about.
Also, is the proper JSON mapping already in progress?
Finally, i thank you guys for giving the tips on how to handle the sensitive message. I did a little update with @Olex1313's suggestion and add it to my libs.
I have made a PR and it's also included with the benchmark Really appreciate it if you guys @puellanivis @Olex1313 @ofen @lfolger can help me review it as well. thank you
https://github.com/Mahes2/go-libs/pull/9/files
https://protobuf.dev/programming-guides/proto3/ is the proto3 standard.
The proper JSON mapping is already complete, and done. It was done for v1 API with jsonpb
and for the v2 API in protojson
.