protobuf
                                
                                 protobuf copied to clipboard
                                
                                    protobuf copied to clipboard
                            
                            
                            
                        proto: annotate errors regarding invalid UTF-8 with the relevant field
Is your feature request related to a problem? Please describe.
string field contains invalid UTF-8 errors while marshaling using the new Go proto library google.golang.org/protobuf, but it's too difficult to debug since one doesn't know which field has the problem.
Describe the solution you'd like Add the field name to the error message. Something like:
string field "foo" contains invalid UTF-8
Describe alternatives you've considered Looks like the issue is related to directly converting from a slice of bytes to a string.
string([]byte)
If that's the case, one could potentially triage that, but the field name would much more helpfull.
Additional context
This is using the new golang google.golang.org/protobuf module at v1.25.0 at least.
Annotating errors with this information seems reasonable. If we do this or marshal, we presumably want to do it for unmarshal as well.
At least for unmarshal, we removed context from most errors since it was often misleading as to what the exact problem is. For example, unmarshaling random garbage would often produce an error saying that some specific field was the issue, when really the entire input was bad. This occurs because the protobuf wire format has no magic number. For invalid UTF-8, it seems unlikely to encounter this error due to random data since the data would first have to pass the length-prefix check.
Another concern I don't know the answer to is what the performance cost of this is. There's cost in allocating a new error value that contains the field information, and there may be cost passing the field information up/down the call stack.
\cc @neild.
I think it makes sense to include the field name for marshal/unmarshal errors caused by invalid UTF-8. It's been somewhere on my to-do list; just hasn't risen to the top of it yet.
I'm not worried much about the cost of allocating a new error in this case, but we do want to be sure that there isn't any additional cost paid by other errors or the non-error path.
Yeah, seems like the costs of allocating the new error would be limited by the number of messages that have utf8 errors in them, which unless you’re getting hit with a bunch of bad actors, seems unlikely to be as big of a concern. (But it has been known to accidentally turn quadradic before… 😬 )
There are almost certainly going to be better attack vectors on making unmarshal operations expensive than triggering an error allocation, so I'm not too worried about that case. (As long as we don't make it ridiculously expensive by accident, of course.)
+1
Any chances we might see some action on this :)?
+1
+1
@neild @dsnet since we have in Go error.As and error.Is and error.Wrap/Unwrap it seems now to be possible to do this. I observed these errors similar to other referenced issue in opentracing (and OTel).