protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

proto: annotate errors regarding invalid UTF-8 with the relevant field

Open nicolasparada opened this issue 5 years ago • 13 comments

Is your feature request related to a problem? Please describe. string field contains invalid UTF-8 errors while marshaling using the new Go proto library google.golang.org/protobuf, but it's too difficult to debug since one doesn't know which field has the problem.

Describe the solution you'd like Add the field name to the error message. Something like:

string field "foo" contains invalid UTF-8

Describe alternatives you've considered Looks like the issue is related to directly converting from a slice of bytes to a string.

string([]byte)

If that's the case, one could potentially triage that, but the field name would much more helpfull.

Additional context This is using the new golang google.golang.org/protobuf module at v1.25.0 at least.

nicolasparada avatar Oct 22 '20 18:10 nicolasparada

Annotating errors with this information seems reasonable. If we do this or marshal, we presumably want to do it for unmarshal as well.

At least for unmarshal, we removed context from most errors since it was often misleading as to what the exact problem is. For example, unmarshaling random garbage would often produce an error saying that some specific field was the issue, when really the entire input was bad. This occurs because the protobuf wire format has no magic number. For invalid UTF-8, it seems unlikely to encounter this error due to random data since the data would first have to pass the length-prefix check.

Another concern I don't know the answer to is what the performance cost of this is. There's cost in allocating a new error value that contains the field information, and there may be cost passing the field information up/down the call stack.

\cc @neild.

dsnet avatar Oct 22 '20 22:10 dsnet

I think it makes sense to include the field name for marshal/unmarshal errors caused by invalid UTF-8. It's been somewhere on my to-do list; just hasn't risen to the top of it yet.

I'm not worried much about the cost of allocating a new error in this case, but we do want to be sure that there isn't any additional cost paid by other errors or the non-error path.

neild avatar Oct 22 '20 22:10 neild

Yeah, seems like the costs of allocating the new error would be limited by the number of messages that have utf8 errors in them, which unless you’re getting hit with a bunch of bad actors, seems unlikely to be as big of a concern. (But it has been known to accidentally turn quadradic before… 😬 )

puellanivis avatar Oct 23 '20 00:10 puellanivis

There are almost certainly going to be better attack vectors on making unmarshal operations expensive than triggering an error allocation, so I'm not too worried about that case. (As long as we don't make it ridiculously expensive by accident, of course.)

neild avatar Oct 23 '20 01:10 neild

+1

buzzlightila avatar Jan 21 '21 22:01 buzzlightila

Any chances we might see some action on this :)?

awartoft avatar May 20 '21 01:05 awartoft

+1

RedCollarPanda avatar Jul 21 '21 07:07 RedCollarPanda

+1

andrii482 avatar Sep 22 '21 09:09 andrii482

@neild @dsnet since we have in Go error.As and error.Is and error.Wrap/Unwrap it seems now to be possible to do this. I observed these errors similar to other referenced issue in opentracing (and OTel).

szuecs avatar Jan 03 '23 19:01 szuecs