protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

internal/encoding/json/decode_number accepts string values that contain a comma

Open MarTango opened this issue 10 months ago • 5 comments

What version of protobuf and what language are you using? Version: master, Go

What did you do?

With the parseNumber function:

func main() {
	n, ok := parseNumber([]byte("1234,anything can come after the comma"))
	if ok {
		panic(fmt.Sprintf("should not be ok, %d", n))
	}
}

panics with should not be ok, 4, indicating that it parsed "1234,anything can come after the comma" into the integer 1234, even though this should really throw an error.

What did you expect to see?

ok == false

What did you see instead?

ok == true

Anything else we should know about your project / environment? No

MarTango avatar Feb 10 '25 19:02 MarTango

The parseNumber function is internal to the package and only parses a JSON number out of the front of the input. Is this somehow resulting in a bug in the observable API for protojson?

dsnet avatar Feb 10 '25 21:02 dsnet

Yes, this propagates up to protojson level. For example if I have a protobuf

message HasInt {
  int64 value = 1;
}

I can do

v := &HasInt{}
protojson.Unmarshal([]byte(`{"value": "1,lol"}`, v)

which will succeed, and unmarshal such that v.Value == 1

MarTango avatar Feb 10 '25 22:02 MarTango

Thanks, I recommend next time filing the buggy observed behavior rather than pointing at a particular internal function. The behavior of parseNumber is correct. I suspect the problem is actually parseNumberParts, which should probably reject trailing data since it's only called from the context of a standalone token.

dsnet avatar Feb 10 '25 22:02 dsnet

The same behavior is observed if the string contains spaces:

v := &HasInt{}
protojson.Unmarshal([]byte(`{"value": "1 2"}`, v)

I think the problem might be in the unmarshalInt(), unmarshalUint() and unmarshalFloat() methods in protobuf/encoding/protojson/decode.go. They all create a new decoder to decode the string value in case tok.Kind() == json.String, but dec.Read() doesn't necessarily read the entire string.

One possible fix is to add a second invocation to Read() to detect EOF:

tok1, err := dec.Read()
if err != nil || tok1.Kind() != json.EOF {
    return protoreflect.Value{}, false
}

kitami1976 avatar Jun 18 '25 19:06 kitami1976

🤔 I’m wondering if there would be some sort of a dec.More() function?

puellanivis avatar Jun 20 '25 15:06 puellanivis