pgtype

Unexpected behavior of pgtype.JSONBArray and json.RawMessage

Open EmilLaursen opened this issue 2 years ago • 4 comments

I set a pgtype.JSONBArray value from a value of type []json.RawMessage and encountered some unexpected behavior. I am not sure whether this is a bug.

This first test fails:

func TestPgxArraySet1(t *testing.T) {
	data := []json.RawMessage{
		json.RawMessage(`{"lol":1}`),
		json.RawMessage(`{"lol":2}`),
	}
	var nas pgtype.JSONBArray
	err := nas.Set(data)
	if err != nil {
		t.Fatal(err)
	}
	assert.Len(t, nas.Elements, len(data))
	if len(nas.Elements) != len(data) {
		t.Errorf("data length: %v, nas length: %v", len(data), len(nas.Elements))
	}
}

What happens is that data is interpreted as an array of single integer values, one integer for each byte in the two JSON values, so nas.Elements ends up with length 18. Changing one of the JSON values slightly, however, makes the test pass:

func TestPgxArraySet2(t *testing.T) {
	data := []json.RawMessage{
		json.RawMessage(`{"lol":2}`),
		json.RawMessage(`{"lol":{"x":1}}`),
	}
	var nas pgtype.JSONBArray
	err := nas.Set(data)
	if err != nil {
		t.Fatal(err)
	}
	assert.Len(t, nas.Elements, len(data))
	if len(nas.Elements) != len(data) {
		t.Errorf("data length: %v, nas length: %v", len(data), len(nas.Elements))
	}
}

This is with github.com/jackc/pgtype v1.8.1 and Go 1.17. Converting the slice to [][]byte makes everything work just fine. I assume this is because [][]byte is handled by .Set explicitly, whereas []json.RawMessage falls through to the default reflection-based conversion in .Set.
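For reference, the workaround I am describing looks roughly like this (a minimal sketch; the test name, the byteSlices variable, and the conversion loop are mine, not part of pgtype):

func TestPgxArraySetWorkaround(t *testing.T) {
	data := []json.RawMessage{
		json.RawMessage(`{"lol":1}`),
		json.RawMessage(`{"lol":2}`),
	}
	// Convert each json.RawMessage to a plain []byte so that JSONBArray.Set
	// takes its explicit [][]byte path instead of the reflection-based fallback.
	byteSlices := make([][]byte, len(data))
	for i, m := range data {
		byteSlices[i] = []byte(m)
	}
	var nas pgtype.JSONBArray
	if err := nas.Set(byteSlices); err != nil {
		t.Fatal(err)
	}
	// nas.Elements now has one element per JSON document.
	assert.Len(t, nas.Elements, len(data))
}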

Would it be possible to add a case for []json.RawMessage in .Set?

EmilLaursen avatar Mar 09 '22 11:03 EmilLaursen

Would it be possible to add a case for []json.RawMessage in .Set?

If that is the solution, then it should be possible to add it to typed_array_gen.sh and regenerate the array struct.
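Something along these lines is what I have in mind for the generated JSONBArray.Set — just a rough sketch, not the actual generated code; the surrounding cases, nil handling, and error paths are elided, and it assumes JSONB.Set accepts a []byte directly:

func (dst *JSONBArray) Set(src interface{}) error {
	// ... existing nil handling ...
	switch value := src.(type) {
	// ... existing cases such as [][]byte ...
	case []json.RawMessage:
		// Treat each raw JSON document as one array element instead of
		// letting it fall through to the reflection-based conversion.
		elements := make([]JSONB, len(value))
		for i := range value {
			if err := elements[i].Set([]byte(value[i])); err != nil {
				return err
			}
		}
		*dst = JSONBArray{
			Elements:   elements,
			Dimensions: []ArrayDimension{{Length: int32(len(elements)), LowerBound: 1}},
			Status:     Present,
		}
	default:
		// ... reflection-based fallback ...
	}
	return nil
}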

jackc avatar Mar 12 '22 00:03 jackc

As mentioned in the PR, your suggestion will fix the test cases for json.RawMessage. However, the fundamental issue remains, in the sense that defining any other type based on []byte, such as

type somebytealias []byte

and substituting json.RawMessage with this type, will make the test cases fail once again. So it appears there might be a problem in the reflection-based case in .Set?

I am happy to help investigate this, but I do not feel I understand the ramifications of tinkering with the typed_array.go.erb template, as this is used for several types. I have tried to debug these test cases:

type somebytealias []byte

func TestJSONBArraySetAlias(t *testing.T) {
	successfulTests := []struct {
		source interface{}
		result pgtype.JSONBArray
	}{
		{source: []somebytealias{somebytealias(`{"foo":12}`), somebytealias(`{"bar":2}`)}, result: pgtype.JSONBArray{
			Elements:   []pgtype.JSONB{pgtype.JSONB{Bytes: []byte(`{"foo":12}`), Status: pgtype.Present}, pgtype.JSONB{Bytes: []byte(`{"bar":2}`), Status: pgtype.Present}},
			Dimensions: []pgtype.ArrayDimension{pgtype.ArrayDimension{Length: 2, LowerBound: 1}},
			Status:     pgtype.Present,
		}},
		{source: []somebytealias{somebytealias(`{"foo":1}`), somebytealias(`{"bar":{"x":2}}`)}, result: pgtype.JSONBArray{
			Elements:   []pgtype.JSONB{pgtype.JSONB{Bytes: []byte(`{"foo":1}`), Status: pgtype.Present}, pgtype.JSONB{Bytes: []byte(`{"bar":{"x":2}}`), Status: pgtype.Present}},
			Dimensions: []pgtype.ArrayDimension{pgtype.ArrayDimension{Length: 2, LowerBound: 1}},
			Status:     pgtype.Present,
		}},
		{source: []somebytealias{somebytealias(`{"foo":1}`), somebytealias(`{"bar":2}`)}, result: pgtype.JSONBArray{
			Elements:   []pgtype.JSONB{pgtype.JSONB{Bytes: []byte(`{"foo":1}`), Status: pgtype.Present}, pgtype.JSONB{Bytes: []byte(`{"bar":2}`), Status: pgtype.Present}},
			Dimensions: []pgtype.ArrayDimension{pgtype.ArrayDimension{Length: 2, LowerBound: 1}},
			Status:     pgtype.Present,
		}},
	}

	for i, tt := range successfulTests {
		var d pgtype.JSONBArray
		err := d.Set(tt.source)
		if err != nil {
			t.Errorf("%d: %v", i, err)
		}

		if !reflect.DeepEqual(d, tt.result) {
			var xs []string
			for _, e := range d.Elements {
				xs = append(xs, string(e.Bytes))
			}
			t.Errorf("%d: expected %+v to convert to %+v, but it was %+v, raw: %v", i, tt.source, tt.result, d, xs)
		}
	}
}

And at least I can share some observations. The first two cases behave differently from the third.

  1. In the third case, the two elements {"foo":1} and {"bar":2} both have length 9, and what ends up in dst.Elements is

[123 34 102 111 111 34 58 49 125 123 34 98 97 114 34 58 50 125]

which is the concatenation of the two inputs, split into one single-byte element per array entry (18 elements in total).

  2. Changing 1 to 12, which is the first case, gives us this output

["eyJmb28iOjEyfQ==" "eyJiYXIiOjJ9"]

which is the base64 encoding of the two inputs as strings. This is probably due to json.Marshal being applied in the pgtype.JSON type at some point when calling dst.Elements[index].Set in the setRecursive code, since base64 encoding is json.Marshal's default behavior for []byte values (see the small demonstration below). The same happens in the second case.
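For illustration, here is a minimal standalone program (mine, not from pgtype) showing that json.Marshal base64-encodes a defined type whose underlying type is []byte, while json.RawMessage is written verbatim because it implements json.Marshaler:

package main

import (
	"encoding/json"
	"fmt"
)

type somebytealias []byte

func main() {
	// A defined type over []byte has no MarshalJSON method, so
	// encoding/json falls back to base64-encoding the bytes.
	a, _ := json.Marshal(somebytealias(`{"foo":12}`))
	fmt.Println(string(a)) // "eyJmb28iOjEyfQ=="

	// json.RawMessage implements json.Marshaler, so it is emitted as-is.
	b, _ := json.Marshal(json.RawMessage(`{"foo":12}`))
	fmt.Println(string(b)) // {"foo":12}
}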

EmilLaursen avatar Mar 20 '22 14:03 EmilLaursen

The code that tries to automatically handle unknown types is hard to reason about. The JSON code path is even harder because it automatically calls json.Marshal with anything that isn't explicitly recognized.

That conflicts with the attempt to figure out whether an unknown type can be converted to something known. For example:

type mytype []byte

func (s mytype) MarshalJSON() ([]byte, error) {
	// ...
}

Should mytype be marshaled, or should it be treated as a []byte? And how can that be handled without breaking any of the other types that are convertible to JSON?

JSONBArray gets even more complicated because there is the extra layer of the array.

Personally, I'm inclined to be content with the partial solution in your PR rather than delve into that nest of complexity. But you're welcome to dig into it further if you want.

FWIW, tricky issues like this were one of the primary motivations behind deciding to start development on pgx v5. I'm hoping to simplify the structure of the type conversion system so that this kind of issue isn't so complicated.

jackc avatar Mar 22 '22 00:03 jackc

I think I would be hard-pressed to make much progress here :) Seeing as you are working on pgx v5, I think it is better to look to the future. I am curious to look into the new type conversion code.

EmilLaursen avatar Mar 24 '22 09:03 EmilLaursen