
MarshalJSON functions suffer from stdlib choice to escape specific HTML characters

Open · sammy-hughes opened this issue 6 months ago · 1 comment

Describe the bug

Types that use pgtype types must reimplement their JSON serialization logic to avoid the stdlib's choice to replace specific characters (<, >, &, U+2028 and U+2029) with \u escape sequences so that the output is safe to embed in HTML. As the json.Marshal documentation puts it, "the string is encoded using HTMLEscape."

To avoid this escaping, the caller and every intermediate caller must use a json.Encoder and call its SetEscapeHTML(false) method. That setting does not reach pgtype types, however: their MarshalJSON methods call json.Marshal internally, so the bytes they return are already escaped. The only remaining options are reimplementing the serialization or transforming the returned values.
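The mechanism can be reproduced with the standard library alone. In this sketch, Text is a hypothetical stand-in for pgtype.Text whose MarshalJSON, as described above, delegates to json.Marshal. The encoder has escaping disabled, yet only the plain string comes out unescaped:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// Text is a hypothetical stand-in for pgtype.Text. Like the pgx type, its
// MarshalJSON delegates to json.Marshal, which always applies HTML escaping.
type Text struct {
	String string
	Valid  bool
}

func (t Text) MarshalJSON() ([]byte, error) {
	if !t.Valid {
		return []byte("null"), nil
	}
	return json.Marshal(t.String)
}

// encodeNoEscape encodes v with an Encoder whose HTML escaping is disabled.
func encodeNoEscape(v any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	// Encode appends a newline after each value; trim it for display.
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	plain, _ := encodeNoEscape("<")
	wrapped, _ := encodeNoEscape(Text{String: "<", Valid: true})
	fmt.Println(plain)   // "<"
	fmt.Println(wrapped) // "\u003c"
}
```

The Encoder's setting only controls escapes it would add itself; it does not rewrite \u003c sequences that a MarshalJSON method has already produced.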

The latter option, presumptive transformation, creates the possibility of incorrectly mutating the actual values in the opposite manner. That leaves only reimplementing the JSON serialization for every pgtype type whose values can contain any of the 5 escaped characters.
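As a rough sketch of the reimplementation route, assuming the same hypothetical Text stand-in for pgtype.Text, a wrapper type can marshal the string with an escaping-free Encoder and let the outer encoder decide whether to escape. NoEscapeText and the encode helper are illustrative names, not part of pgx:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// Text is a hypothetical stand-in for pgtype.Text (same fields).
type Text struct {
	String string
	Valid  bool
}

// NoEscapeText is a hypothetical wrapper with the same underlying struct.
// Defining it from Text drops Text's methods, so the MarshalJSON below is used.
type NoEscapeText Text

func (t NoEscapeText) MarshalJSON() ([]byte, error) {
	if !t.Valid {
		return []byte("null"), nil
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // leave <, > and & for the outer encoder to handle
	if err := enc.Encode(t.String); err != nil {
		return nil, err
	}
	// Encoder.Encode appends a trailing newline; drop it.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

type example struct {
	A string
	B NoEscapeText
}

// encode serializes v with the given HTML-escaping setting.
func encode(v any, escapeHTML bool) string {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(v); err != nil {
		panic(err)
	}
	return strings.TrimSpace(buf.String())
}

func main() {
	v := example{A: "<", B: NoEscapeText{String: "<", Valid: true}}
	fmt.Println(encode(v, true))  // {"A":"\u003c","B":"\u003c"}
	fmt.Println(encode(v, false)) // {"A":"<","B":"<"}
}
```

With the real library, the analogous declaration would presumably be type NoEscapeText pgtype.Text with a conversion at the JSON boundary; note that such a type also loses pgtype.Text's database scan and encode methods, so it is only suitable for the serialization layer.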

To Reproduce

Steps to reproduce the behavior:

The script below demonstrates:

  1. The default behavior.
  2. The effect of the "opt-out" behavior.
  3. The kind of post-processing effectively required to get unescaped output from pgtype strings.
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"

	"github.com/jackc/pgx/v5/pgtype"
)

// example holds the same value in a plain string and in a pgtype.Text.
type example struct {
	A string
	B pgtype.Text
}

func main() {
	// Set up the buffer and encoder shared by all three cases.
	buffer := bytes.Buffer{}
	encoder := json.NewEncoder(&buffer)

	// Both fields hold "<", one of the characters HTMLEscape rewrites.
	v := example{
		A: "<",
		B: pgtype.Text{String: "<", Valid: true},
	}

	// 1. Default options: both fields are escaped to \u003c.
	err := encoder.Encode(&v)
	s := strings.TrimSpace(buffer.String())
	fmt.Printf("Default:\t%s %v\n", s, err)
	buffer.Reset()

	// 2. HTML escaping disabled: A is emitted as "<", but B stays escaped
	// because pgtype.Text.MarshalJSON calls json.Marshal internally.
	encoder.SetEscapeHTML(false)
	err = encoder.Encode(&v)
	s = strings.TrimSpace(buffer.String())
	fmt.Printf("Escaping \"off\":\t%s %v\n", s, err)
	buffer.Reset()

	// 3. Undo the escaping by rewriting the encoded output (fragile).
	err = encoder.Encode(&v)
	s = strings.TrimSpace(buffer.String())
	s = strings.ReplaceAll(s, "\\u003c", "<")
	fmt.Printf("Mutated value:\t%s %v\n", s, err)
}
```

Runnable example on the Go playground: https://go.dev/play/p/XJ0Hmn2Q2cO?v=goprev

Expected behavior

I expect to be able to use the pgtype types, specifically pgtype.Text, without concern that the data being handled will be transformed.

Actual behavior

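pgtype.Text values are always HTML-escaped (for example, < is emitted as \u003c), even when the encoder has SetEscapeHTML(false), while plain string fields in the same struct are not. To quote the justification from the stdlib encoding/json package:

> So that the JSON will be safe to embed inside HTML <script> tags, the string is encoded using HTMLEscape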

Version

  • Go: go1.24.2 darwin/arm64
  • PostgreSQL: N/A (only client behaviors are required to reproduce)
  • pgx: v5.7.2

Additional context

It's too late to unf**k the stdlib encoding/json lib, so I recognize I'm asking for a third-party lib to compensate for that.

sammy-hughes, May 12 '25 22:05

> It's too late to unf**k the stdlib encoding/json lib, so I recognize I'm asking for a third-party lib to compensate for that.

Yeah, I don't know any good way to make this configurable or changeable. The json codecs allow custom marshal and unmarshal functions, but I don't know any good way to do that on the type level.

> The latter option, presumptive transformation, creates the possibility of incorrectly mutating the actual values in the opposite manner.

There is another option. You could unmarshal the JSON string then immediately marshal the result with your encoder. That should get the desired output without the risk of incorrect mutation.
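A sketch of that decode-then-re-encode approach using only encoding/json. The reencodeNoEscape name is hypothetical. The decoder turns \u003c back into "<" itself, so a literal \u003c in the data survives intact, unlike with string replacement. One assumption worth noting: decoding into any loses struct field order, because encoding/json writes map keys in sorted order. UseNumber keeps large numbers exact instead of converting them to float64.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// reencodeNoEscape decodes already-marshaled JSON and re-encodes it with
// HTML escaping disabled.
func reencodeNoEscape(escaped []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(escaped))
	dec.UseNumber() // keep numbers as json.Number rather than float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encoder.Encode appends a trailing newline; drop it.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	// Escaped output, as json.Marshal would produce for a value holding "<".
	escaped := []byte(`{"B":"\u003c","A":"\u003c","Literal":"\\u003c","N":12345678901234567890}`)
	out, err := reencodeNoEscape(escaped)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"A":"<","B":"<","Literal":"\\u003c","N":12345678901234567890}
}
```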

jackc, May 16 '25 23:05