pgx
MarshalJSON functions suffer from stdlib choice to escape specific HTML characters
Describe the bug
Types that use pgtype types must reimplement JSON serialization logic to avoid the stdlib's choice to replace specific characters with HTML-safe escape sequences. As the encoding/json documentation puts it: "the string is encoded using HTMLEscape."
To avoid this escaping, the caller and any intermediate callers must use a json.Encoder and call Encoder.SetEscapeHTML(false). Because marshaling a pgtype type results in nested calls to json.Marshal, which always escapes, the setting on the outer encoder does not reach those values. The escaping can then only be avoided by reimplementing the serialization and/or transforming the returned values.
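The latter option, presumptive transformation, creates the possibility of incorrectly mutating the actual values in the opposite direction. For example, a string whose real content is the six characters \u003c is serialized as \\u003c, and a blanket replacement of \u003c with < turns that into \<, which is neither the original data nor valid JSON. This leaves only the option of reimplementing the JSON codec for any pgtype type that can encode one of the 5 escaped characters.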
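As a rough illustration of what "reimplementing the JSON codec" can look like on the caller's side, here is a minimal sketch. The `Text` type is a hypothetical stand-in that mirrors only the `String` and `Valid` fields of pgtype.Text so the example runs with the standard library alone; `example` and `encode` are likewise names invented for this sketch. It is not how pgx implements anything.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Text is a hypothetical stand-in mirroring the String and Valid fields
// of pgtype.Text, so the sketch runs without a pgx dependency.
type Text struct {
	String string
	Valid  bool
}

// MarshalJSON emits the string without HTML escaping. The enclosing
// encoder still applies its own escaping setting to the returned bytes.
func (t Text) MarshalJSON() ([]byte, error) {
	if !t.Valid {
		return []byte("null"), nil
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(t.String); err != nil {
		return nil, err
	}
	// Encode appends a trailing newline; drop it.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

type example struct {
	A string
	B Text
}

// encode serializes v with the given HTML-escaping setting.
func encode(v any, escapeHTML bool) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return string(bytes.TrimSuffix(buf.Bytes(), []byte("\n"))), nil
}

func main() {
	v := example{A: "<", B: Text{String: "<", Valid: true}}

	off, err := encode(v, false)
	fmt.Println(off, err) // {"A":"<","B":"<"} <nil>

	on, err := encode(v, true)
	fmt.Println(on, err) // {"A":"\u003c","B":"\u003c"} <nil>
}
```

Because the outer encoder re-processes whatever a Marshaler returns, this version follows the caller's SetEscapeHTML choice in both directions. The cost is that every affected type needs its own wrapper or replacement.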
To Reproduce
The following script demonstrates:
- The default behavior.
- The effect of the "opt-out" behavior.
- The kind of step effectively required to avoid mutating data, using pgtype strings.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"

	"github.com/jackc/pgx/v5/pgtype"
)

type example struct {
	A string
	B pgtype.Text
}

func main() {
	// set up the buffer and encoder used by every step below
	buffer := bytes.Buffer{}
	encoder := json.NewEncoder(&buffer)

	// initialize the example value
	v := example{
		A: "<",
		B: pgtype.Text{String: "<", Valid: true},
	}

	// serialize and print with the default options
	err := encoder.Encode(&v)
	s := buffer.String()
	fmt.Printf("Default:\t%v %v\n", s, err)

	// reset to clear
	buffer.Reset()

	// serialize and print with HTML escaping disabled
	encoder.SetEscapeHTML(false)
	err = encoder.Encode(&v)
	s = buffer.String()
	fmt.Printf("Escaping \"off\":\t%v %v\n", s, err)

	// reset to clear
	buffer.Reset()

	// serialize (escaping still disabled) and mutate the resulting value
	err = encoder.Encode(&v)
	s = buffer.String()
	s = strings.ReplaceAll(s, "\\u003c", "<")
	fmt.Printf("Mutated value:\t%v %v\n", s, err)
}
Go playground link: https://go.dev/play/p/XJ0Hmn2Q2cO?v=goprev
Expected behavior
I expect to be able to use the pgtype types, specifically pgtype.Text, without concern that the data being handled will be transformed.
Actual behavior
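In the script above, the pgtype.Text field is still emitted as "\u003c" after SetEscapeHTML(false), while the plain string field is emitted as "<". To quote the justification from the stdlib encoding/json package:
"So that the JSON will be safe to embed inside HTML"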
Version
- Go: go1.24.2 darwin/arm64
- PostgreSQL: N/A (only client behaviors are required to reproduce)
- pgx: v5.7.2
Additional context
It's too late to unf**k the stdlib encoding/json lib, so I recognize I'm asking for a third-party lib to compensate for that.
> It's too late to unf**k the stdlib encoding/json lib, so I recognize I'm asking for a third-party lib to compensate for that.
Yeah, I don't know any good way to make this configurable or changeable. The json codecs allow custom marshal and unmarshal functions, but I don't know any good way to do that on the type level.
> The latter option, presumptive transformation, creates the possibility of incorrectly mutating the actual values in the opposite manner.
There is another option. You could unmarshal the JSON string then immediately marshal the result with your encoder. That should get the desired output without the risk of incorrect mutation.