Error when saving if 32-char prefix/suffix of quote selector ends mid-way through a unicode surrogate pair
Steps to reproduce:
- On https://github.com/typescript-eslint/typescript-eslint/discussions/6014, activate the extension and try to annotate exactly the text "no-inferrable-types" in the left column of a table
- Try to save the annotation
Expected result: Annotation is saved
Actual result: 500 server error from h
There are two issues here:
- From inspecting the client's store (set `window.debug = true`) we can see that the `TextQuoteSelector` of the new annotation contains a string that ends mid-way through a UTF-16 surrogate pair (see the `suffix` field):
- In h, the `POST /api/annotations` API fails when attempting to store the JSON blob in Postgres: https://sentry.io/organizations/hypothesis/issues/2551504044/?project=37293&query=is%3Aunresolved&referrer=issue-stream
InvalidTextRepresentation: invalid input syntax for type json
LINE 1: ...cript-eslint/typescript-eslint/discussions/6014', '[{"type":...
^
DETAIL: Unicode low surrogate must follow a high surrogate.
CONTEXT: JSON data, line 1: ...ud83e\uddf1\n\ud83d\udfe9\n\n\n\n\n\n", "suffix":...
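The failure mode can be reproduced in isolation. The snippet below is a minimal sketch, not the client's actual selector code: `pageText` is a made-up string, and the assumption is that the suffix is cut with a plain `slice` at 32 UTF-16 code units.

```typescript
// Hypothetical page text: 31 ASCII characters, then U+1F7E9 (a character
// outside the BMP, stored as the surrogate pair \ud83d\udfe9), then more text.
const pageText = new Array(32).join("a") + "\ud83d\udfe9" + " more text";

// Naive truncation counts UTF-16 code units, so the cut lands between the
// high and low halves of the pair.
const naiveSuffix = pageText.slice(0, 32);

const lastUnit = naiveSuffix.charCodeAt(naiveSuffix.length - 1);
const endsOnLoneHighSurrogate = lastUnit >= 0xd800 && lastUnit <= 0xdbff;

// Serializing the selector keeps the isolated high surrogate in the payload
// (current engines emit it as a \uXXXX escape), which is the kind of input
// the Postgres error above complains about.
const payload = JSON.stringify({ type: "TextQuoteSelector", suffix: naiveSuffix });
```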
There are two fixes needed here:
- The client shouldn't be submitting selectors with invalid Unicode to the server
- h shouldn't crash with an internal server error. It should either fail with a 4xx error, or silently fix up the invalid Unicode (e.g. by ignoring the isolated Unicode high surrogate)
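Both fixes depend on being able to spot an unpaired surrogate. The following is a rough sketch of that check, written in TypeScript for consistency with the client; `findLoneSurrogate` and `stripLoneSurrogates` are hypothetical helper names, not existing functions in the client or in h. Detection corresponds to the "reject with a 4xx" option and stripping to the "silently fix up" option.

```typescript
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

/** Return the index of the first unpaired surrogate in `str`, or -1 if there is none. */
function findLoneSurrogate(str: string): number {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (isHighSurrogate(unit)) {
      if (i + 1 < str.length && isLowSurrogate(str.charCodeAt(i + 1))) {
        i++; // skip the low half of a valid pair
      } else {
        return i; // high surrogate with no low surrogate after it
      }
    } else if (isLowSurrogate(unit)) {
      return i; // low surrogate with no high surrogate before it
    }
  }
  return -1;
}

/** Drop every unpaired surrogate while keeping valid pairs intact. */
function stripLoneSurrogates(str: string): string {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (isHighSurrogate(unit) && i + 1 < str.length && isLowSurrogate(str.charCodeAt(i + 1))) {
      out += str.slice(i, i + 2);
      i++;
    } else if (!isHighSurrogate(unit) && !isLowSurrogate(unit)) {
      out += str.charAt(i);
    }
    // Otherwise the unit is an unpaired surrogate and is skipped.
  }
  return out;
}
```

A validator would reject a selector when `findLoneSurrogate(selector.suffix) !== -1`, while a lenient handler would store `stripLoneSurrogates(selector.suffix)` instead.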
A workaround for users is to change the text selection slightly so that the prefix and suffix end at a slightly different point.
We have some code in https://github.com/hypothesis/client/blob/631372eefbda552476ebb438205231e1f25ad97e/src/sidebar/util/unicode.ts#L27 that shows how to do Unicode-aware truncation of strings.
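For the selector's prefix and suffix, surrogate-aware truncation could look roughly like the sketch below. This is not the code at the link above; `truncatePrefix`, `truncateSuffix` and `CONTEXT_LENGTH` are hypothetical names, and the limit of 32 is taken from the issue title. When the cut would fall inside a pair, the sketch drops the orphaned half, so the result can be one code unit shorter than the limit.

```typescript
// Assumed maximum length of the quote context, in UTF-16 code units.
const CONTEXT_LENGTH = 32;

function isHigh(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLow(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

/** Take up to `maxLen` code units from the start of the text after the quote. */
function truncateSuffix(followingText: string, maxLen: number = CONTEXT_LENGTH): string {
  const cut = followingText.slice(0, maxLen);
  const lastKept = cut.charCodeAt(cut.length - 1);
  const firstDropped = followingText.charCodeAt(maxLen);
  // If the cut separates a high surrogate from its low surrogate, drop the high half too.
  const splitsPair = isHigh(lastKept) && isLow(firstDropped);
  return splitsPair ? cut.slice(0, -1) : cut;
}

/** Take up to `maxLen` code units from the end of the text before the quote. */
function truncatePrefix(precedingText: string, maxLen: number = CONTEXT_LENGTH): string {
  const start = Math.max(0, precedingText.length - maxLen);
  const firstKept = precedingText.charCodeAt(start);
  const lastDropped = precedingText.charCodeAt(start - 1);
  // If the cut separates a low surrogate from its high surrogate, drop the low half too.
  const splitsPair = isLow(firstKept) && isHigh(lastDropped);
  return precedingText.slice(splitsPair ? start + 1 : start);
}
```

Out-of-range `charCodeAt` calls return `NaN`, which fails both range checks, so short and empty inputs pass through unchanged.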
Payloads of this type are now rejected by the backend.
We seem to see a few hundred instances of this problem every month (900 in the last 30 days), so it is worth fixing.