tsgo panics with invalid UTF-8 when marshalling build info
Stack trace
panic: Failed to marshal build info: json: cannot marshal from Go incremental.BuildInfoDiagnosticsOfFile within "/semanticDiagnosticsPerFile/0": invalid UTF-8 within "/1/1/message" after offset 319
goroutine 1 [running]:
github.com/microsoft/typescript-go/internal/incremental.(*Program).emitBuildInfo(0xc002a0c180, {0xdb5728, 0x14ad4e0}, {0x0?, 0xd8?, 0x0?})
github.com/microsoft/typescript-go/internal/incremental/program.go:264 +0x588
github.com/microsoft/typescript-go/internal/incremental.(*emitFilesHandler).emitAllAffectedFiles(0xc000b35590, {0x0?, 0x1e?, 0x0?})
github.com/microsoft/typescript-go/internal/incremental/emitfileshandler.go:132 +0x696
github.com/microsoft/typescript-go/internal/incremental.emitFiles({0xdb5728, 0x14ad4e0}, 0xc002a0c180, {0x0?, 0x0?, 0x0?}, 0x0)
github.com/microsoft/typescript-go/internal/incremental/emitfileshandler.go:273 +0x1a5
github.com/microsoft/typescript-go/internal/incremental.(*Program).Emit(0xc002a0c180, {0xdb5728, 0x14ad4e0}, {0x0?, 0x0?, 0x0?})
github.com/microsoft/typescript-go/internal/incremental/program.go:193 +0x307
github.com/microsoft/typescript-go/internal/execute.emitFilesAndReportErrors({0xdb8450, 0xc000144000}, {0xdbb000, 0xc002a0c180}, 0xc0001285e0)
github.com/microsoft/typescript-go/internal/execute/tsc.go:384 +0x1cf
github.com/microsoft/typescript-go/internal/execute.emitAndReportStatistics({0xdb8450, 0xc000144000}, {0xdbb000?, 0xc002a0c180?}, 0xc000002000, 0xc0001560f0, 0x0?, 0x6325c, 0x299e9ef, 0x7ffd6e, ...)
github.com/microsoft/typescript-go/internal/execute/tsc.go:308 +0x6d
github.com/microsoft/typescript-go/internal/execute.performIncrementalCompilation({0xdb8450, 0xc000144000}, 0xc0001560f0, 0xc0001285e0, 0xc00011e240, 0x6325c, 0x0)
github.com/microsoft/typescript-go/internal/execute/tsc.go:250 +0x40c
github.com/microsoft/typescript-go/internal/execute.tscCompilation({0xdb8450, 0xc000144000}, 0xc000156000, 0x0)
github.com/microsoft/typescript-go/internal/execute/tsc.go:194 +0xb53
github.com/microsoft/typescript-go/internal/execute.CommandLine({0xdb8450, 0xc000144000}, {0xc0000200a0, 0x0, 0x0}, 0x0)
github.com/microsoft/typescript-go/internal/execute/tsc.go:63 +0x17e
main.runMain()
github.com/microsoft/typescript-go/cmd/tsgo/main.go:23 +0x109
main.main()
github.com/microsoft/typescript-go/cmd/tsgo/main.go:10 +0x13
Steps to reproduce
- Create a .ts file with this content:
const createFileListFromFiles = (files: File[]): FileList => {
const fileList: FileList = {
length: files.length,
item: (index: number): File | null => files[index] || null,
[Symbol.iterator]: function* (): IterableIterator<File> {
for (const file of files) yield file;
},
...files,
} as unknown as FileList;
return fileList;
};
- Run:
tsgo
This is a side effect of us switching to json/v2, which will error when trying to send strings that are not UTF-8 safe.
In our case, our internal symbol names begin with an illegal UTF-8 sequence to discriminate them from other names, and it sounds like a diagnostic is attempting to encode that, which we then serialize.
Throwing this in the playground, and yeah:
'length' is specified more than once, so this usage will be overwritten.
'__@iterator@17' is specified more than once, so this usage will be overwritten.
The __@ in tsgo is handled differently. I think we need to sanitize the strings that go into diagnostics, but this is going to be a generic problem for the API as well.
I noticed that ts-go uses the following code to begin internal symbol names: https://github.com/microsoft/typescript-go/blob/72d2ee651371b71ed798ffbb721055b32d926d2a/internal/ast/symbol.go#L30
So we can use strings.ReplaceAll(s, InternalSymbolNamePrefix, "") to remove illegal characters.
Specifically, this applies to both buildInfo.SemanticDiagnosticsPerFile and buildInfo. EmitDiagnosticsPerFile .
https://github.com/microsoft/typescript-go/blob/a57f4e0479b9af0ff7d7a89ded898139977f3227/internal/incremental/buildInfo.go#L430-L431
@jakebailey WDYT? If that sounds okay, I will make a PR for this later.
That's not okay, because they will look like real names. We should likely instead convert them to something more like the old code with double underscores or similar.
But simply replacing them is a band aid for a larger problem, unfortunately.
That's not okay, because they will look like real names. We should likely instead convert them to something more like the old code with double underscores or similar.
But simply replacing them is a band aid for a larger problem, unfortunately.
Ok, do you have any thoughts on how to fix this? I'd be happy to assist.
I'm thinking about it. Maybe for now we just do the hotfix and make sure BuildInfoDiagnostic, buildInfoDiagnosticWithFileName, buildInfoDiagnosticWithFileName, replace ast.InternalSymbolNamePrefix with __.
That or, we put JSON marshalling in compat mode for this one difference in buildinfo, but that won't work for other stuff in the LSP.
we need to handle this part for mashalling as well as unmarshalling otherwise errors wont be consistent.
Yeah, I forgot that we reproduce the errors too ☹️
I have a branch where I've changed the internal prefix to something we can roundtrip, but I'm not 100% certain it's correct because we sometimes turn internal symbol names into string names for types.
Of course, internal symbols ever appearing in public anything are bugs. (In the API, that's okay, but not so much elsewhere.)
We also have some internal symbols that use a = prefix, which might imply we could get away with any non-identifier string, which could make my PR correct enough.