Don't escape valid Unicode characters in strings

Open sonacy opened this issue 5 years ago • 7 comments

TypeScript Version: 3.7.4

Code

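import { createSourceFile, createPrinter, transform, EmitHint, ScriptTarget } from 'typescript'

const sf = createSourceFile(
  'aaa',
  'const a: string = "哈哈"',
  ScriptTarget.Latest
)
// Try to do something in the transform (no transformers yet).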
const result = transform(sf, [])
const printer = createPrinter()
const printed = printer.printNode(
  EmitHint.SourceFile,
  result.transformed[0],
  sf
)
console.log(printed)

Expected behavior: const a: string = "哈哈"

Actual behavior: const a: string = "\u54C8\u54C8";

I am trying to use the compiler API to do some transforms, but the Printer does not seem able to emit the Unicode characters unescaped. How can I do this correctly?

sonacy avatar Jan 14 '20 03:01 sonacy

I am seeing the same thing with the Program API:

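import * as path from 'path'
import { createProgram, createPrinter, transform, EmitHint, JsxEmit, ModuleKind, ScriptTarget } from 'typescript'

const realPath = path.resolve(__dirname, './utf8.ts')
const program = createProgram([realPath], {
  target: ScriptTarget.ES2017,
  module: ModuleKind.ES2015,
  allowJs: true,
  jsx: JsxEmit.Preserve,
})
// Uncommenting the next line gives the expected (unescaped) output.
// program.getTypeChecker()
// Assumption: sf is the source file taken from the program (not shown in the original snippet).
const sf = program.getSourceFile(realPath)!
const result = transform(sf, [])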
const printer = createPrinter()
const printed = printer.printNode(
  EmitHint.SourceFile,
  result.transformed[0],
  sf
)
console.log(printed)

Same here with the Program API. The file content is simply 'const a: string = "哈哈"', but the result is: const a: string = "\u54C8\u54C8"; However, when I call program.getTypeChecker() first, I get the expected output: const a: string = "哈哈". Why does this happen?

sonacy avatar Jan 14 '20 03:01 sonacy

It's not that you're doing anything wrong - our implementation just escapes any characters outside of the printable range of ASCII characters. Nowadays we might be equipped to do a little better given that we have the set of valid Unicode identifier characters.

Is there a reason this emit is a problem for you?

DanielRosenwasser avatar Jan 14 '20 07:01 DanielRosenwasser
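
For illustration, the escaping described in the comment above can be approximated with the sketch below. It is a simplified stand-in, not TypeScript's actual emitter code: `escapeNonAsciiSketch` is a hypothetical name, and the sketch ignores quotes, backslashes, and the short escapes (such as `\n`) that a real printer also handles. It only shows why "哈哈" comes out as "\u54C8\u54C8".

```typescript
// Hypothetical helper, not part of the TypeScript API.
// Replaces every UTF-16 code unit outside printable ASCII (0x20..0x7E)
// with a \uXXXX escape, using uppercase hex padded to four digits.
function escapeNonAsciiSketch(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0x20 && code <= 0x7e) {
      // Printable ASCII is copied through unchanged.
      out += text.charAt(i);
    } else {
      const hex = code.toString(16).toUpperCase();
      out += "\\u" + ("0000" + hex).slice(-4);
    }
  }
  return out;
}

console.log(escapeNonAsciiSketch('const a: string = "哈哈"'));
// prints: const a: string = "\u54C8\u54C8"
```

Because the loop works on UTF-16 code units, a character outside the Basic Multilingual Plane would come out as two escapes (a surrogate pair) in this sketch.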


We use the transform API to process our source code. For example:

const a: string = '哈哈' => const a: string = i18n('哈哈'), so that we can search our codebase and replace every Chinese string with an i18n call. But if TypeScript escapes every character outside the printable ASCII range, our codebase will look weird.

Is there any solution that lets me keep my Chinese strings? Thanks.

GilbertSun avatar Jan 14 '20 09:01 GilbertSun

I don't think we should escape these unless there's some hard necessity.

RyanCavanaugh avatar Jan 14 '20 20:01 RyanCavanaugh

No, it was strictly ease of implementation at the time. I'm marking this as Difficult because any contribution needs very thorough test code.

DanielRosenwasser avatar Jan 14 '20 22:01 DanielRosenwasser

Hitting the same issue. Our workaround:

    let content = printer.printFile(file);
    content = unescape(content.replace(/\\u/g, "%u"));

git9am avatar Mar 11 '20 10:03 git9am
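
The workaround above relies on the global `unescape`, which is deprecated, and it rewrites every `\u` in the text, including one that follows an escaped backslash. A possible alternative is sketched below. `unescapeUnicodeSketch` is a hypothetical helper, not part of the TypeScript API; it assumes the printer only emits four-digit `\uXXXX` escapes (it does not handle the `\u{...}` form), and it still operates on the whole printed text rather than on string literals only.

```typescript
// Hypothetical helper, not part of the TypeScript API.
// Turns \uXXXX escapes in printed output back into the characters they encode.
function unescapeUnicodeSketch(printed: string): string {
  // Group 1 captures the whole run of backslashes directly before "u".
  // The escape is genuine only when that run has odd length; an even run
  // is a series of escaped backslashes followed by a plain "u".
  return printed.replace(
    /(\\+)u([0-9a-fA-F]{4})/g,
    (match: string, slashes: string, hex: string): string => {
      if (slashes.length % 2 === 0) {
        return match; // leave text such as \\u54C8 untouched
      }
      // Drop the one backslash that belongs to the escape, keep the rest.
      return slashes.slice(1) + String.fromCharCode(parseInt(hex, 16));
    }
  );
}

// The string below holds the characters: const a: string = "\u54C8\u54C8";
console.log(unescapeUnicodeSketch('const a: string = "\\u54C8\\u54C8";'));
// prints: const a: string = "哈哈";
```

Surrogate pairs are decoded one code unit at a time, so two adjacent escapes such as `\uD83D\uDE00` end up as a single character in the result.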

backlog since 2020

Grawl avatar Feb 24 '24 15:02 Grawl

Backlog = PRs accepted, be the change you want to see in the world 😇

RyanCavanaugh avatar Feb 25 '24 06:02 RyanCavanaugh

I'm now using recast to work around this issue:

import ts from "typescript";
import { parse, print, types } from "recast";

const output = ts.transpileModule("`你好`", {});
console.log("typescript output:\n", output.outputText);

let ast = parse(output.outputText);

types.visit(ast, {
  visitLiteral(path) {
    const node = path.node;

    if (typeof node.value === "string") {
      path.replace(types.builders.stringLiteral(node.value));
    }

    this.traverse(path);
  },
});

console.log("recast output:\n", print(ast).code);

outputs

typescript output:
 "\u4F60\u597D";

recast output:
 "你好";

KevinWang15 avatar Jun 18 '24 04:06 KevinWang15