swc icon indicating copy to clipboard operation
swc copied to clipboard

parseFileSync span bug

Open JiangWeixian opened this issue 4 years ago • 30 comments

Describe the bug

same file parse twice, different output

Input code

//test.jsx
const a = 1

import { parseFileSync } from '@swc/core';

const ast = parseFileSync('./test.jsx', { jsx: true, syntax: 'ecmascript', comments: true, dynamicImport: true, decorators: true, decoratorsBeforeExport: true, script: true }) const ast1 = parseFileSync('./test.jsx', { jsx: true, syntax: 'ecmascript', comments: true, dynamicImport: true, decorators: true, decoratorsBeforeExport: true, script: true })

console.log(JSON.stringify(ast)) console.log('====') console.log(JSON.stringify(ast1))

{"type":"Module","span":{"start":0,"end":11,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":0,"end":11,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":6,"end":11,"ctxt":0},"id":{"type":"Identifier","span":{"start":6,"end":7,"ctxt":0},"value":"a","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":10,"end":11,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}
====
{"type":"Module","span":{"start":12,"end":23,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":12,"end":23,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":18,"end":23,"ctxt":0},"id":{"type":"Identifier","span":{"start":18,"end":19,"ctxt":0},"value":"a","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":22,"end":23,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}

first ast span start at 0, second start at 12

Config

{
    // Please copy and paste your .swcrc file here
}

Expected behavior A clear and concise description of what you expected to happen.

Version The version of @swc/core:

"@swc/core": "^1.2.46",

Additional context Add any other context about the problem here.

JiangWeixian avatar Jan 28 '21 16:01 JiangWeixian

By design,just change the file name passed into the parseFileSync you'll get the same result.

Brooooooklyn avatar Jan 28 '21 16:01 Brooooooklyn

By design,just change the file name passed into the parseFileSync you'll get the same result.

I think it's a bug

JiangWeixian avatar Jan 29 '21 02:01 JiangWeixian

It's not a bug. If you want span to start with 0, you can create new instance of Compiler, which have same apis.

kdy1 avatar Jan 29 '21 02:01 kdy1

Compiler

@kdy1

const compiler = new Compiler() const ast = compiler.parseFileSync('test.jsx', { syntax: 'ecmascript' }) const compiler1 = new Compiler() const ast1 = compiler1.parseFileSync('test.js', { syntax: 'ecmascript' })

// test.js
const b = 1
// test.jsx
const a = 1
{"type":"Module","span":{"start":0,"end":11,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":0,"end":11,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":6,"end":11,"ctxt":0},"id":{"type":"Identifier","span":{"start":6,"end":7,"ctxt":0},"value":"a","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":10,"end":11,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}
====
{"type":"Module","span":{"start":12,"end":23,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":12,"end":23,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":18,"end":23,"ctxt":0},"id":{"type":"Identifier","span":{"start":18,"end":19,"ctxt":0},"value":"b","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":22,"end":23,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}

second file still start with 12, not working

JiangWeixian avatar Jan 29 '21 02:01 JiangWeixian

Having the same issue. Any known workarounds?

Here's a slightly simpler reproduction:

    const compiler1 = new Compiler();
    console.log(compiler1.parseSync("const a = 1", { syntax: "ecmascript" }));
    const compiler2 = new Compiler();
    console.log(compiler2.parseSync("const b = 1", { syntax: "ecmascript" }));

mikob avatar Apr 15 '21 04:04 mikob

Currently, there's no workaround.

I think api to convert a span from swc to line-col span is required, though.

kdy1 avatar Apr 15 '21 04:04 kdy1

@kdy1 why is line-col needed? Can't \n just be consistently be used for line endings if it's a windows/unix EOL problem?

mikob avatar Jun 13 '21 18:06 mikob

@mikob What do you mean? line-col is about span, not line endings.

kdy1 avatar Jun 14 '21 09:06 kdy1

Right, I just don't see the immediate benefit to using line-col. I thought maybe because you wanted to be explicit about line endings. I use spans to get offsets and extract pieces of the code using .substr and line-col would only make that harder.

mikob avatar Jun 14 '21 15:06 mikob

Ran into this issue too trying to make an adapter for estree AST.

jdalton avatar Aug 17 '21 21:08 jdalton

@jdalton I want to remove parse and print with v2 of swc. I'm going to provide a rust-based plugin API instead.

kdy1 avatar Aug 18 '21 04:08 kdy1

@kdy1

I want to remove parse and print with v2 of swc. I'm going to provide a rust-based plugin API instead.

Ah cool. Will there still be a way to expose the AST? I currently have been able workaround this issue using an adapter to track the offset of the start of the root AST and smooth over the other differences with estree.

jdalton avatar Aug 19 '21 18:08 jdalton

You mean, you want to get actual offset from Span of node side? I think I can provide an API to get line-col span from lo-hi span.

This API, if added, will be removed along with parse/print on v2, but it will work for now.

kdy1 avatar Aug 20 '21 03:08 kdy1

@kdy1 I've noticed some confusion around this from another user. Folks use the AST from parsers, in formats like babylon and estree, to perform source transformations. The AST usually has properties such as .start, .end or maybe a .range array of [start, end] values. In the case of swc it has a .span object with .start and .end properties. These coordinates allow for source to be modified in substrings at the start and end of various nodes. So having AST is required for this but also the AST should have coordinates of the nodes (start and end string positions).

jdalton avatar Aug 20 '21 05:08 jdalton

Most ast nodes already have span. What swc is lacking is an api to convert span to line-col coordinates. Do you mean there should be line-col coordinates?

kdy1 avatar Aug 20 '21 05:08 kdy1

Are we talking past each other? Maybe an example would help. In ast-explorer using acorn with an example of export * as ns from 'a'; the ExportAllDeclaration node has a start of 0 and an end of 24. swc provides these as a span property (almost identical to the values of acorn's range/state/end) but because of #1366 its coordinates are off on subsequent parsings.

jdalton avatar Aug 20 '21 05:08 jdalton

Now I understand it. But I'm not sure about the change. Plugin api / parse / print is misdesigned and going to be removed in v2. In other words, AST will not be exposed to js world.

kdy1 avatar Aug 20 '21 06:08 kdy1

In other words, AST will not be exposed to js world.

If it could still be exposed to JS by a rust plugin or some such way that would be great too.

jdalton avatar Aug 20 '21 14:08 jdalton

Here is the location of the issue.

Every time transform_sync is called, SourceMap#new_source_file is called and the start_pos is set.

If you are using transformSync and transforming one file at at time, there seems to be no way to reset the start pos. And spans use the SourceMap start pos.

#[js_function(4)]
pub fn transform_sync(cx: CallContext) -> napi::Result<JsObject> {
    exec_transform(cx, |c, src, options| {
        Ok(c.cm.new_source_file(
            if options.filename.is_empty() {
                FileName::Anon
            } else {
                FileName::Real(options.filename.clone().into())
            },
            src,
        ))
    })
}
    pub fn new_source_file(&self, filename: FileName, src: String) -> Lrc<SourceFile> {
        // The path is used to determine the directory for loading submodules and
        // include files, so it must be before remapping.
        // Note that filename may not be a valid path, eg it may be `<anon>` etc,
        // but this is okay because the directory determined by `path.pop()` will
        // be empty, so the working directory will be used.
        let unmapped_path = filename.clone();

        let (filename, was_remapped) = match filename {
            FileName::Real(filename) => {
                let (filename, was_remapped) = self.path_mapping.map_prefix(filename);
                (FileName::Real(filename), was_remapped)
            }
            other => (other, false),
        };

        // We hold lock at here to prevent panic
        // If we don't do this, lookup_char_pos and its family **may** panic.
        let mut files = self.files.borrow_mut();

        let start_pos = self.next_start_pos(src.len());

        let source_file = Lrc::new(SourceFile::new(
            filename,
            was_remapped,
            unmapped_path,
            src,
            Pos::from_usize(start_pos),
        ));

Can we get a config option to not set the start_pos from the end of the previous one? It should probably be the default for the public js api.


NOTE: If let start_pos = self.next_start_pos(src.len()); is set to zero on each compile, source maps actually break...the originalLine in the source map is always 1, and the filename is something like jsx-config-pragmaFrag.js.

vjpr avatar Sep 30 '21 01:09 vjpr

For my purposes, I was implementing a parse + visit, and needed spans to get access to comments in the source which SWC doesn't currently include in the AST (in JS at least). I have worked around this for now by adding a visitProgram that captures the node.span.start as an offset, and everywhere I need a span I just transform it to one with the offset subtracted. Works for my purposes, might not for yours.

e.g.

class MyVisitor extends Visitor {
    code: string;
    offset: number;

    constructor(code: string) {
        super();
        this.code = code;
        this.offset = 0;
    }

    visitProgram(node: Program) {
        this.offset = node.span.start;
        return super.visitProgram(node);
    }

    fixSpan(span: Span): Span {
        return {
            start: span.start - this.offset,
            end: span.end - this.offset,
        };
    }

    visitCallExpression(node: CallExpression): Expression {
            const bodySpan = this.fixSpan(node.span);

            // this can now get the correct code for this node with subsequent invocations..
            const nodeBody = this.code.substring(bodySpan.start - 1, bodySpan.end);
   }
}

marcins avatar Aug 19 '22 06:08 marcins

For my purposes, I was implementing a parse + visit, and needed spans to get access to comments in the source which SWC doesn't currently include in the AST (in JS at least). I have worked around this for now by adding a visitProgram that captures the node.span.start as an offset, and everywhere I need a span I just transform it to one with the offset subtracted. Works for my purposes, might not for yours.

e.g.

class MyVisitor extends Visitor {
    code: string;
    offset: number;

    constructor(code: string) {
        super();
        this.code = code;
        this.offset = 0;
    }

    visitProgram(node: Program) {
        this.offset = node.span.start;
        return super.visitProgram(node);
    }

    fixSpan(span: Span): Span {
        return {
            start: span.start - this.offset,
            end: span.end - this.offset,
        };
    }

    visitCallExpression(node: CallExpression): Expression {
            const bodySpan = this.fixSpan(node.span);

            // this can now get the correct code for this node with subsequent invocations..
            const nodeBody = this.code.substring(bodySpan.start - 1, bodySpan.end);
   }
}

This works in many cases, but I found that if the file starts with a comment the offset is off.

sverrejoh avatar Jan 28 '23 19:01 sverrejoh

For my purposes, I was implementing a parse + visit, and needed spans to get access to comments in the source which SWC doesn't currently include in the AST (in JS at least). I have worked around this for now by adding a visitProgram that captures the node.span.start as an offset, and everywhere I need a span I just transform it to one with the offset subtracted. Works for my purposes, might not for yours. e.g.

class MyVisitor extends Visitor {
    code: string;
    offset: number;

    constructor(code: string) {
        super();
        this.code = code;
        this.offset = 0;
    }

    visitProgram(node: Program) {
        this.offset = node.span.start;
        return super.visitProgram(node);
    }

    fixSpan(span: Span): Span {
        return {
            start: span.start - this.offset,
            end: span.end - this.offset,
        };
    }

    visitCallExpression(node: CallExpression): Expression {
            const bodySpan = this.fixSpan(node.span);

            // this can now get the correct code for this node with subsequent invocations..
            const nodeBody = this.code.substring(bodySpan.start - 1, bodySpan.end);
   }
}

This works in many cases, but I found that if the file starts with a comment the offset is off.

This would fix for comments and spaces, we can parse an empty string first to get the initial offset.

   this.offset = parseSync("").span.end

voorjaar avatar Apr 11 '23 05:04 voorjaar

@voorjaar this only works for me if I'm doing parseFileSync after i.e.

export async function parse_files(files: string[]): Promise<Parsed_Module[]> {
    return await Promise.all(
        files.map(async (file) => {
            // @Hack SWC bug - https://github.com/swc-project/swc/issues/1366
            const offset = (parseSync("")).span.end;

            return {
                file,
                span_offset: offset,
                module: parseFileSync(file, parse_options)
            };
        })
    );
}

Instead of

export async function parse_files(files: string[]): Promise<Parsed_Module[]> {
    return await Promise.all(
        files.map(async (file) => {
            // @Hack SWC bug - https://github.com/swc-project/swc/issues/1366
            const offset = (parseSync("")).span.end;

            return {
                file,
                span_offset: offset,
                module: await parseFile(file, parse_options)
            };
        })
    );
}

Which has unacceptable performance :( Any ideas?

DoctorGester avatar Apr 11 '23 16:04 DoctorGester

Not sure if this works. We didn't use the built-in visitor. See https://github.com/macro-plugin/macros

async function hackParseFile (file, options) {
  const offset = (await parse("")).span.end

  return [offset, await parseFile(file, options)]
}

export async function parse_files (files: string[]): Promise<Parsed_Module[]> {
  return await Promise.all(
    files.map(async (file) => {
      // @Hack SWC bug - https://github.com/swc-project/swc/issues/1366
      const [span_offset, module] = await hackParseFile(file, parse_options)

      return {
        file,
        span_offset,
        module
      }
    })
  )
}

voorjaar avatar Apr 20 '23 15:04 voorjaar

We've also at first had problems with using spans (in node/typescript). Apart from taking into account the offset as mentioned in other comments above its also important to keep in mind that the span's start/end refer to byte positions and not char positions. So this was helpful in our context to get the right substrings from SWC's spans:

export class StringAsBytes {
    private string: Uint8Array;
    private decoder: TextDecoder;

    constructor(string: string) {
        this.decoder = new TextDecoder();
        this.string = (new TextEncoder()).encode(string);
    }

    /**
     * Returns a slice of the string by providing byte indices.
     * @param from - Byte index to slice from
     * @param to - Optional byte index to slice to
     */
    public slice(from: number, to?: number): string {
        return this.decoder.decode(
            new DataView(this.string.buffer, from, to !== undefined ? to - from : undefined)
        );
    }
}

Hope this is helpful.

joarfish avatar Jun 05 '23 08:06 joarfish

This bug is kind of deal breaker for switching to swc, In our product we need to invoke parser multiple times in the same process. Increasing span.start with each invocation will make swc stopped working at some point – especially when parsing huge file.

rizrmd avatar Nov 20 '23 14:11 rizrmd

For my case, I need this to convert spans to line-column values in an LSP that I am implementing. Currently, this breaks in swc when I have a file with a comment at the top, because that offsets all of the positions of my spans.

Yash-Singh1 avatar Jan 10 '24 04:01 Yash-Singh1

There is a workaround if you run each parse call in it's own child_process.

import { spawnSync } from 'node:child_process'
import { join } from 'node:path'

import type { Module } from '@swc/core'

const spawnChild = (...args: string[]) => {
  const { stdout } = spawnSync('node', [join(import.meta.dirname, 'child.js'), ...args])
  const ast = JSON.parse(stdout.toString()) as Module

  return ast
}
export const reparse = (source: string): Module => {
  return spawnChild(source)
}

const ast1 = reparse('const foo = "bar"')
const ast2 = reparse('const foo = "bar"')

console.log(ast1.span.start === ast2.span.start) // true

child.js

import { argv, stdout } from 'node:process'
import { parseSync } from '@swc/core'

stdout.write(JSON.stringify(parseSync(argv[2])))

I've published this to npm as @knighted/reparse. It also supports passing a filename.

knightedcodemonkey avatar Jun 01 '24 14:06 knightedcodemonkey