svelte icon indicating copy to clipboard operation
svelte copied to clipboard

Slow parsing when blanks are continuous

Open ota-meshi opened this issue 1 year ago • 5 comments

Describe the bug

If there are many consecutive blanks, it will be very slow to parse.

I run the following code and found that it took 9000ms.

import { parse } from "svelte/compiler";

const fixture = `<div>${" ".repeat(100000)}</div>`;

const start = Date.now();
parse(fixture);
console.log(Date.now() - start);

Reproduction

  • Open
    https://stackblitz.com/edit/node-4tso3q?file=index.js
  • Run npm test

Running the following code that uses the svelte parser takes 9000ms.

import { parse } from "svelte/compiler";

const fixture = `<div>${" ".repeat(100000)}</div>`;

const start = Date.now();
parse(fixture);
console.log(Date.now() - start);

However, the code for the following template, which has more characters than before, finishes in 50ms.

import { parse } from "svelte/compiler";

const fixture = `<div>${" A".repeat(100000)}</div>`;

const start = Date.now();
parse(fixture);
console.log(Date.now() - start);

I think most of the time for template parses is spent on the next regex.

https://github.com/sveltejs/svelte/blob/ccbf10c16326cffb1b38e721e2f74aad4b85dbf2/src/compiler/parse/index.ts#L37

Logs

No response

System Info

System:
    OS: macOS 12.4
    CPU: (4) x64 Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz
    Memory: 50.58 MB / 8.00 GB
    Shell: 5.8.1 - /bin/zsh
  Binaries:
    Node: 18.4.0 - /usr/local/bin/node
    Yarn: 1.22.11 - /usr/local/bin/yarn
    npm: 8.12.1 - /usr/local/bin/npm
  Browsers:
    Chrome: 103.0.5060.114
    Firefox: 79.0
    Safari: 15.5
  npmPackages:
    svelte: ^3.46.1 => 3.48.0

Severity

annoyance

ota-meshi avatar Jul 11 '22 03:07 ota-meshi

Looks like this might be an issue with catastrophic backtracking, where the regex engine tries to backtrack every single one of those 100k spaces to see if it might match the rest of the regex. So that instead of following all 100k spaces, seeing that they don't match the end of the line (because they have a </div> after them) and backing off the whole 100k, it then tries 100k times to find an end-of-line marker.

I can think of a couple ways to work around the catastrophic backtracking in this case. First, if template was expected to be a single-line string, trimEnd would likely be more performant as it wouldn't backtrack at all. String.trimEnd is not supported on IE, so the code might need to become something like:

if ('trimEnd' in String.prototype) {
  this.template = template.trimEnd();
} else {
  this.template = template.replace(/\s+$/, '');
}

(Unless Svelte plans to drop IE support entirely, in which case the if isn't necessary).

Another approach might be to use the technique described in https://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead to prevent the regex engine from backtracking. Rewrite the /\s+$/ regex as /(?=(\s+))\1$/ and it should (I haven't tested this yet) perform the same in non-pathological cases, yet avoid backtracking in pathological cases. (Because the entire series of spaces will be captured by the parentheses inside the lookahead expression, then when the parser reaches the \1$ marker it's not able to backtrack through the \s+ expression). However, this would need to be tested, as it could possibly cause slowdown in non-pathological cases by creating capture groups for every space found in the template, then throwing those groups away later when it's discovered they weren't needed. (Which would result in GC pressure).

BTW, I traced the origin of the template.replace(/\s+$/, '') line back to https://github.com/sveltejs/svelte/commit/bac02481b781f4d2509a5361b3d4bfdd47482ab2 (from #153), meaning that that line has been in the parser for a long, long time.

rmunn avatar Jul 11 '22 05:07 rmunn

We had to work around this issue as well in language tools by inserting non-whitespacs characters at the end of each line to prevent this from slowing down.

dummdidumm avatar Jul 11 '22 06:07 dummdidumm

This is a parser issue, so if we can run it on Node.js version 8, the minimum supported version of Node.js, I think it's okay if we can't run it in IE.

However, Node.js v8 does not support trimEnd. But Node.js v8 seems to be able to use trimRight. trimRight is currently an alias for trimEnd and is deprecated, so people don't like it very much. Do you think we can use trimRight?

https://node.green/#ES2019-features-string-trimming https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trimEnd

ota-meshi avatar Jul 11 '22 07:07 ota-meshi

I agree to use trimRight. It seems to be available in the latest Node 18.6.0 also. (So trimRight can use from our minimum supported version Node 8 to the latest version Node18) Also, we can notice by CI if some environment droptrimRight in the future.

baseballyama avatar Jul 17 '22 07:07 baseballyama

We only can use built in functions which are part of the node versions Svelte currently supports

dummdidumm avatar Jul 17 '22 08:07 dummdidumm

The performance should be improved in 3.50.0.

Conduitry avatar Sep 02 '22 12:09 Conduitry