Measured performance decrease between major releases of Marked, time for a performance pass?
What pain point are you perceiving?
Since Marked's inception, most major versions have increased the time needed to complete parsing. The numbers shown below were captured on a second page load with 6x CPU slowdown, with only the Marked version being changed and nothing else. Marked's processing time has more than doubled since v0.8.2.
- Version 0.8.2
- Version 2.0.3
- Version 4.0.9
Describe the solution you'd like
Perhaps it's time for an optimization pass, looking for potential short circuits that could restore some of the previously lost performance? I feel performance should remain reasonably consistent across releases.
It would be great to get something in the pipeline to measure performance. We tried that at one point, but it wasn't reliable when run in GitHub Actions.
PRs are appreciated 😁👍
Well, it's not something that has to be done often, so this could be a manual operation: generate flamegraphs and attempt to optimize the functions that "waste" the most time. Just looking at the output above, the lexer is a good place to start.
What are you using to produce those flame graphs?
Chrome DevTools via "Performance" tab recordings
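For reference, a minimal harness along these lines can be pasted into the DevTools console while a "Performance" recording is running. This is only a sketch: it assumes a global `marked` is already loaded on the page, and falls back to an identity stand-in (not marked's API) so the snippet stays runnable anywhere.

```javascript
// Sketch of a profiling harness for the DevTools "Performance" tab.
// Assumes `marked` is loaded globally on the page; otherwise falls back to an
// identity function (a stand-in, not marked's API) so the snippet still runs.
const parse = typeof marked !== "undefined" ? marked.parse : (src) => src;

// A repetitive but "normal" document, so lexer frames dominate the flamegraph.
const input = "# Heading\n\nSome *em*, **strong**, and `code`.\n\n".repeat(500);

console.time("parse x50");
for (let i = 0; i < 50; i++) {
  parse(input);
}
console.timeEnd("parse x50");
```

Repeating the parse many times makes the hot functions stand out clearly in the recorded flamegraph.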
Is there perhaps a good standard test Markdown document that would give Marked a thorough performance stress test? I know we have our bench script, but that only uses our spec tests from CommonMark if I recall, which are heavily weighted toward specs with lots of odd edge cases (i.e., I think almost a fourth of our tests are for `em/strong`, which might disproportionately make it seem like `em/strong` is a bottleneck), and there are no tests at all for tables or other GFM features that might be contributing to slowdown as well.
Things like this tend to have large numbers of code blocks, etc., which is a similar problem, but at least it's a normal document that can be pasted into Chrome if we use its Performance tab.
Safari also has some interesting tooling, but DevTools should be sufficient. You can roll your own perf measurement using the Performance API, though it may of course add a few precious ns/ms to the runtime ;)
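A hand-rolled version of that could look something like the sketch below. `performance.now()` is available in both browsers and modern Node; `fakeParse` is a placeholder workload standing in for whatever is being measured, not marked's implementation.

```javascript
// Rolling your own timing with the Performance API (performance.now()).
// `fakeParse` is a placeholder workload, not marked's implementation.
function fakeParse(src) {
  return src.split("\n").map((line) => line.trim()).join("\n");
}

// Mean milliseconds per call over `runs` iterations.
function timeIt(fn, input, runs = 1000) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn(input);
  return (performance.now() - start) / runs;
}

console.log(`${timeIt(fakeParse, "# hi\n\nsome *text*\n").toFixed(4)} ms/call`);
```

Averaging over many iterations keeps the cost of the two `performance.now()` calls negligible per measured call.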
@UziTech Since we run each spec 1000 times in our bench test, would it be possible to modify our bench tests to track the average time taken for each spec and output which cases are slowing us down the most?
Ya we could bench each test individually to see which ones are slowest relative to other packages.
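A rough per-spec comparison might look like this sketch, where the two parser arguments are stand-ins for e.g. marked and commonmark, and the `{ example, markdown }` spec shape is an assumption:

```javascript
// Benchmark each spec individually and rank specs by how much slower parseA
// is than parseB. Parser functions and the spec shape are hypothetical.
function benchSpec(parse, markdown, runs = 1000) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) parse(markdown);
  return (performance.now() - start) / runs; // mean ms per parse
}

function rankByRatio(specs, parseA, parseB) {
  return specs
    .map(({ example, markdown }) => ({
      example,
      ratio: benchSpec(parseA, markdown) / benchSpec(parseB, markdown),
    }))
    .sort((a, b) => b.ratio - a.ratio); // slowest relative specs first
}
```

Sorting by the ratio rather than the raw time controls for specs that are simply longer or more complex than others.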
@UziTech I put together a rough version of this, but realized analyzing becomes very confusing if we are comparing 6 versions/options of Marked against Commonmark and Markdown-It. Given that, does it make sense to compare just the CJS (no GFM) version of Marked with Commonmark? If so I can make a PR.
Essentially, for each spec I found the ratio of execution times (MarkedTime / CommonmarkTime) and sorted those results top to bottom. The order changes a bit with each run, so I combined 4 runs in Excel to get the average performance of each spec.
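The Excel step could also be scripted. Here is a sketch, assuming each run is a list of `{ spec, ratio, rank }` records mirroring the columns below:

```javascript
// Average each spec's ratio and rank across multiple benchmark runs.
// The { spec, ratio, rank } record shape is an assumption, mirroring the
// columns of the table below.
function averageRuns(runs) {
  const bySpec = new Map();
  for (const run of runs) {
    for (const { spec, ratio, rank } of run) {
      const acc = bySpec.get(spec) ?? { spec, ratio: 0, rank: 0, n: 0 };
      acc.ratio += ratio;
      acc.rank += rank;
      acc.n += 1;
      bySpec.set(spec, acc);
    }
  }
  return [...bySpec.values()]
    .map(({ spec, ratio, rank, n }) => ({ spec, ratio: ratio / n, rank: rank / n }))
    .sort((a, b) => b.ratio - a.ratio); // worst performers first
}
```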
The ranking is as follows (top 100 worst specs):
Key:
- Spec # - The CommonMark spec example number.
- Avg. Performance Ratio - How many times slower Marked (CJS) is than Commonmark.
- Avg. Rank - Position in the sorting order.
- Section - CommonMark spec category.
Note that example 523 is consistently the worst performing (first place almost every time); in general, however, `em/strong` seems to be most prominent at the top.
Spec # | Avg. Performance Ratio | Avg. Rank | Section |
---|---|---|---|
523 | 4.791666667 | 1.5 | Links |
450 | 4.1875 | 2.25 | Emphasis and strong emphasis |
438 | 4.125 | 2.75 | Emphasis and strong emphasis |
255 | 3.516666667 | 5.75 | List items |
279 | 3.776785714 | 5.75 | List items |
435 | 3.4375 | 6.75 | Emphasis and strong emphasis |
291 | 3.268181818 | 11.75 | List items |
447 | 3.25 | 12 | Emphasis and strong emphasis |
644 | 3.1875 | 12.5 | Hard line breaks |
16 | 3.25 | 12.75 | Backslash escapes |
66 | 3.338888889 | 13 | ATX headings |
645 | 3.083333333 | 13.5 | Hard line breaks |
629 | 3.1875 | 14.25 | Raw HTML |
631 | 3.05 | 16.75 | Raw HTML |
535 | 3.1 | 17 | Links |
385 | 3.066666667 | 18 | Emphasis and strong emphasis |
280 | 3.102678571 | 18 | List items |
630 | 3.0625 | 18.75 | Raw HTML |
627 | 3 | 20.5 | Raw HTML |
444 | 2.892857143 | 26.25 | Emphasis and strong emphasis |
342 | 2.875 | 27.25 | Code spans |
416 | 2.822916667 | 28.5 | Emphasis and strong emphasis |
650 | 2.9125 | 31.5 | Textual content |
611 | 2.75 | 34.25 | Autolinks |
614 | 2.791666667 | 35.25 | Raw HTML |
456 | 2.897321429 | 35.5 | Emphasis and strong emphasis |
256 | 2.694444444 | 38 | List items |
525 | 2.672222222 | 38.5 | Links |
59 | 2.725 | 39 | Thematic breaks |
322 | 2.69047619 | 39 | Lists |
603 | 2.647222222 | 39.5 | Autolinks |
387 | 2.75 | 40.25 | Emphasis and strong emphasis |
71 | 2.607638889 | 42.75 | ATX headings |
399 | 2.738095238 | 42.75 | Emphasis and strong emphasis |
485 | 2.625 | 44.75 | Links |
168 | 2.608333333 | 45.25 | HTML blocks |
628 | 2.8 | 48.25 | Raw HTML |
610 | 2.854166667 | 49.5 | Autolinks |
192 | 2.5625 | 50 | Link reference definitions |
359 | 2.55 | 52.75 | Emphasis and strong emphasis |
363 | 2.533333333 | 55.75 | Emphasis and strong emphasis |
108 | 2.565909091 | 56.75 | Indented code blocks |
488 | 2.553571429 | 58.25 | Links |
386 | 2.595238095 | 58.75 | Emphasis and strong emphasis |
75 | 2.5625 | 60 | ATX headings |
56 | 2.529761905 | 60.25 | Thematic breaks |
498 | 2.619047619 | 61.5 | Links |
298 | 2.488888889 | 61.75 | List items |
262 | 2.490909091 | 62 | List items |
391 | 2.488095238 | 63.75 | Emphasis and strong emphasis |
491 | 2.458333333 | 65.5 | Links |
616 | 2.45 | 66.25 | Raw HTML |
446 | 2.446428571 | 66.25 | Emphasis and strong emphasis |
65 | 2.6875 | 66.25 | ATX headings |
277 | 2.490909091 | 66.75 | List items |
519 | 2.4375 | 68 | Links |
397 | 2.517857143 | 69 | Emphasis and strong emphasis |
187 | 2.433333333 | 71.25 | HTML blocks |
276 | 2.45 | 72 | List items |
475 | 2.45 | 74.25 | Emphasis and strong emphasis |
400 | 2.4375 | 74.75 | Emphasis and strong emphasis |
436 | 2.441666667 | 75.25 | Emphasis and strong emphasis |
624 | 2.4 | 75.5 | Raw HTML |
453 | 2.416666667 | 76.25 | Emphasis and strong emphasis |
49 | 2.5 | 80.25 | Thematic breaks |
434 | 2.410714286 | 81 | Emphasis and strong emphasis |
420 | 2.410714286 | 82 | Emphasis and strong emphasis |
634 | 2.4 | 82.25 | Hard line breaks |
652 | 2.375 | 83.5 | Textual content |
63 | 2.4375 | 84.75 | ATX headings |
323 | 2.389423077 | 85.5 | Lists |
415 | 2.383928571 | 86.5 | Emphasis and strong emphasis |
325 | 2.376633987 | 87.5 | Lists |
580 | 2.35 | 91.5 | Images |
484 | 2.383333333 | 92.5 | Links |
258 | 2.371590909 | 94.25 | List items |
439 | 2.341666667 | 94.75 | Emphasis and strong emphasis |
476 | 2.458333333 | 95.5 | Emphasis and strong emphasis |
637 | 2.35 | 95.75 | Hard line breaks |
74 | 2.4375 | 97.25 | ATX headings |
109 | 2.338450292 | 97.25 | Indented code blocks |
266 | 2.375 | 99.25 | List items |
281 | 2.343406593 | 100 | List items |
448 | 2.333333333 | 101.25 | Emphasis and strong emphasis |
299 | 2.315705128 | 101.5 | List items |
452 | 2.366666667 | 102.5 | Emphasis and strong emphasis |
440 | 2.333333333 | 102.5 | Emphasis and strong emphasis |
518 | 2.3 | 102.75 | Links |
254 | 2.335742754 | 102.75 | List items |
458 | 2.345238095 | 103.5 | Emphasis and strong emphasis |
437 | 2.333333333 | 104 | Emphasis and strong emphasis |
284 | 2.3125 | 104 | List items |
216 | 2.322802198 | 104.25 | Link reference definitions |
301 | 2.335539216 | 105.75 | Lists |
270 | 2.329924242 | 106.25 | List items |
268 | 2.365079365 | 106.25 | List items |
102 | 2.357142857 | 108.25 | Setext headings |
307 | 2.296153846 | 110.5 | Lists |
390 | 2.321428571 | 111 | Emphasis and strong emphasis |
29 | 2.3 | 111.25 | Entity and numeric character references |
Brilliant research - do the majority of these use regex? My vague hypothesis is that some regexes could be optimized to reduce the number of steps needed for matching. Using a third-party tool such as https://regex101.com/ (although its step debugger only works for Python/PHP) you can see the number of steps a match had to take - there's probably an equivalent for JS somewhere?
Regex101 works fine. Even though the debugger is "only for PHP", the flavors are nearly identical, so it works well enough for tracking things down.
Regex is always a possible source of improvement, and you are right, pretty much every token uses regex at its core. I can already see a possible improvement in the `rDelim` rule for `emStrong`.
Feel free to poke around if you notice any potential improvements.
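As a toy illustration of the kind of cost a step counter surfaces (these are not marked's actual rules): nested quantifiers can backtrack exponentially on near-miss input, while an equivalent linear pattern cannot.

```javascript
// Toy example, not marked's actual rules: both patterns accept the same
// strings, but the nested quantifier backtracks heavily on near-misses
// such as a long run of "a" followed by "!".
const slow = /^(a+)+$/; // catastrophic backtracking on non-matching input
const fast = /^a+$/;    // same language, linear-time matching

console.log(fast.test("aaaa"));  // true
console.log(fast.test("aaaa!")); // false
console.log(slow.test("aaaa"));  // true
```

Tools like regex101 make this visible as a step count; rewriting a rule to avoid overlapping quantifiers usually collapses that count dramatically.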
Wish I had the time! My personal priority is taking Marked (and other renderers, such as highlight.js) out of the client critical path in my upcoming library by caching results server-side where possible.