marked icon indicating copy to clipboard operation
marked copied to clipboard

Async marked

Open UziTech opened this issue 2 years ago • 10 comments

Description

see https://github.com/markedjs/marked/pull/2474#issuecomment-1173305134

~~I tried again to make an async version of marked this time coming from the perspective of trying to make all of marked async instead of an option to make it async. I duplicated files into a /src/async directory that need to be changed to async.~~

~~Users would have to explicitly import the async version of marked to use it.~~

import { marked } from 'marked/async';

Pros

  • marked (sync) stays just as fast as it is since none of the code is touched.
  • ~~Any function can be async~~
    • ~~Tokenizer~~
    • ~~Renderer~~
    • walkTokens
    • ~~highligh~~

Cons

  • ~~Lots of duplicate code~~

    • ~~basically maintaining two of almost the same versions of marked~~
  • Fixes #458

Contributor

  • [ ] Test(s) exist to ensure functionality and minimize regression

Committer

In most cases, this should be a different person than the contributor.

UziTech avatar May 22 '22 04:05 UziTech

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
marked-website ✅ Ready (Inspect) Visit Preview Aug 25, 2022 at 2:01AM (UTC)

vercel[bot] avatar May 22 '22 04:05 vercel[bot]

@calculuschild @styfle @joshbruce @davisjam what do you think about this approach? Is it too much to maintain?

We could probably move more of the duplicate code to helpers.

UziTech avatar May 22 '22 17:05 UziTech

@jimmywarting - Saw you had a PR about async and curious if you have thoughts.

@UziTech - Is the extra maintenance from this approach because there's the non-async and then the async here?

Is this a fairly standard approach for the JS community? (Making sure the Marked community can easily understand and update.)

joshbruce avatar May 22 '22 17:05 joshbruce

Saw you had a PR about async and curious if you have thoughts.

it's a pity that we need to have two version of marked... one being async and the other sync. either way, i think i might prefer to only use the async b/c i'm not that concerned about the performances b/c i would only use it on the client side and process one small chunk of markdown that would resolve pretty quickly, but i could understand if this would have to run on a server and be pretty fast to process lots of markdown data at once

The way i did it in my PR was in my opinion more stream friendly with iterators that yields chunks bits at a time. so you could pretty much write something like this to get out data faster so you can start see a result quicker

for (let chunk of iterable) {
  if (typeof chunk !== string) chunk = await chunk
  res.write(chunk)
}
res.end()

iterable have the posibility to be turned into a stream as well using node:stream.Readable.from(iterable) and possible also for whatwg streams in the feature if it lands ReadableStream.from(iterable)

with my PR we could keep much of the logic the same. We could also add Symbol.iterator to easily get a hold of a iterator the yields chunks. And if the user which to use async stuff then they could just use for await so that it would call Symbol.asyncIterator instead of Symbol.iterator and promises would be resolved instead. so that you can write this instead

for await (let chunk of iterable) res.write(chunk)
res.end()

jimmywarting avatar May 22 '22 18:05 jimmywarting

@joshbruce ya the reason for the extra maintenance is because we have to maintain files that are almost the same just not quite. For instance anytime we fix a bug in the tokenizer we will have to fix it in two files.

The async code is a little under half the speed of the sync version (bench sync 5000ms, bench async 11000ms)

@jimmywarting the iterable approach doesn't work for renderers that return false to fall back to original renderers. If that feature didn't exist the async code could be much easier.

UziTech avatar May 22 '22 23:05 UziTech

we have to maintain files that are almost the same just not quite. For instance anytime we fix a bug in the tokenizer we will have to fix it in two files.

That sounds like it could be prone to error, especially for someone who is a new contributor to the project.

We could probably move more of the duplicate code to helpers.

That sounds desirable since it would help reduce the chance of needing to modify two files to fix a single bug.

Fixes https://github.com/markedjs/marked/issues/458

I'm still a little skeptical that this change is needed at all. It seems like the "fetch image from CMS" use case can be solved with the current lexer and parser. Something like this:


const tokens = marked.lexer(md);

for (const token of tokens) {
  if (token.type === 'image') {
    // mutate token
  }
}

const html = marked.parser(tokens);

styfle avatar May 23 '22 00:05 styfle

I'm still a little skeptical that this change is needed at all. It seems like the "fetch image from CMS" use case can be solved with the current lexer and parser.

I hadn't thought of that approach. That actually seems better. Perhaps all that we have to make async is the walkTokens function.

UziTech avatar May 23 '22 20:05 UziTech

after playing around with this a little bit it looks like we could just return the values from walkTokens and await those to do just about anything async.

The async code would look something like:

const tokens = marked.lexer(md);

await Promise.all(marked.walkTokens(tokens, (token) => {
  if (token.type === 'image') {
    return fetch(...)
      .then(res => res.json())
      .then(data => {
        token.href = ...
      });
  }
}));

const html = marked.parser(tokens);

We could do this inside marked to just allow an option. This will only slow down marked if they specify async: true.

const html = await marked(md, {
  async: true,
  walkTokens() {
    if (token.type === 'image') {
      return fetch(...)
        .then(res => res.json())
        .then(data => {
          token.href = ...
        });
    }
  },
})

UziTech avatar May 28 '22 15:05 UziTech

updates?

vixalien avatar Jul 04 '22 01:07 vixalien

@vixalien Thanks for the push! 😉👍I had this almost done for a while.

I changed this to use the suggestion by @styfle in https://github.com/markedjs/marked/pull/2474#issuecomment-1134040343. I made the walkTokens function return the values of the function so someone can await any Promises.

The way I see this working is anyone who wants to do anything async will do it by awaiting marked.walkTokens(tokens, callback)

Something like:

const tokens = marked.lexer(markdown)
await Promise.all(marked.walkTokens(tokens, asynFunction))
const html = marked.parser(tokens)

To make the walkTokens option be able to be async we have to introduce an async option so users can instead write:

const walkTokens = asyncFunction
marked.use({walkTokens, async: true})
const html = await marked.parse(markdown)

or in an extension:

const importUrl = {
  extensions: [{
    name: 'importUrl',
    level: 'block',
    start(src) { return src.indexOf('\n:'); },
    tokenizer(src) {
      const rule = /^:(https?:\/\/.+?):/;
      const match = rule.exec(src);
      if (match) {
        return {
          type: 'importUrl',
          raw: match[0],
          url: match[1],
          html: '' // will be replaced in walkTokens
        };
      }
    },
    renderer(token) {
      return token.html;
    }
  }],
  async: true, // needed to tell marked to return a promise
  async walkTokens(token) {
    if (token.type === 'importUrl') {
      const res = await fetch(token.url);
      token.html = await res.text();
    }
  }
};

marked.use(importUrl);
const markdown = `
# example.com

:https://example.com:
`;

const html = await marked.parse(markdown);

TODO

  • [x] Update docs
  • [x] Add tests

UziTech avatar Jul 04 '22 03:07 UziTech

@UziTech I tried out your promising work and found a potential issue. It seems it breaks link (![]()) handling. I have steps to reproduce below:

  1. Check out https://github.com/gustwindjs/gustwind/tree/bug/async-links
  2. Run the project with deno task start (needs a recent version of Deno)
  3. Head to http://localhost:3000/breezewind/ - Note the links

I have the code at https://github.com/gustwindjs/gustwind/blob/bug/async-links/site/transforms/markdown.ts . The interesting point is that when you remove that marked.use section with the async code, the links get rendered properly again so it probably has to do with the code that was added.

bebraw avatar Aug 14 '22 14:08 bebraw

@UziTech I managed to solve it. I had the following piece of code as per your example:

      start(src: string) {
        return src.indexOf(":");
      },

After removing, it fixed the links. Probably what happened is that it was trying to apply the extension on links since they often have :.

My apologies for the noise. 😄

bebraw avatar Aug 14 '22 14:08 bebraw

@bebraw nice catch. That start function should only look for : at the beginning of a line so it should be something like:

      start(src: string) {
        return src.indexOf("\n:");
      },

(I updated the example code)

UziTech avatar Aug 14 '22 23:08 UziTech

@UziTech Is the start check actually needed? It is there just for optimization purposes? Maybe it would be nice to have short comments at the doc portion related to this (i.e., why something exists). I might also extract importUrl as a const to group it better (nicer for maintenance as it's less brittle when changing things around).

In any case, I think it's a great addition to the API as it allows a lot of imaginative uses. It could be even worth mentioning some use cases at the docs to inspire people.

bebraw avatar Aug 15 '22 05:08 bebraw

@bebraw the start is needed to make sure the paragraph and text tokenizers doesn't consume too much.

hi
:https//example.com:

For example, without the start function the above markdown would be seen as one paragraph since there is no blank line between the lines and it doesn't know to stop at : to check the custom extension.

UziTech avatar Aug 15 '22 13:08 UziTech

@styfle @joshbruce @calculuschild @davisjam Any objections to this?

I think it will help allow a whole new type of async extensions for marked without any breaking changes. The only down side is an added option (async) to prevent anyone not using async from slowing down.

see the added documentation at https://marked-website-git-fork-uzitech-async-marked-markedjs.vercel.app/using_pro#async

UziTech avatar Aug 21 '22 18:08 UziTech

What do you think about adding a parseAsync function to return the promise?

I know Jasmine ran into an issue when adding async expect where people would forget to await the expect call so they changed it to expectAsync for async expect calls so people wouldn't forget to add await before.

We could do something similar. If the async option is set (either by the user or an extension) than any calls to parse would fail with a message like Must use parseAsync for async marked so people don't forget the await.

UziTech avatar Aug 25 '22 02:08 UziTech

Sounds good. Also the PR description no longer matches the content so we should probably update to avoid confusion to readers in the future.

styfle avatar Aug 28 '22 18:08 styfle

After working on the parseAsync way it looks like that will require some other small tradeoffs so we can change it in the future if it is needed.

For now I think this is good to go once you approve @styfle. 😁👍

UziTech avatar Aug 28 '22 22:08 UziTech

:tada: This PR is included in version 4.1.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

github-actions[bot] avatar Aug 30 '22 14:08 github-actions[bot]

yay!

vixalien avatar Aug 31 '22 10:08 vixalien

Thanks for the work on this. I was thinking of using Marked in my Express backend which streaming OpenAI responses. Those responses can be markdown but I'm not sure if this feature would support my use case. My goal is to simulate what OpenAI ChatGPT is doing in their client. I think I have two options as each OpenAI stream arrives.

  1. Use Marked parse on a continually updated string representing the current total of the completion. Change Vue client side code to replace the DOM message's inner HTML with the updated Markdown -> HTML.
  2. Use Marked in some async way that with each part of the stream from OpenAI. Honestly I would have no idea how I would handle these updates on the Vue side because parts will not be guaranteed to represent fully parsed HTML nodes.

So as I say this... I'm thinking #1 is my best option which means I do not need Async Marked, but thought I would put that question out for folks who may have wandered here like I did. Again, thanks for the work!

metaskills avatar Aug 12 '23 01:08 metaskills