
Performance - Reduce build time and memory usage

Open slorber opened this issue 4 years ago • 129 comments

💥 Proposal

With Webpack 5 support, re-build times are now faster.

But we still need to improve the time of the first build, which is still quite slow.

Some tools to explore:

  • https://github.com/evanw/esbuild
  • https://github.com/swc-project/swc
  • https://github.com/alangpierce/sucrase

It will be hard to decouple Docusaurus totally from Webpack at this point.

But we should at least provide a way for users to use an alternative (non-Babel) JS loader that could be faster and good enough. Docusaurus core should be able to provide a few alternate loaders that would work by default using the theme classic, by just switching a config flag.

If successful and faster, we could make one of those alternate loaders the default for new sites (when no custom babel config is found in the project).

Existing PR by @SamChou19815 for esbuild: https://github.com/facebook/docusaurus/pull/4532

slorber avatar May 11 '21 10:05 slorber

For anyone interested, we added the ability to customize the jsLoader here https://github.com/facebook/docusaurus/pull/4766

This gives the opportunity to replace babel by esbuild, and you can add this in your config:

  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('esbuild-loader'),
      options: {
        loader: 'tsx',
        format: isServer ? 'cjs' : undefined,
        target: isServer ? 'node12' : 'es2017',
      },
    }),
  },

We don't document it yet (apart from here). We may recommend it later for larger sites if it proves to be successful according to feedback from early adopters, so please let us know if that works for your use-case.

Important notes:

  • Docusaurus.io builds with esbuild and the above config
  • browser support, syntax and polyfills might be a little bit different, this is not a 1-1 replacement (https://github.com/privatenumber/esbuild-loader/discussions/170)
  • esbuild does not use the browserslist config, you are responsible for providing the right target value (https://github.com/evanw/esbuild/issues/121)
  • browser support seems good enough with es2017, and Docusaurus theme works in a lot of recent browsers
  • use a tool like Browserstack to test browser support (it's easy to get a free account for open-source projects)
  • if needed, use polyfill.io for even older browsers where some DOM APIs are unsupported?
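Regarding the target point: since esbuild ignores browserslist, one way to keep the two roughly in sync is to translate browserslist-style entries ("chrome 58") into esbuild target strings ("chrome58") yourself. A minimal illustrative sketch (the supported-browser set below is an assumption, not an official list — check esbuild's docs):

```javascript
// Map browserslist-style entries to esbuild `target` strings.
// The set of browsers esbuild understands is assumed here; check
// esbuild's documentation for the authoritative list.
const ESBUILD_BROWSERS = new Set(['chrome', 'edge', 'firefox', 'ios', 'safari', 'opera']);

function toEsbuildTargets(entries) {
  return entries
    .map((entry) => {
      const [name, version] = entry.split(' ');
      // Browsers esbuild cannot target (e.g. ie) are dropped, not guessed
      return ESBUILD_BROWSERS.has(name) ? `${name}${version}` : null;
    })
    .filter(Boolean);
}

console.log(toEsbuildTargets(['chrome 58', 'firefox 57', 'ie 11']));
```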

slorber avatar May 14 '21 09:05 slorber

came from https://github.com/facebook/docusaurus/issues/4785#issuecomment-860705444.

Just wondering, is this issue aiming to reduce build time for the entire site generation (including the md/mdx parser and docs), or just JSX React pages?

adventure-yunfei avatar Jun 15 '21 06:06 adventure-yunfei

@adventure-yunfei md docs are compiled to React components with MDX, and an alternative JS loader like esbuild also processes the output of the MDX compiler, so this applies to documentation as well. Check the MDX playground to test the MDX compiler: https://mdxjs.com/playground/

If you have 10k docs, you basically need to transpile 10k React components.
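To get an intuition for the scale: each markdown file is turned into JSX module source before the JS loader even sees it. A toy sketch of that idea (the real MDX compiler handles far more: JSX in markdown, imports, plugins):

```javascript
// Toy illustration only: markdown in, React component source out.
// The real MDX compiler is far more involved than this sketch.
function toyMdxCompile(markdown) {
  const body = markdown
    .split('\n')
    .map((line) =>
      line.startsWith('# ') ? `<h1>${line.slice(2)}</h1>` : `<p>${line}</p>`,
    )
    .join('');
  // This generated source is what babel/esbuild then has to transpile,
  // once per document.
  return `export default function MDXContent() { return <>${body}</>; }`;
}

console.log(toyMdxCompile('# Hello\nSome text'));
```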

slorber avatar Jun 15 '21 08:06 slorber

@slorber perfect! we're also trying to use esbuild to boost build time (for application project). I'll have a try on this.

BTW I've created a similar large doc site here from our internal project.

Update: Tested with a higher perf PC:

  • with Docusaurus 2.0.0-beta.0, doc site generation finished in 63min
  • with latest in-dev version, doc site generation finished in 30min. Reduced 50% time. 👍

adventure-yunfei avatar Jun 15 '21 09:06 adventure-yunfei

This gave a nice performance boost, although I think there is still more to be desired. Out of curiosity, what is actually going on behind the scenes that takes so much time? In our case (a site with around 2,000 .md(x) files) most of the time seems to be spent before and after the "Compiling Client/Compiling Server" progress bars appear and complete.

As it stands, building the site takes around 20 minutes with esbuild, and was closer to 40 minutes before. Then out of curiosity, I just tested to add four versions to our site, and building it. Before using esbuild, the process took just shy of 13 hours(!). Using esbuild it was down to just shy of 8 hours. (Still way too long to be acceptable). So while it was a big improvement, it still seems to be very slow.

In the second case, it reported:

[success] [webpackbar] Client: Compiled successfully in 1.33h
[success] [webpackbar] Server: Compiled successfully in 1.36h

What was going on for the remaining 5 hours? Is this normal behavior, or did we configure something incredibly wrong? And why does it take much more than four times as long with four versions added?

alphaleonis avatar Jun 23 '21 06:06 alphaleonis

@alphaleonis it's hard to say without further analysis, but the MDX compiler is transforming each md doc to a React component, that is later processed by babel (or esbuild).

The MDX compiler might be a bottleneck, this is why I'd like to provide an alternate MD parser for cases where MDX is not really needed.

Webpack might also be a bottleneck.

Using esbuild is not enough. Also, when using esbuild as a webpack loader, we are not really leveraging the full speed benefits of esbuild. Unfortunately we can't easily replace Webpack with esbuild: Webpack is part of our plugin lifecycle API and is more full-featured than esbuild (we use various things like file-loader, svgr loader...)

What was going on for the remaining 5 hours? Is this normal behavior, or did we configure something incredibly wrong?

Webpack 5 has persistent caching, which we have enabled, and rebuild times are much faster. You need to persist node_modules/.cache across builds to leverage it.

And why does it take much longer than four times the amount of time with four versions added?

It's hard to tell without measuring on your system. Your system may not have enough memory for Webpack to do its job efficiently, leading to more garbage collection or whatever.

slorber avatar Jun 23 '21 10:06 slorber

@slorber Thanks for the explanation. We did try the persistent caching, and it seems to help a lot with the time spent during the "Build server/client" phase (which I assume is Webpack). The machine in question had 16GB memory, and the same was specified as max_old_space_size.

Is there any way we can do some further analysis, such as enabling some verbose logging to get some more details perhaps? Or is this kind of the expected build time for sites of that size? (If so, I guess we will have to find another solution for versioning, such as building/deploying each version separately.)

alphaleonis avatar Jun 23 '21 18:06 alphaleonis

Also when using esbuild as a webpack loader, we are not really leveraging the full speed benefits of esbuild

This is true - but there's still a speed benefit to take advantage of. It's also pretty plug and play to make use of. See my post here:

https://blog.logrocket.com/webpack-or-esbuild-why-not-both/

johnnyreilly avatar Jun 23 '21 19:06 johnnyreilly

Is there any way we can do some further analysis, such as enabling some verbose logging to get some more details perhaps?

This is a Webpack-based app, and the plugin system enables you to tweak the Webpack config to your needs (configureWebpack lifecycle) and add logs or whatever you want that can help troubleshoot the system. You can also modify your local docusaurus and add tracing code if you need.

I'm not an expert in Webpack performance debugging so I can't help you much on how to configure webpack and what to measure exactly, you'll have to figure out yourself for now.

Or is this kind of the expected build time for sites of that size?

It's hard to have meaningful benchmarks. The number of docs is one factor, but the size of the docs obviously matters too, so one site is not strictly comparable to another. A 40min build time for 2000 MDX docs with Babel seems expected when comparing with other sites. Obviously it's too much and we should aim to reduce that build time, but it's probably not an easy thing to do.

(If so, I guess we will have to find another solution for versioning, such as building/deploying each version separately.)

For large sites, it's definitely the way to go, and is something I'd like to document/encourage more in the future. It's only useful to keep multiple versions in master when you actively update them. Once a version becomes unmaintained, you should rather move it to a branch and create a standalone immutable deployment for it, so that your build time does not increase as time passes and your version count increases.

We have made it possible to include "after items" in the version dropdown, so that you can include external links to older versions, and we use it on the Docusaurus site itself:

(screenshot: version dropdown with external links to older versions)

I also want to have a "docusaurus archive" command to support this workflow better, giving the ability to publish a standalone version of an existing site and then remove that version.

slorber avatar Jun 24 '21 10:06 slorber

Tested with a higher perf PC:

  • with Docusaurus 2.0.0-beta.0, doc site generation finished in 63min
  • with latest in-dev version, doc site generation finished in 30min. Reduced 50% time. 👍

Sadly, the process uses a very large amount of memory. My local testing environment has 32G of memory, but in the CI/CD environment the memory limit is 20G. The process is killed because of OOM during the emitting phase. From the monitor, the memory suddenly increased from 8G to 20G+.

adventure-yunfei avatar Jun 26 '21 06:06 adventure-yunfei

It is unexpected that beta.2 is faster than beta.0, maybe you didn't clear your cache?

The process is killed because of OOM during the emitting phase. From the monitor, the memory suddenly increased from 8G to 20G+.

What do you mean by the "emitting phase"? I didn't take much time to investigate all this so any info can be useful.

slorber avatar Jun 26 '21 17:06 slorber

It is unexpected that beta.2 is faster than beta.0, maybe you didn't clear your cache?

I'm using the esbuild-loader config from the docusaurus website example. So it should be esbuild making build faster.

What do you mean by the "emitting phase"? I didn't take much time to investigate all this so any info can be useful.

This may not be accurate. The process memory was 7G most of the time. About 20 minutes later, memory jumped to 20.2G while the console was showing Client "emitting". After the client build finished, the memory dropped back down to 7G. (The Server was still building.)

adventure-yunfei avatar Jun 27 '21 13:06 adventure-yunfei

Trying to test esbuild-loader but running into some trouble.

I have added the following to the top level of my docusaurus.config.js file:

  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('esbuild-loader'),
      options: {
        loader: 'tsx',
        format: isServer ? 'cjs' : undefined,
        target: isServer ? 'node12' : 'es2017',
      },
    }),
  },

I have added the following to my dependencies in package.json:

    "esbuild-loader": "2.13.1",

The install of esbuild-loader fails. Am I missing more dependencies for this to work? Might also be a Windows problem, unsure right now.

krillboi avatar Jun 29 '21 08:06 krillboi

Seems like it was one of the good ol' corporate proxy issues giving me the install troubles..

I'll try and test the esbuild-loader to see how much faster it is for me.

krillboi avatar Jun 29 '21 12:06 krillboi

Tested yesterday with production build, took about 3 hours compared to 6 hours before (~400 docs x 5 versions x 4 languages).

So about half the time with the esbuild-loader which is nice. But we are reaching a size of docs where I am now looking into archiving older versions as seen on the Docusaurus site.

This may not be accurate. The process memory was 7G most of the time. About 20 minutes later, memory jumped to 20.2G while the console was showing Client "emitting". After the client build finished, the memory dropped back down to 7G. (The Server was still building.)

I witnessed the same thing where the memory usage would suddenly spike up to take 25+ gb.

krillboi avatar Jun 30 '21 11:06 krillboi

Thanks for highlighting that, we'll try to figure out why it takes so much memory suddenly

slorber avatar Jun 30 '21 13:06 slorber

Not 100% related but I expect this PR to improve perf (smaller output) and reduce build time for sites with very large sidebars: https://github.com/facebook/docusaurus/pull/5136 (can't really tell by how much though, it's site specific so please let me know if you see a significant improvement)

slorber avatar Jul 15 '21 13:07 slorber

Not 100% related but I expect this PR to improve perf (smaller output) and reduce build time for sites with very large sidebars: #5136 (can't really tell by how much though, it's site specific so please let me know if you see a significant improvement)

Tested my application with latest dev version.

  • Max memory usage: 21G
  • Build time: 34min

It doesn't seem to help in my case.

Update:

  • site size decreased a little bit: 115M -> 104M.

adventure-yunfei avatar Jul 20 '21 07:07 adventure-yunfei

This may not be accurate. The process memory was 7G most of the time. About 20 minutes later, memory jumped to 20.2G while the console was showing Client "emitting". After the client build finished, the memory dropped back down to 7G. (The Server was still building.)

I've made another test, using a plugin to override the .md loader with a no-op:

// inside docusaurus.config.js
{
  // ...
  plugins: [
    function myPlugin() {
      return {
        configureWebpack() {
          return {
            module: {
              rules: [
                {
                  test: /\.mdx?$/,
                  include: /.*/,
                  use: {
                    loader: require('path').resolve(__dirname, './scripts/my-md-loader.js')
                  }
                }
              ]
            }
          }
        }
      };
    }
  ],
}
// scripts/my-md-loader.js
// No-op loader: replaces every MD/MDX file with a trivial valid module
module.exports = function myLoader() {
    const callback = this.async();
    return callback && callback(null, 'module.exports = "empty";');
};

And then run doc builder again.

  • build time: 17min
  • max memory: 20+G

So I'm afraid it's the code of the page wrapper (e.g. top bar, side navigation, ...) that causes the peak memory usage. Switching the mdx-loader to another one may not help.

adventure-yunfei avatar Aug 31 '21 05:08 adventure-yunfei

@adventure-yunfei it's not clear to me how you took those measurements, can you explain?

If you allow Docusaurus to take up to 20GB, it may end up taking 20GB. And it may take more if you give it more. The question is: how much can you reduce the max_old_space_size Node.js setting before it starts crashing due to OOM?

So I'm afraid it's the code of page wrapper (e.g. top bar, side navigation, ...) that causes the max memory usage. Switching mdx-loader to another one may won't help.

Showing that the memory issue is not in the mdx-loader does not mean it's the "page wrapper". There is much more involved here than the React server-side rendering.

I suspect there are optimizations that can be done in this fork of a webpack plugin that we use: https://github.com/slorber/static-site-generator-webpack-plugin/blob/master/index.js

Gatsby used it initially and replaced it with some task queueing system.

slorber avatar Aug 31 '21 07:08 slorber

Proving a memory issue is not the mdx-loader does not mean it's the "page wrapper". There is much more involved than the React server-side rendering here.

That's true. By saying "page wrapper" I mean any other code outside the md page content itself. Just trying to provide more perf information to help identify the problem.

More info:

  • max_old_space_size was set to 4096.
  • memory was monitored with the Windows perf monitor (screenshot).
  • the console was stuck here for a long time (screenshot).

adventure-yunfei avatar Sep 01 '21 10:09 adventure-yunfei

When the number of documents is large, running yarn start is still slow. Do you have a plan to support Vite?

hjiog avatar Nov 09 '21 10:11 hjiog

The bundler is a bit hard to swap out. Next.js afaik is going through the same struggle, but basically the entire core infra is coupled with Webpack, so the most we can do is use different JS loaders (esbuild vs Babel) rather than letting you use an entirely different bundler. If you have the energy... you can try forking Docusaurus and re-implementing the core with Vite.

Josh-Cena avatar Nov 09 '21 10:11 Josh-Cena

There is some interest in making Docusaurus bundler- and framework-agnostic in the future through an adapter layer, but it's likely to be complex to implement in practice, and our current plugin ecosystem also relies on Webpack, so it would be a disruptive breaking change for the community.

slorber avatar Nov 09 '21 17:11 slorber

Makes me wonder if it's possible to swap out Webpack in our core entirely🚎 As Docusaurus 3.0, rebuilt with Vite/Parcel/...

Josh-Cena avatar Nov 09 '21 23:11 Josh-Cena

@slorber I did some big refactoring, small optimizations, and removed a bunch of dependencies from static-site-generator-webpack-plugin - https://github.com/slorber/static-site-generator-webpack-plugin/pull/2 and https://github.com/slorber/static-site-generator-webpack-plugin/pull/1

I trimmed down the package but there is still a bunch of improvements to be done there


we should generally avoid using this, as it is "extremely" slow:

const webpackStatsJson = webpackStats.toJson({ all: false, assets: true }, true);

https://github.com/webpack/webpack/issues/12102 https://github.com/webpack/webpack/issues/6083


the next potentially slow / resource-heavy part is the bundle eval: this code spawns a vm for each entry point and evaluates its code

armano2 avatar Nov 21 '21 10:11 armano2

I've found and fixed the large maximum memory issue in https://github.com/slorber/static-site-generator-webpack-plugin/pull/3.

Investigation

After investigation, the peak memory usage happened inside static-site-generator-webpack-plugin, while rendering every page path. So I took a look at the static-site-generator-webpack-plugin code, and found two problems:

  1. Memory/GC issue. All pages are rendered at the same time (see code). The render is async, and its allocated resources cannot be freed until the render promise has finished. Thus, in the worst case, the maximum allocated memory is the sum of the resources for rendering every page, i.e. O(M*N) memory, where M is the page count and N is the memory allocated for rendering one page. That's unnecessary.
  2. Duplicate rendering. After rendering one page, it crawls relative paths and continues rendering those pages (see code). That may lead to many duplicate page renderings.

To fix them:

  • Use a promise queue to render only a batch of pages at a time, wait until they finish, and then continue with the next batch. The maximum is then O(S*N) memory, where S is the queue size and N is the memory allocated for rendering one page.
  • Record rendered pages, and skip duplicates.
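The two fixes can be sketched together as a small worker pool over a deduplicated path list (an illustration only, not the actual PR code; `renderPage` stands in for the real per-path render function):

```javascript
// Minimal sketch: render at most `limit` pages concurrently and skip
// paths already seen. Not the actual PR code; `renderPage` is a
// placeholder for the real render function.
async function renderAll(paths, renderPage, limit = 32) {
  const seen = new Set();
  // Dedup up-front: only the first occurrence of each path survives
  const queue = paths.filter((p) => !seen.has(p) && seen.add(p));
  const results = [];

  // `limit` workers pull paths off the shared queue until it is empty,
  // so at most O(limit * perPageMemory) is alive at any moment.
  async function worker() {
    while (queue.length > 0) {
      const path = queue.shift();
      results.push(await renderPage(path));
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}
```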

Check the optimization result below

Before optimization

  • output logs (with some custom injections):
    [en] Creating an optimized production build...
    Fri Dec 03 2021 10:58:31 GMT+0800 (GMT+08:00): start compiling.
    i Compiling Client
    i Compiling Server
    Fri Dec 03 2021 11:07:15 GMT+0800 (GMT+08:00) start StaticSiteGeneratorWebpackPlugin. paths count: 7722
    Fri Dec 03 2021 11:07:19 GMT+0800 (GMT+08:00) after evaluate source. source size: 64794924
    Fri Dec 03 2021 11:21:24 GMT+0800 (GMT+08:00) renderPaths finished
    √ Client: Compiled successfully in 23.66m
    √ Server: Compiled successfully in 23.70m
    Fri Dec 03 2021 11:22:56 GMT+0800 (GMT+08:00): compile finished.
    Fri Dec 03 2021 11:22:56 GMT+0800 (GMT+08:00): start post build, plugins:
        docusaurus-plugin-content-docs, docusaurus-plugin-content-blog,
        docusaurus-plugin-content-pages, docusaurus-plugin-sitemap,
        docusaurus-theme-classic, docusaurus-bootstrap-plugin, docusaurus-mdx-fallback-plugin
    Fri Dec 03 2021 11:22:56 GMT+0800 (GMT+08:00): post build finished
    Fri Dec 03 2021 11:22:56 GMT+0800 (GMT+08:00): start handleBrokenLinks
    info Docusaurus found broken links!
    ...
    Fri Dec 03 2021 11:31:20 GMT+0800 (GMT+08:00): handleBrokenLinks finished
    Success! Generated static files in "build".
    
  • memory records: before

The maximum allocated memory is 21+G, increased quickly during static-site-generator-webpack-plugin renderPaths, and then dropped down quickly (from 11:07 to 11:21).

After optimization

  • output logs:
    [en] Creating an optimized production build...
    Fri Dec 03 2021 16:15:06 GMT+0800 (GMT+08:00): start compiling.
    i Compiling Client
    i Compiling Server
    Fri Dec 03 2021 16:25:00 GMT+0800 (GMT+08:00) start StaticSiteGeneratorWebpackPlugin. paths count: 7722
    Fri Dec 03 2021 16:25:04 GMT+0800 (GMT+08:00) after evaluate source. source size: 64794924
    Fri Dec 03 2021 16:40:11 GMT+0800 (GMT+08:00) renderPaths finished
    √ Client: Compiled successfully in 25.70m
    √ Server: Compiled successfully in 25.83m
    Fri Dec 03 2021 16:42:00 GMT+0800 (GMT+08:00): compile finished.
    Fri Dec 03 2021 16:42:00 GMT+0800 (GMT+08:00): start post build, plugins: docusaurus-plugin-content-docs, docusaurus-plugin-content-blog, docusaurus-plugin-content-pages, docusaurus-plugin-sitemap, docusaurus-theme-classic, docusaurus-bootstrap-plugin, docusaurus-mdx-fallback-plugin
    Fri Dec 03 2021 16:42:00 GMT+0800 (GMT+08:00): post build finished
    Fri Dec 03 2021 16:42:00 GMT+0800 (GMT+08:00): start handleBrokenLinks
    info Docusaurus found broken links!
    ...
    Fri Dec 03 2021 16:50:52 GMT+0800 (GMT+08:00): handleBrokenLinks finished
    Success! Generated static files in "build".
    
  • memory records: after

The maximum allocated memory is 7.1G during static-site-generator-webpack-plugin renderPaths (from 16:25 to 16:40), without large memory spikes (the maximum for the whole build was 8.2G, during Docusaurus core's handleBrokenLinks).

The maximum memory decreased, while the build time remained the same.

Further information

  1. Build time summary, total 35min:
    • webpack compile: 27min
      • static-site-generator-webpack-plugin renderPaths: 15min
    • handleBrokenLinks: 8min
  2. The render page function is defined in docusaurus core serverEntry.ts. After checking the code:
    • I guess the large memory allocation comes from the minifier. I've seen minifiers consume large amounts of memory before.
    • manifest is read & parsed multiple times (see code). We can optimize it to read only once. In my case the manifest JSON is 2MB; reading & parsing it 7722 times costs 2min.
  3. In my case only the large memory allocation issue is validated (@krillboi please help validate it in your case). There are no relative paths in my case, so the perf result for "avoid duplicate page rendering" is not validated. @alphaleonis you may test it in your case. From your description I suspect your non-linear build time increase is caused by this code.

adventure-yunfei avatar Dec 03 '21 09:12 adventure-yunfei

Tip: one of the best ways to reduce build time and memory usage is to use esbuild-loader instead of babel-loader. See the config of the website in this repo for the setup and usage.

RDIL avatar Dec 07 '21 18:12 RDIL

Thanks for working on this, will read all that more carefully and review PRs soon.

FYI afaik Gatsby also moved to a queuing system a while ago and that was something I wanted to explore. It's worth comparing our code to theirs.


Something I discovered recently: JS can communicate more seamlessly with Rust thanks to napi_rs with some shared memory, while it's more complicated in Go.


https://twitter.com/sebastienlorber/status/1460624240579915785 https://twitter.com/sebastienlorber/status/1468522862990536709

For that reason, it's really worth trying SWC rather than esbuild as the Babel loader replacement. I believe SWC may be faster than esbuild when used as a loader, while esbuild may be faster when you go all-in and stop using Webpack/loaders.

Next.js has great results with SWC, and we may eventually be able to leverage their Rust extensions to support things like Styled-Components/Emotion. https://nextjs.org/blog/next-12#faster-builds-and-fast-refresh-with-rust-compiler

If someone wants to make a POC PR on our own website and compare build times with cold caches, that could be interesting
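Such a POC could be as small as swapping the jsLoader. A hypothetical, untested sketch (option names taken from the swc/swc-loader docs; verify them before relying on this):

```javascript
// Hypothetical docusaurus.config.js fragment: swc-loader instead of
// esbuild-loader. Option names follow the swc/swc-loader docs; this is
// an untested sketch, not a recommended configuration.
module.exports = {
  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('swc-loader'),
      options: {
        jsc: {
          parser: { syntax: 'typescript', tsx: true },
          target: 'es2017',
        },
        module: { type: isServer ? 'commonjs' : 'es6' },
      },
    }),
  },
};
```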

slorber avatar Dec 09 '21 10:12 slorber

@slorber I have a question that I'm unable to figure out: if our site is using esbuild, why is there still a Babel message in the command line saying that the changelog has exceeded 500KB?

Josh-Cena avatar Dec 09 '21 12:12 Josh-Cena

Personally, after using SWC and ESBuild for a while, I honestly prefer ESBuild. SWC is not documented nearly as much, and ESBuild has very frequent releases fixing bugs and adding features. ESBuild has a nicer DX IMO.

RDIL avatar Dec 09 '21 12:12 RDIL

A smooth integration between swc and webpack is planned, so if you want to avoid big changes/refactors/unpredictable bugs, prefer swc (part of the integration - https://github.com/swc-project/swc/tree/main/crates/swc_webpack_ast). swc also has better perf in some cases, but they are both fast.

swc is a parser/codegen/visitor/bundler/transpiler/etc., while esbuild is more of just a bundler; these are slightly different things. So if in the future you want deeper native integration, especially based on Rust, I recommend swc. This is not an advertisement, just notes for other developers.

alexander-akait avatar Dec 10 '21 13:12 alexander-akait

The performance difference between esbuild and swc should be very tiny, given that both are much faster than the JS tools they replace. I don't really think it's worth comparing the two on performance, since both are clearly very fast.

If one provides a better experience than the other, is it really worth benchmarking them over a margin of like half a second?

RDIL avatar Dec 10 '21 14:12 RDIL

Please read:

also swc has better perf in some cases, but they both are fast.

My point is more that swc provides more things out of the box, so if you need a custom plugin/transformer/code generator for JS/CSS/other things, I strongly recommend swc; bundling is not the only thing in a build pipeline.

alexander-akait avatar Dec 10 '21 14:12 alexander-akait

Thanks @alexander-akait, we'll try to keep up with the work Vercel and Webpack are doing and see what we can reuse here.

SWC is more extensible and we may even implement some Rust plugin someday to process our i18n API and replace the global registry of messages with some inline localized translation strings directly in the app bundle. (ie, better code splitting for translations).

And it should also make it easier to use Emotion/StyledComponents with Docusaurus, as Vercel is already working on porting existing Babel plugins to Rust.

@slorber I have a question that I'm unable to figure out: if our site is using esbuild, why is there still a Babel message in the command line saying that the changelog has exceeded 500KB?

😅 good question, maybe it's related to the translation extraction? But afaik it's not run when starting the dev server... weird

slorber avatar Dec 10 '21 18:12 slorber

A quick guess is that MDX v1 uses Babel under the hood to do the transformation: https://github.com/mdx-js/mdx/blob/master/packages/mdx/mdx-hast-to-jsx.js#L1

It seems MDX v2 has removed this dependency

Josh-Cena avatar Dec 12 '21 12:12 Josh-Cena

I've been using Next.js a fair bit this year and honestly if I could turn back time, I think @endiliey and I shouldn't have built our own site generator in Docusaurus v2 and we should have used Next.js instead. Admittedly, it was a mix of not-invented-here syndrome and wanting to learn how to build a site generator from scratch.

At this point, Next.js is a clear winner in the SSG race and Gatsby is more or less out. Vercel is doing so well with their latest funding rounds and star hires, I think it's safe to bet on Next.js.

Docusaurus v2 is split into 3 layers: our homegrown (1) SSG infra, (2) plugins, (3) UI/themes. If I were to build Docusaurus v3, I would make it such that Docusaurus 3 is more like Nextra, swap out (1) with Next.js and retain (2) and (3). Docusaurus 3 would provide all the documentation-related features. I felt that Docusaurus 2 had to play catch up a lot and implement lots of non-documentation-specific features that were required by websites when Next.js already provided all these. We could have saved lots of time by standing on the shoulders of giants.

With Next.js' current popularity and trajectory, I think it's only a matter of time before someone builds a fully-fledged docs theme on top of Next.js that does everything Docusaurus does, but probably better, because its SSG infra would be much more optimized by virtue of being on Next.js. IMO many users would also like to have the SSR features Next.js provides so that they can build auth and have better integration with their core product.

yangshun avatar Jan 18 '22 02:01 yangshun

I still like the idea of having "dependency independence". Apart from Webpack / React router / other low-level infra, we aren't coupled to any dependency. It means we can describe our architecture as an integral thing without saying "the peripheral is Docusaurus, but the core, well, is Next.js and it's a black box". Working on Docusaurus frankly made me a lot more familiar with how SSG works😄

Josh-Cena avatar Jan 18 '22 02:01 Josh-Cena

@yangshun @Josh-Cena we seem to all agree on this: the value proposition of Docusaurus is all about the plugins and opinionated docs features to get started very fast and still have great flexibility.

That was also my opinion on day one, but I also think that having our own SSG wasn't totally useless: it permitted us to iterate faster without being blocked by the limits of an existing dependency, and gave us time to better evaluate Gatsby vs Next.js vs others (the choice wasn't so clear in 2019 😅, and Remix remains an interesting new option today)

We discussed this with @zpao and @JoelMarcey a few months ago and we agreed that Docusaurus should rather migrate to Next.js.

Or become framework-agnostic. This might be more complicated to document well, and harder to implement, but could allow using other solutions like Remix or Gatsby.

And building on top of Next.js also incentivizes Vercel to invest more in Docusaurus 🤷‍♂️ eventually we could join forces with Nextra if the companies can agree on that


Now I don't think it is going to be in 3.0, because 3.0 will likely be quite soon if we start to respect Semver more strictly (see https://github.com/facebook/docusaurus/issues/6113)

slorber avatar Jan 19 '22 11:01 slorber

One thing I'd regret about migrating to Next.js is that we would be forever tied to Webpack, because from my observation the Webpack 5 migration was more painful for them than for us. Webpack is ultimately not comparable in terms of performance to, say, esbuild... 🤔

Josh-Cena avatar Jan 19 '22 12:01 Josh-Cena

Next.js is starting to migrate to swc (Rust) and replace webpack more and more, so you should not be afraid of it; as I wrote above, it will be a smooth migration

alexander-akait avatar Jan 19 '22 12:01 alexander-akait

One thing I'd regret about migrating to Next.js is that we would be forever tied to Webpack, because from my observation the Webpack 5 migration was more painful for them than for us. Webpack is ultimately not comparable in terms of performance to, say, esbuild... 🤔

I agree that we want something fast but I believe it's also the goal of Next.js 😅

Their Webpack 5 migration is likely more complex because of the higher diversity of sites needing to migrate, compared to our low diversity: most doc sites are not customized that much and plugins don't always tweak Webpack settings.

Also, there's value in keeping at least some things in Webpack for now: our plugin ecosystem can remain backward-compatible

slorber avatar Jan 19 '22 12:01 slorber

Yeah, in the short term migrating to Next.js is surely going to yield lots of benefits. I've never actually used it purely as an SSG, only as a React framework, but if we can figure out how to make them interoperate it will be very nice!

Josh-Cena avatar Jan 19 '22 12:01 Josh-Cena

the choice wasn't so clear in 2019

Yep, totally true. Back then I referenced how Gatsby did lots of things, and I might well have just chosen Gatsby to build on top of.

Webpack ultimately is not comparable in terms of performance to, say, esbuild

The thing is, with the backing Next.js has, they will just use the fastest tooling that's out there, and we can benefit from it by building on top of Next.js. I believe Sebastien is also saying the same. Hopefully we can go with Next.js in the next version (or even better, become framework-agnostic)!

yangshun avatar Jan 19 '22 13:01 yangshun

As an outsider, we looked at all the options, and Docusaurus was the best when it came to the level of investment and a clean, clear plugin and theming architecture. I think competition in this space is much needed, and having both Nextra and Docusaurus is great for pushing the envelope.

I don't think the alternatives to Webpack are anywhere near stable enough to really compare apples to apples. By the next major version the landscape is going to look very different, or maybe Webpack migrates to Rust and no one needs to do any major rearchitecting at all.

gabrielcsapo avatar Jan 19 '22 19:01 gabrielcsapo

Allowing alternative JS loaders may hinder the provision of useful OOTB JS syntax extensions. For example, @slorber mentioned somewhere that we may explore making <Translate> a zero-runtime API through Babel transformations. I also talked to him about solving #4530 through a runtime transformation of '@theme-original/*' to the actual path of the "next component in the theme component stack", instead of using static Webpack aliases. I would definitely want to use SWC/esbuild, but in any case, it would mean writing the transform plugin with a different set of APIs, maybe even in a different language. That makes it not scalable. If we have to insert an extra loader that uses Babel, then we are back to square one and perf will be compromised.

Josh-Cena avatar Feb 10 '22 13:02 Josh-Cena

Perf is arguably already compromised by using Babel in the first place.

RDIL avatar Feb 10 '22 13:02 RDIL

Perf is arguably already compromised by using Babel in the first place.

Yes, that's the whole point here. We either want to use Babel throughout, or drop Babel altogether. We don't want to do custom code transformations (the two I mentioned) through Babel, then delegate the rest of transpilation to another JS loader. But it would not be scalable if we support multiple JS loaders, especially if it's bring-your-own-parser.

Josh-Cena avatar Feb 10 '22 13:02 Josh-Cena

I think we should drop Babel personally.

Pros:

  • Better perf + memory usage
  • Smaller dependency tree and potentially faster install times

Cons:

  • Transforms may need to be ported to Rust, not 100% sure on that one though.
  • Would need to upgrade to MDX 2?

RDIL avatar Feb 10 '22 14:02 RDIL

Babel is still useful as a default, because babel.config.js is documented as public API, and users who want more JS syntax would often find it easier to search for a Babel plugin. We can certainly promote SWC/esbuild by providing OOTB configurations, but ultimately it means we need to support multiple JS loaders.
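For context, the documented babel.config.js surface is small: a freshly scaffolded site just re-exports the core preset, and a user who wants extra syntax appends Babel plugins to it. A sketch of that default config:

```javascript
// babel.config.js: the default config scaffolded by create-docusaurus.
// Users who need extra JS syntax typically just append Babel plugins here,
// which is exactly the flexibility a non-Babel loader would lose.
module.exports = {
  presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
  // e.g. plugins: ['babel-plugin-macros'],
};
```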

Transforms may need to be ported to Rust, not 100% sure on that one though.

If you are talking about SWC—the current plugin system is still JS-based. https://swc.rs/docs/usage/plugins

Would need to upgrade to MDX 2?

Yeah, not a huge deal though, I think MDX 1 only does limited transformation with Babel

Josh-Cena avatar Feb 10 '22 14:02 Josh-Cena

Transforms may need to be ported to Rust, not 100% sure on that one though.

If we want i18n to work, we'd have to port our plugin to Rust, yes.

Although technically for now we only extract translations with the Babel plugin (no source change), so I think it may not even be necessary in the short term: we could keep extracting translations with Babel (slower, but who cares) and only transpile in Rust.

If you are talking about SWC—the current plugin system is still JS-based. swc.rs/docs/usage/plugins

Afaik NAPI-RS has low overhead when calling Rust from JS, but not the other way around. That's also probably why Vercel (which recently hired the NAPI-RS creator) is porting popular Babel plugins to Rust

slorber avatar Feb 10 '22 15:02 slorber

Although technically for now we only extract translations with the Babel plugin (no source change), so I think it may not even be necessary in the short term: we could keep extracting translations with Babel (slower but who cares) and only transpile in Rust.

Yeah, that's the idea. The extractor is only run in development so it's fine to be in Babel, but for build-time transformations (if we ever implement that) we'd rather use Rust.

Josh-Cena avatar Feb 10 '22 15:02 Josh-Cena

Hi, I am trying to use GitHub Actions to build my large site, but GitHub Actions currently provides 6 hours, after which the job times out. Is there any option to save a partial Docusaurus build and resume it later, so that I could complete the whole build across multiple GitHub Actions runs?

Update: This should be possible by dockerizing your build process and then using the checkpoint option, but due to huge RAM (or swap) usage this may not work: checkpointing will dump RAM (or swap), and the GitHub runner will have no disk space left. You can use this to increase space (and swap), but I don't think it will work for my use case. Your mileage may vary.

fawazahmed0 avatar May 14 '22 10:05 fawazahmed0

We don't have a way to resume a build sorry.

We use Webpack, which has caching layers (that we can persist), but afaik it can only persist at the end of a build, not incrementally. Unless someone shows how to persist the Webpack cache before the end of a build, I assume it's not possible.
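For reference, the caching layer in question is Webpack 5's persistent filesystem cache. A minimal sketch of enabling it at the raw Webpack level (not Docusaurus-specific config):

```javascript
// webpack.config.js: enable Webpack 5's persistent cache. It is written to
// node_modules/.cache/webpack by default, and is only flushed once the
// compilation finishes, which is why a resumable/partial build is hard.
module.exports = {
  cache: {
    type: 'filesystem',
    // Invalidate the cache when the build configuration itself changes
    buildDependencies: {
      config: [__filename],
    },
  },
};
```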

slorber avatar May 25 '22 13:05 slorber

I've thought about something similar in the past, using some sort of "lazy build" that doesn't load everything at once into memory. I don't know if Webpack is able to do that.

Josh-Cena avatar May 25 '22 13:05 Josh-Cena

Maybe useful https://webpack.js.org/configuration/experiments/#experimentslazycompilation
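For anyone trying it, a minimal sketch of enabling the experiment at the Webpack level (option names per the Webpack 5 docs; availability depends on your Webpack version):

```javascript
// webpack.config.js: opt into Webpack's experimental lazy compilation.
// Entrypoints stay eagerly compiled; dynamic import()s are only compiled
// when something actually requests them (mainly useful for webpack-dev-server).
module.exports = {
  experiments: {
    lazyCompilation: {
      entries: false,
      imports: true,
    },
  },
};
```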

alexander-akait avatar May 25 '22 13:05 alexander-akait

Ah yes! I did see that. Definitely worth looking into in the near future...

Josh-Cena avatar May 25 '22 13:05 Josh-Cena

@alexander-akait already tried that but wasn't able to make it work successfully so far 🤪

(screenshot)

Also, is this really supposed to improve a static site's production build? Considering that in the end everything must be compiled, what's the point of using lazy compilation?

This Storybook benchmark with/without lazyCompilation also shows that the win for a production build is not significant: https://storybook.js.org/blog/storybook-performance-from-webpack-to-vite/

(screenshot)

But it is useful for the dev env:

(screenshot)

We definitely want the improvements of lazyCompilation, but this issue is more about the production build, so I'm not sure this option is very relevant here?

slorber avatar May 25 '22 17:05 slorber

@slorber I see. Currently my recommendation is to try switching from Babel to SWC (and the SWC minifier) too; it should help a lot, and SWC is more mature now

alexander-akait avatar May 25 '22 17:05 alexander-akait

If you get a terminate called after throwing an instance of 'std::bad_alloc' error during the build, you might also need to set vm.max_map_count to a higher value, for example sysctl -w vm.max_map_count=655300

Reference

fawazahmed0 avatar May 28 '22 22:05 fawazahmed0

FYI the current canary and next release will include the suggestion from @adventure-yunfei

We now limit the concurrency when outputting static files at the end of the build process. This should put less load on the system's I/O and reduce the memory footprint by default.

Note: this may not impact build time much, but it mostly avoids a potential OOM at the end of your build.


There's a "secret" env variable in case you want to tune this concurrency setting: process.env.DOCUSAURUS_SSR_CONCURRENCY

https://github.com/facebook/docusaurus/pull/7547/files#diff-058c5cef3799e9df200345c9d5b769bbc27838aea32aa4f8345d09858d416781R96

For now, this is undocumented: please give us feedback on how impactful it is on your site, and eventually, we'll document it or add it as a first-class config setting.
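For the curious, the change is conceptually just a bounded async map over the pages being written out, instead of firing all writes at once. A minimal self-contained sketch of that pattern (not the actual Docusaurus implementation) looks like this:

```javascript
// Run `mapper` over `items` with at most `concurrency` tasks in flight.
// Limiting fan-out like this keeps file-descriptor and memory usage bounded
// when writing thousands of static HTML files.
async function mapWithConcurrency(items, mapper, concurrency) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // safe: JS is single-threaded between awaits
      results[i] = await mapper(items[i], i);
    }
  }
  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({length: workerCount}, () => worker()));
  return results;
}

// Example: "render" 4 pages, at most 2 at a time.
mapWithConcurrency([1, 2, 3, 4], async (x) => x * 2, 2).then((out) => {
  console.log(out); // [ 2, 4, 6, 8 ]
});
```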

slorber avatar Jun 02 '22 08:06 slorber

Just tried that secret env variable (on version 0.0.0-5101) and even setting it to 2 I get an OOM error on our website (I can link the repository if needed). Is there anything I can do to debug why the memory usage spikes up so high once the server is supposedly compiled?

vladfrangu avatar Jun 06 '22 15:06 vladfrangu

Just tried that secret env variable (on version 0.0.0-5101) and even setting it to 2 I get an OOM error on our website (I can link the repository if needed). Is there anything I can do to debug why the memory usage spikes up so high once the server is supposedly compiled?

@vladfrangu consuming more memory than the Node.js default (0.5gb) does not seem unexpected to me for a large site. Now, if we can't reasonably build a large site with 2-10GB of memory, that seems way more problematic.

The only thing you can do is profile your build and report your findings, to know which step exactly is taking more memory than you expect. I can't really teach you how to do this through a GitHub issue, I'm not an expert in this either.

See @fawazahmed0 comment here: https://github.com/facebook/docusaurus/issues/4765#issuecomment-910164698

Having a curve + an idea of what Webpack is working on is helpful.

slorber avatar Jun 15 '22 15:06 slorber

@alexander-akait Do you have examples of using SWC and the SWC minifier instead of Babel?

PrivatePuffin avatar Jul 16 '22 12:07 PrivatePuffin

@alexander-akait Do you have examples of using SWC and the SWC minifier instead of Babel?

we are using SWC on the Docusaurus site now, so feel free to steal our config: https://github.com/facebook/docusaurus/pull/6944
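If you'd rather not read the whole PR, the relevant bit is the jsLoader config. A sketch along those lines (option values based on what the Docusaurus site uses; they may need tuning for your own browser-support targets):

```javascript
// docusaurus.config.js (excerpt): swap the default Babel loader for swc-loader
module.exports = {
  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('swc-loader'),
      options: {
        jsc: {
          parser: {
            syntax: 'typescript',
            tsx: true,
          },
          transform: {
            react: {
              runtime: 'automatic',
            },
          },
          target: 'es2017',
        },
        module: {
          type: isServer ? 'commonjs' : 'es6',
        },
      },
    }),
  },
};
```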

slorber avatar Jul 20 '22 11:07 slorber