Option to generate offline static HTML files usable without server
🚀 Feature
`docusaurus build` will build a local production version that has to be `docusaurus serve`'d to be usable. Can we add an option to build offline static HTML files that are usable completely without any server, so users can just open index.html with a browser to read the whole documentation?
It's about calculating relative (instead of absolute) URLs and appending "index.html" at the end of the URLs. Algolia search will have to be removed, and any online cloud assets will have to be put in local folders.
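To make that concrete, here is a rough sketch (not existing Docusaurus code; the function name is made up) of the kind of URL rewriting involved:

```js
// Sketch: turn a site-absolute link into one relative to the page it appears
// on, pointing at an explicit index.html so file:// browsing works.
const path = require('path');

function toOfflineLink(currentPage, target) {
  // e.g. currentPage = "/docs/intro/", target = "/docs/installation/"
  const fromDir = path.posix.dirname(path.posix.join(currentPage, 'index.html'));
  const rel = path.posix.relative(fromDir, target);
  return path.posix.join(rel || '.', 'index.html');
}

console.log(toOfflineLink('/docs/intro/', '/docs/installation/'));
// -> "../installation/index.html"
```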
Have you read the Contributing Guidelines on issues?
Yes
Comment, Motivation, Pitch
What about other static site generators and libraries?
Gatsby, React, etc.'s `build` commands all do a similar thing; they all need a server.
Gatsby has this feature request for an option to build such an offline static HTML site: gatsbyjs/gatsby#4610, which was closed without the issue being solved. Users keep asking for the feature and for reopening the issue. According to one comment, Gatsby v1 could actually generate such a static site; it is v2 that doesn't work.
React is general-purpose and Gatsby is made for any website, but Docusaurus is primarily made for documentation, so it may need offline-version generation more than React and Gatsby do.
PDF and ebook formats
There is already a feature request, #969, asking for an option to create an offline version in PDF format. It would obviously be brilliant to produce PDF and maybe also EPUB, MOBI, and AZW. PDF and these ebook formats may raise fewer security concerns than HTML. The downsides are that the PDF feature may be somewhat time-consuming to achieve, and the interactive navs, TOCs, and the colorful website design and layout would have to be dropped in PDF and other ebook formats. Offline static HTML is easier to make. If the PDF feature is in the long-term plan, then offline static HTML could be on a shorter-term to-do list.
Compressed web file format
The offline static web files usable without a server could simply be compressed as a zip or another common archive format. Users would need to uncompress the archive and open index.html in the root folder to use it.
They could also be compiled into CHM (Microsoft Compiled HTML Help); the problem is that it's a bit old and has no native support on non-Windows OSes. It's a little surprising there's no standard or universally accepted file format similar to CHM. Perhaps it's due to security concerns.
You can use electron to freeze it I believe
That'd be super overkill. Even if you have just one webpage, Electron will make it 80-100 MB by putting the whole browser rendering and scripting engines in it.
+1
+1
@ohkimur mentioned he built a postprocessing step to enable local browsing using the file:// protocol: https://github.com/facebook/docusaurus/issues/448#issuecomment-908777029
Building this as a postprocessing step doesn't look like a bad idea.
A Docusaurus plugin could be built in userland to solve this problem. Plugins have a `postBuild` lifecycle to use for that.
Note: such a plugin should take into account the `config.trailingSlash` option, because output files are not always `/path/index.html` anymore; they can also be `/path.html`.
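For illustration, a bare-bones userland plugin along those lines might look like this (a sketch only, not the actual docusaurus-plugin-relative-paths code; it ignores trailingSlash handling and non-HTML assets):

```js
// my-relative-paths-plugin.js (hypothetical name)
const fs = require('fs');
const path = require('path');

// Recursively list all files under a directory.
function walk(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    return entry.isDirectory() ? walk(full) : [full];
  });
}

module.exports = function relativePathsPlugin() {
  return {
    name: 'relative-paths-sketch',
    async postBuild({ outDir }) {
      for (const file of walk(outDir).filter((f) => f.endsWith('.html'))) {
        // Prefix that climbs from this file's folder back to the build root,
        // e.g. "../.." for build/docs/intro/index.html.
        const toRoot = path.relative(path.dirname(file), outDir) || '.';
        const html = fs.readFileSync(file, 'utf8');
        // Naive rewrite of root-relative href="/..." and src="/..." attributes.
        const rewritten = html.replace(
          /(href|src)="\/(?!\/)/g,
          (_match, attr) => `${attr}="${toRoot}/`
        );
        fs.writeFileSync(file, rewritten);
      }
    },
  };
};
```

Such a local plugin could be registered via `plugins: [require.resolve('./my-relative-paths-plugin')]` in docusaurus.config.js.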
Note: for some Docusaurus features (particularly SEO metas such as social image, i18n hreflang...), URLs in the HTML files MUST be fully qualified absolute URLs (domain + absolute path).
Building a site for local offline usage does not prevent you from setting the site url and baseUrl in the config file; otherwise, the build output would not be suitable for online hosting.
For these reasons, it's very unlikely we'll add support for a "relative baseUrl" in Docusaurus, such as `baseUrl: ''`: it would lead to an output that is only correct for local usage, and users would likely deploy sites with broken metadata online without noticing the SEO problems.
Moving my conversation from #448 to this thread
@ohkimur - your suggestion works for the most part, but the Webpack configurations are still proving difficult to resolve.
@slorber - my use case isn't for offline usage. I am trying to put together a simple developer workflow which involves publishing documentation to GitHub Pages. At my workplace, we are using GitHub Enterprise.
The use case is as follows,
- Developer forks a repository and works on it
- In the PR they are also contributing the generated site
- Before merging the generated site, I would like to view the site on their fork
Given that `baseUrl` needs to be defined as a fixed path (e.g., /pages/_GH_ORG_/_GH_REPO_NAME_/), it becomes difficult to view the incoming changes until they have been merged.
I understand that this is not how most people work. This is part of an exercise where I am trying to encourage my team to get into the habit of documenting their software. The pandemic has made things worse because reviewing the UI / UX now requires a meeting instead of being able to just view the documentation against their repositories.
Any ideas that you might have to improve this workflow / process are most welcome.
I'm more of a Java/JVM guy .. which isn't helping and making the hacking process that much more challenging. Any help is greatly appreciated.
@slorber I created docusaurus-plugin-relative-paths to solve the issue. I used the same post-processing approach via the Docusaurus `postBuild` lifecycle.
@roguexz , if you used a modern Jamstack tool like Netlify or Vercel (both much better than GH pages), you'd get a much better experience and all PRs would have a "deploy preview" link that includes the changes from the PR, ensuring they are valid (docusaurus can build and you can check the end result a merge would lead to before doing that merge).
See this Docusaurus PR, the Netlify bot added a link so that the PR can be reviewed more easily: https://github.com/facebook/docusaurus/pull/5462#issuecomment-910481735
This is very easy to set up.
@ohkimur thanks! hope people will like this solution.
One interesting idea could be to have 2 modes:
- modify the `build` files in place
- keep `build` unchanged, create a copy of `build`, modify it, and generate a `build/site.zip` archive that users could download? (see the sketch below)
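As a very rough illustration of the second mode (the copy location and the use of the system `zip` binary are assumptions for the sketch, not part of any actual plugin):

```js
// Hypothetical postBuild hook: leave build/ untouched, copy it, rewrite the
// links in the copy, then produce a downloadable build/site.zip.
// Requires Node >= 16.7 for fs.cpSync and a `zip` binary on the PATH.
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

async function postBuild({ outDir }) {
  const offlineDir = `${outDir}-offline`;
  fs.cpSync(outDir, offlineDir, { recursive: true });
  // ...rewrite URLs inside offlineDir to be relative, as in the plugin sketch above...
  execSync(`zip -r ${path.join(outDir, 'site.zip')} .`, { cwd: offlineDir });
}
```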
@slorber I think this is a great idea. If you want, you can open an issue here and I will work on it.
Picking up on @RDIL's comment: I added the build output files to an Electron app and encountered a few issues. After specifying each of the files in package.json and with docusaurus-plugin-relative-paths (thank you, @ohkimur!), the HTML content is rendered fine with all images, but Electron is still looking for scripts in file:///assets/ based on a reference in runtime~main.xxx.js. Any idea how this could be fixed?
A very rough way to fix script references is `baseUrl: './'`. However, this also messes with routes, so a somewhat more correct approach is to change only the `o.p=` assignment in the compiled runtime~main.xxx.js (not sure if there's a more elegant way, but unless there is, one idea might be to make this part of the docusaurus-plugin-relative-paths postprocess script). There are also references in main.xxx.js that point to an absolute directory. Now most scripts load, but they re-render all pages as the 404 page / NotFound component. Of course, getting rid of `parts.push('exact: true');` in @docusaurus/core/lib/server/routes.js doesn't exactly fix the problem, since sub-routes won't load. Why does it have to check the route match, is that just for the prefetching? It seems odd that content switches to NotFound once scripts load, given that the static content looks fine and everything is in the right place while the scripts fail to load.
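In case it helps, here is that workaround as a rough standalone script (the `o.p=` pattern and the build/assets/js location are taken from the comment above and may well differ between Docusaurus versions):

```js
// Patch the webpack public path assignment (e.g. o.p="/") in the runtime
// chunk so script/asset URLs resolve relative to the current page.
const fs = require('fs');
const path = require('path');

const jsDir = path.join('build', 'assets', 'js');
for (const name of fs.readdirSync(jsDir)) {
  if (!/^runtime~main\..+\.js$/.test(name)) continue;
  const file = path.join(jsDir, name);
  const patched = fs
    .readFileSync(file, 'utf8')
    .replace(/\b([A-Za-z_$][\w$]*)\.p="\/"/, '$1.p="./"');
  fs.writeFileSync(file, patched);
}
```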
Also, not sure if this is documented anywhere, but to dev with npm run start, I had to deactivate the docusaurus-plugin-relative-paths plugin.
Docusaurus creates quite a few js files to keep track of if you work in an environment that requires you to list every single file. I'm used to react-static, and its builds consist of far fewer files.
@larissa-n Thank you for your observations. I know about the issue you mentioned, but I didn't fix it since I didn't find an elegant approach to do it. If you already have a potential solution (even though it's messy) I invite you to make a pull request in the plugin's repo. I can extend it later if necessary.
Also, can you describe the problem you had when you tried `npm start`? Isn't the plugin called only when a build is triggered? If not, then this is a bug and it might be a good idea to fix it.
Note: for some Docusaurus features (particularly SEO metas such as social image, i18n hreflang...), URLs in the HTML files MUST be fully qualified absolute URLs (domain + absolute path).
Why must they be fully qualified, @slorber?
I've just started to use Docusaurus and I find the `baseUrl` rather limiting, because I kind of expected a copy-the-html-files-anywhere experience for deployment anywhere, without extra configuration. I don't understand why there's a tight coupling to `baseUrl`. Your comment hints at why it exists.
Is this necessity documented?
@sigwinch28 Yes, see https://docusaurus.io/docs/advanced/routing#routes-become-html-files
It's not just coupling to a /baseUrl/, it is coupling to your domain as well.
There are multiple things in Docusaurus relying on that, in particular SEO metadata like the canonical URL:
<link data-rh="true" rel="canonical" href="https://docusaurus.io/docs/myDoc">
What Google says: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls

Although relative URLs seem supported (maybe only by Google?), it's not recommended.
Similarly, meta hreflang headers for i18n sites:

https://developers.google.com/search/docs/advanced/crawling/localized-versions

(Including the transport method also means you can't switch from HTTP to HTTPS without a Docusaurus config change.)
Similarly for the `og:image` metadata responsible for providing the social card preview of your site on social networks:
<meta property="og:image" content="https://docusaurus.io/img/socialcard.png"/>
Using a relative URL can lead to failures to display the card and does not respect the spec: https://ogp.me/#data_types

It's not a Docusaurus-side constraint, it's a constraint that comes from outside.
You really have to build your site for a specific protocol/domain/baseUrl.
Now I understand in some cases you don't care about the features above and prefer to have more "deployment flexibility", but for now we don't support that.
Fantastic answers. Thank you.
@ohkimur it looks like you completely deleted your docusaurus-plugin-relative-paths project? What happened?
@jeacott1 Yeah. I did. I want to invest my time into something different.
@ohkimur I appreciate your OSS work!
Could you please put your docusaurus repos up as 'Archived'? Even temporarily? Maybe even email me?
(I'm trying to help a former student make sense of some README notes left by a previous dev. It has permalinks to your docusaurus-plugin-relative-paths, and all I see is 404s. And the mystery deepens...)
If you'd rather not deal with it at all, I do understand. I hope your new focus is rewarding.
Best wishes, Dan.
@justsml - I've been using the script here, which works well for my purposes: https://github.com/facebook/docusaurus/issues/448#issuecomment-908777029. I wish there was a pure-JS search that worked with this mode, though.
Adding a use-case note here. Our users may have a critical need to use our products when their internet is down; this includes access to the documentation, as the products are complex. Being able to distribute the most recent version of the docs with the applications as they're built would be a huge win and would allow us to migrate all content to Docusaurus.
@dtlhlbs Have you considered https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-pwa ?
@Josh-Cena I see that and will implement it, but it's not much help if users are already offline and haven't set up the PWA.
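For reference, enabling that plugin in docusaurus.config.js looks roughly like this (options shown are illustrative; check the plugin docs for the full list):

```js
// docusaurus.config.js (excerpt)
module.exports = {
  // ...
  plugins: [
    [
      '@docusaurus/plugin-pwa',
      {
        offlineModeActivationStrategies: ['appInstalled', 'standalone', 'queryString'],
        pwaHead: [
          { tagName: 'link', rel: 'manifest', href: '/manifest.json' },
          { tagName: 'meta', name: 'theme-color', content: '#25c2a0' },
        ],
      },
    ],
  ],
};
```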
Regarding the script from #448 (comment): tabs will not work due to some JS error.
So from what I hear, there's no way to do this yet?
This should be a killer feature. I've been waiting for it for so long...
@prtnYuvalJ yeah, makes no sense to me. It's not like online hosted sites can't use relative URLs! And the site is generated, so surely you can choose your target if you desperately want absolute URLs!
@prtnYuvalJ "using relative URLs for a hosted side breaks SEO and metadata" I accept perhaps there's some merit to that but I kinda doubt google cares. regardless, it's no reason not to support generating relative versions for those who don't care or who want the facility.
FWIW - take this with as many grains of salt as you like. I asked ChatGPT the question:
Q: does using relative URLs for a hosted website break SEO and metadata?
ChatGPT: Using relative URLs on a hosted website typically does not break SEO (Search Engine Optimization) or metadata when implemented correctly. In fact, relative URLs can offer certain advantages in website development, such as making the website more portable and easier to manage, especially when migrating the site to a different domain or when switching between development and production environments.