File / Folder Names that contain parentheses do not resolve
🐛 Bug Report
Prerequisites
- [x] I'm using the latest version of Docusaurus.
- [X] I have tried the `npm run clear` or `yarn clear` command.
- [X] I have tried `rm -rf node_modules yarn.lock package-lock.json` and re-installing packages.
- [X] I have tried creating a repro with https://new.docusaurus.io
- [X] I have read the console error message carefully (if applicable)
Description
Parentheses are not allowed in a file / folder name. When a file has parentheses the file still shows up in the sidebar but when clicked shows "page not found".
Steps to reproduce
- Create a file or folder that has `(` or `)` in the name.
- Go to that file in the sidebar.
CodeSandbox doesn't allow the special character... but stackblitz does. Demo:
https://stackblitz.com/edit/github-eranrw?file=docs/another(1).md
Expected behavior
I would expect it to be able to resolve the special character.
Actual behavior
The page did not resolve and seems to be considered a broken markdown link.
Agreed, there is something wrong and it should work.
We use React Router, which uses https://github.com/pillarjs/path-to-regexp
The slugification process that transforms filenames to uris/routes should probably remove or escape parentheses.
I'll have to check but is /docs/another(1) a valid path segment?
At what url do you expect this doc to be served?
I don't believe there's anything wrong with having /docs/another(1) as the path segment. Azure DevOps Wiki seems to be using it that way. Do you have other cases of urls being changed from their original path and if so, how do you map those back to the repo with the edit this page button?
I seem to get inconsistent results between dev & production builds for this, with spaces and .'s in the names - not sure if it's related.
Hi, if this is an issue I could work on then I would gladly give it a shot. I'm still pretty new to contributing so some hints on where to start and what I can/should do would be very much appreciated. Thanks
Indeed, it seems to be because (paren) is perceived as a regexp by React router... I think we should just remove that paren when computing slug? @lukejgaskell is having parentheses important for you?
Hey @Josh-Cena, I don't have the need currently, but what I will say is it would be a nice fix for porting docs over from other locations. A lot of people end up using parentheses in their file names and would be nice if this tool was able to handle that.
Yes, absolutely. I think we will just remove parentheses from the slug automatically. Sounds good?
@Josh-Cena hey, sorry for the slow reply. I would say with the question of removing them... would the link at the bottom of the page, "edit this page", still include them? Because if not that would make it hard to link back to the source.
> would the link at the bottom of the page, "edit this page", still include them?
I suppose yes, because the edit URLs are generated through file paths, not URL paths.
@Josh-Cena sounds like it would work! Although you might have an issue with url collisions if the only difference between file names are the special characters.
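To make the collision concern concrete, here is a minimal sketch of stripping parentheses from a slug and flagging files that end up with the same slug. The helper names `stripParens` and `findSlugCollisions` are hypothetical and are not part of Docusaurus. The slug rule (drop the extension, drop parentheses) is an assumption for illustration only.

```typescript
// Hypothetical helper: drop parentheses from a slug.
function stripParens(slug: string): string {
  return slug.replace(/[()]/g, "");
}

// Hypothetical helper: group source files by their resulting slug and
// return only the slugs that more than one file maps to.
function findSlugCollisions(filePaths: string[]): Map<string, string[]> {
  const bySlug = new Map<string, string[]>();
  for (const filePath of filePaths) {
    // Assumed slug rule: remove the .md/.mdx extension, then the parentheses.
    const slug = stripParens(filePath.replace(/\.mdx?$/, ""));
    const sources = bySlug.get(slug) ?? [];
    sources.push(filePath);
    bySlug.set(slug, sources);
  }
  return new Map(
    Array.from(bySlug.entries()).filter(([, sources]) => sources.length > 1),
  );
}

const collisions = findSlugCollisions([
  "docs/another(1).md",
  "docs/another1.md",
  "docs/intro.md",
]);
// "docs/another(1).md" and "docs/another1.md" both map to "docs/another1".
```

The file paths are kept alongside each slug, so an "edit this page" link built from the file path would still point at the original name.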
@Josh-Cena / @lukejgaskell is there a timeline for fixing this? This is currently blocking us from incorporating some docs in our docusaurus site which are auto-generated by another markdown-generating tool.
@madelson External Markdown-generating tools are likely to break Docusaurus in one way or another (e.g. HTML tags). Some kind of postprocessing is almost always necessary, and removing characters from the file path is the easiest of all.
@Josh-Cena it's a bit trickier than that since I'd also have to clean up all the cross-links between the generated pages, but I agree that it is doable. Just curious, is there a reason why it is desirable for Docusaurus not to support such URLs if they are otherwise valid?
Uh, it's not our fault, strictly speaking. It's because React-router processes them as regexps instead of literal characters. If you look at https://github.com/facebook/docusaurus/pull/6510 you see we want to align our behavior with other site generators, but so far I haven't got time to look into this. If you'd like to collect that information for us we'd greatly appreciate it.
> It's because React-router processes them as regexps instead of literal characters.
If this is the source of the issue and (I presume) we want them to be treated as literals, would the fix be as simple as escaping (rather than replacing) all regexp special characters (e.g. with something like https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex)?
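As a rough illustration of the escaping idea, the sketch below uses a plain `RegExp` as a stand-in for the route matcher. How path-to-regexp actually parses a route may differ in its details, and `escapeRegExp` is a hypothetical helper in the style of the linked Stack Overflow answer, not a Docusaurus or React Router API.

```typescript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const urlPath = "/docs/another(1)";

// Unescaped: "(1)" is read as a capture group that matches the character "1".
const naive = new RegExp(`^${urlPath}$`);
// Escaped: the parentheses are matched as literal characters.
const literal = new RegExp(`^${escapeRegExp(urlPath)}$`);

const naiveMatchesItself = naive.test(urlPath); // false
const naiveMatchesWithoutParens = naive.test("/docs/another1"); // true
const literalMatchesItself = literal.test(urlPath); // true
```

In this simplified model, the unescaped pattern fails to match its own path, which is consistent with the "page not found" behavior described in the report.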
> If you'd like to collect that information for us
To be clear, you're looking for information on what characters are supported in the URLs of other site generators like Jekyll? Or markdown-based ones specifically?
> what characters are supported in the URLs of other site generators like Jekyll? Or markdown-based ones specifically?

See https://github.com/facebook/docusaurus/pull/6510#issuecomment-1028030028. Yep, site generators that have file-based routing, like Next.js or Remix. I'm curious if they (a) make `(` appear literally in the slug, (b) remove it, or (c) turn it into `-` or `_`. (I know Remix treats `(` as a literal character.)
> would the fix be as simple as escaping (rather than replacing) all regexp special characters
Yes—if we decided they should be literal characters (to align with the behavior of other site generators). However, only ( is the "peculiar" one, because ?, [, and other stuff would already be encoded in URLs.
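A quick way to see why parentheses stand out is the standard `encodeURIComponent` function, used here only as a stand-in for whatever encoding step Docusaurus applies. It leaves `(` and `)` untouched while percent-encoding `?`, `[`, and `]`. The sample file names are made up for illustration.

```typescript
// Parentheses are not percent-encoded, so they reach the route matcher as-is.
const withParens = encodeURIComponent("another(1)"); // "another(1)"

// Other regexp-special characters are percent-encoded before they get there.
const withOthers = encodeURIComponent("what?[draft]"); // "what%3F%5Bdraft%5D"
```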