instiki
Special characters in article names break on HTML export
Hi,
I attempted an HTML export of an Instiki instance I had set up, and noticed that special characters in article names break the links between the exported pages.
Example:
The original article name would be "bogus, name". This would be URL-encoded into "bogus%2C+name". So far so good. However, while the HTML file is (correctly!) saved as "bogus%2C+name.xhtml", the links to that page from other pages are not URL-encoded a second time. A link on another page therefore points to "bogus%2C+name.xhtml", which is decoded to "bogus, name.xhtml" when followed. That is of course not correct, as no such file exists. The fix would be to further encode the % as %25, making the link "bogus%252C+name.xhtml", which would be decoded to "bogus%2C+name.xhtml", the correct filename.
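To make the double encoding concrete, here is a minimal sketch that applies it to a single link target on standard input. The function name escape_percent is my own, not something Instiki provides, and I'm assuming a sed that understands -E (both GNU and BSD sed do).

```shell
# Hypothetical helper (not part of Instiki): escape the "%" of every
# %XX sequence read from stdin, so that %2C becomes %252C.
escape_percent() {
  sed -E 's/%([0-9A-Fa-f]{2})/%25\1/g'
}

# Dry run on the example name from this report; no files are touched.
printf '%s\n' 'bogus%2C+name.xhtml' | escape_percent
```

For the example name this prints bogus%252C+name.xhtml. Feeding the output through the helper again would encode it a third time, so it should only be applied once.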
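A quick fix until this bug is solved (for others who run into the same problem) is to run the following in a directory of exported files. It keeps a .bak copy of every file it touches, and should only be run once, since a second run would encode the links again:
find . -name "*.xhtml" -print0 | xargs -0 sed -i.bak -r -e 's/%([0-9A-Fa-f]{2})/%25\1/g'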
Note that this will also affect URL encoding that does not appear in a URL, but elsewhere on the page, so it's by no means perfect - but at least your links will work.
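If the rewrite turns out to mangle too much, the .bak copies left behind by sed -i.bak can be put back. This is only a sketch: restore_backups is a made-up name, and it assumes the backups are named "<page>.xhtml.bak" and live next to the rewritten files.

```shell
# Hypothetical helper: move every *.xhtml.bak under the given directory
# (default: the current one) back over its rewritten *.xhtml file.
restore_backups() {
  find "${1:-.}" -name '*.xhtml.bak' -exec sh -c '
    for backup do
      # Strip the trailing ".bak" to get the original file name.
      mv -- "$backup" "${backup%.bak}"
    done
  ' sh {} +
}

# Usage, from the export directory:
#   restore_backups .
```

This overwrites the rewritten pages with the untouched originals, so any other changes made to them after the sed run are lost as well.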
- Sven