Special characters in article names break on HTML export

Open joepie91 opened this issue 12 years ago • 0 comments

Hi,

I attempted an HTML export of an Instiki instance I had set up, and noticed that special characters in URLs were causing problems.

Example:

The original article name would be bogus, name. This is URL-encoded into bogus%2C+name. So far so good. However, while the HTML file is (correctly!) saved as bogus%2C+name.xhtml, the links to that page from other pages are not URL-encoded a second time. A link on another page therefore points to bogus%2C+name.xhtml, which is decoded as bogus, name.xhtml when the link is followed. That is of course not correct, as no such file exists. The fix would be to also encode the % as %25, making the URL bogus%252C+name.xhtml, which would be decoded to bogus%2C+name.xhtml, the correct filename.
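To make the mismatch concrete, here is a small Python sketch of the round trip. It is only an illustration, not Instiki's actual export code (Instiki is written in Ruby), and the variable names are made up for this example. It assumes whatever follows the link decodes percent escapes exactly once and leaves + alone; a consumer that also turns + into a space ends up with a different wrong name, but the result is wrong either way.

```python
from urllib.parse import quote, quote_plus, unquote

article_name = "bogus, name"

# File name as the export writes it to disk (space -> "+", "," -> "%2C").
file_name = quote_plus(article_name) + ".xhtml"

# Link as currently emitted: the file name is reused verbatim as the href.
broken_href = file_name

# Proposed link: percent-encode the file name once more, keeping "+" literal,
# so that every "%" becomes "%25".
fixed_href = quote(file_name, safe="+")

# Decoding once (assumption: percent escapes only) shows which file each
# link really asks for.
print(file_name)             # name of the file on disk
print(unquote(broken_href))  # does not match the file on disk
print(unquote(fixed_href))   # matches the file on disk
```

Only the link that was encoded a second time decodes back to the name of the file on disk.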

A quick fix until this bug is solved (for others that are encountering the same problem) is to run the following on a directory of exported files:

find . -name "*.xhtml" -print0 | xargs -0 sed -i.bak -E -e 's/%([0-9A-Fa-f]{2})/%25\1/g'

Note that this will also affect percent-encoded sequences that appear outside URLs, elsewhere on the page, so it's by no means perfect, but at least your links will work. Run it only once: a second pass would encode the already fixed links again.
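If the side effect on ordinary page text is a concern, a narrower variant is to touch only link targets. The following Python sketch is one way that might look. It assumes the exported pages write links as double-quoted href attributes and that relative links are the ones pointing at exported files; I have not checked either against the real export output. The function name reencode_hrefs is made up for this example.

```python
import re

# Assumption: links in the exported pages look like href="...".
HREF = re.compile(r'(href=")([^"]*)(")')
# A "%" that starts a percent escape (followed by two hex digits).
ESCAPE = re.compile(r'%(?=[0-9A-Fa-f]{2})')

def reencode_hrefs(markup):
    """Turn % into %25 inside relative href values only."""
    def fix(match):
        prefix, target, suffix = match.groups()
        if "://" in target:
            # Absolute URL to an external site: leave it as it is.
            return match.group(0)
        return prefix + ESCAPE.sub("%25", target) + suffix
    return HREF.sub(fix, markup)

page = ('<p>50%2C off: <a href="bogus%2C+name.xhtml">bogus, name</a> '
        '<a href="http://example.com/a%20b">external</a></p>')
print(reencode_hrefs(page))
```

Like the sed command, this is not safe to run twice on the same file, so keep a backup of the export before applying it.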

  • Sven

joepie91, Oct 30 '12 12:10