pelican-plugins
AsciiDoc Reader with Russian text
Hi,
- In addition to asciidoc and asciidoctor, there is also the asciidoctorj tool. I suggest adding it as a default tool.
- I ran into problems while processing Russian text:
  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
  The solution requires several changes in the code and breaks the existing fix.
- Russian metadata is not supported.
I ended up with the following patch:
diff --git a/asciidoc_reader/asciidoc_reader.py b/asciidoc_reader/asciidoc_reader.py
index 881a857..789253d 100644
--- a/asciidoc_reader/asciidoc_reader.py
+++ b/asciidoc_reader/asciidoc_reader.py
@@ -29,11 +29,12 @@ def fix_unicode(val):
     if sys.version_info < (3,0):
         val = unicode(val.decode("utf-8"))
     else:
-        # This fixes an issue with character substitutions, e.g. '<F1>' to '<C3><B1>'.
-        val = str.encode(val, "latin-1").decode("utf-8")
+        # This fixes an issue with character substitutions, e.g. '<EF><BF><BD>' to '<C3><B1>'.
+        # val = str.encode(val, "latin-1").decode("utf-8")
+        ...
     return val

-ALLOWED_CMDS = ["asciidoc", "asciidoctor"]
+ALLOWED_CMDS = ["asciidoc", "asciidoctor", "asciidoctorj"]

 ENABLED = None != default()
@@ -51,7 +52,7 @@ class AsciiDocReader(BaseReader):
         if cmd:
             optlist = self.settings.get('ASCIIDOC_OPTIONS', []) + self.default_options
             options = " ".join(optlist)
-            content = call("%s %s -o - %s" % (cmd, options, source_path))
+            #content = call("%s %s -o - %s" % (cmd, options, source_path))
             # Beware! # Don't use tempfile.NamedTemporaryFile under Windows: https://bugs.python.org/issue14243
             # Also, use mkstemp correctly (Linux and Windows): https://www.logilab.org/blogentry/17873
             fd, temp_name = tempfile.mkstemp()
@@ -74,7 +75,7 @@ class AsciiDocReader(BaseReader):
         """Parses the AsciiDoc file at the given `source_path` and returns found
         metadata."""
         metadata = {}
-        with open(source_path) as fi:
+        with open(source_path, encoding="utf-8") as fi:
             prev = ""
             for line in fi.readlines():
                 # Parse for doc title.
@@ -88,7 +89,7 @@ class AsciiDocReader(BaseReader):
                         metadata['title'] = self.process_metadata('title', fix_unicode(title))

                 # Parse for other metadata.
-                regexp = re.compile(r"^:[A-z]+:\s*[A-z0-9]")
+                regexp = re.compile(r"^:[A-z]+:\s*\S")
                 if regexp.search(line):
                     toks = line.split(":", 2)
                     key = toks[1].strip().lower()
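For anyone reproducing this, here is a minimal Python 3 sketch of the failure modes the patch touches. It is an illustration, not the plugin's code: the sample line `:summary: Привет, мир`, the helper name `latin1_roundtrip`, and the UTF-16 sample bytes are assumptions made up for the demo. Only the two regular expressions and the latin-1 round trip are copied from the diff. A leading UTF-16 byte-order mark is just one possible source of a 0xff byte at position 0; the report does not say where that byte came from.

```python
import os
import re
import tempfile

# Patterns copied from the diff: the old one requires the value to start
# with an ASCII character, the new one accepts any non-whitespace character.
OLD_META = re.compile(r"^:[A-z]+:\s*[A-z0-9]")
NEW_META = re.compile(r"^:[A-z]+:\s*\S")

# Hypothetical metadata line with a Russian value.
line = ":summary: Привет, мир"

old_match = OLD_META.search(line) is not None  # Cyrillic value is rejected
new_match = NEW_META.search(line) is not None  # Cyrillic value is accepted


def latin1_roundtrip(val):
    """The round trip that the patch comments out in fix_unicode."""
    return str.encode(val, "latin-1").decode("utf-8")


# The round trip repairs UTF-8 text that was mis-decoded as latin-1 ...
repaired = latin1_roundtrip("Ã±")

# ... but Cyrillic characters cannot be encoded as latin-1 at all.
try:
    latin1_roundtrip("Привет")
    roundtrip_error = None
except UnicodeEncodeError as exc:
    roundtrip_error = exc

# One way (assumed) to get "can't decode byte 0xff in position 0":
# bytes that begin with a UTF-16 LE byte-order mark, decoded as UTF-8.
utf16_bytes = b"\xff\xfe" + "Привет".encode("utf-16-le")
try:
    utf16_bytes.decode("utf-8")
    bom_error = None
except UnicodeDecodeError as exc:
    bom_error = exc

# Reading with an explicit encoding, as the patch does for source_path,
# avoids depending on the platform's default encoding.
fd, path = tempfile.mkstemp(suffix=".adoc")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as fo:
        fo.write(line + "\n")
    with open(path, encoding="utf-8") as fi:
        read_back = fi.readline().rstrip("\n")
finally:
    os.remove(path)
```

Note that `[A-z]+` still limits metadata keys to the ASCII range (and also admits the punctuation characters that sit between `Z` and `a`), so the relaxed pattern only helps with non-ASCII values, not non-ASCII keys.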
Hello @gmaFFFFF, it looks like your second and third suggestions were addressed in #1310. Please pull the latest master and check. Any feedback is appreciated.