pelican-plugins AsciiDoc Reader with russian text

Open • gmaFFFFF opened this issue 4 years ago • 1 comment

Hi,

1. In addition to asciidoc and asciidoctor, there is also the asciidoctorj tool. I suggest adding it as one of the default tools.
2. I ran into problems while processing Russian texts: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte. The solution requires several changes in the code and breaks the existing fix.
3. Russian metadata is not supported.

I ended up with the following patch:

    diff --git a/asciidoc_reader/asciidoc_reader.py b/asciidoc_reader/asciidoc_reader.py
    index 881a857..789253d 100644
    --- a/asciidoc_reader/asciidoc_reader.py
    +++ b/asciidoc_reader/asciidoc_reader.py
    @@ -29,11 +29,12 @@ def fix_unicode(val):
         if sys.version_info < (3,0):
             val = unicode(val.decode("utf-8"))
         else:
    -        # This fixes an issue with character substitutions, e.g. '<F1>' to '<C3><B1>'.
    -        val = str.encode(val, "latin-1").decode("utf-8")
    +        # This fixes an issue with character substitutions, e.g. '<EF><BF><BD>' to '<C3><B1>'.
    +        # val = str.encode(val, "latin-1").decode("utf-8")
    +        ...
         return val
    
    -ALLOWED_CMDS = ["asciidoc", "asciidoctor"]
    +ALLOWED_CMDS = ["asciidoc", "asciidoctor", "asciidoctorj"]
    
     ENABLED = None != default()
    
    @@ -51,7 +52,7 @@ class AsciiDocReader(BaseReader):
             if cmd:
                 optlist = self.settings.get('ASCIIDOC_OPTIONS', []) + self.default_options
                 options = " ".join(optlist)
    -            content = call("%s %s -o - %s" % (cmd, options, source_path))
    +            #content = call("%s %s -o - %s" % (cmd, options, source_path))
                 # Beware! # Don't use tempfile.NamedTemporaryFile under Windows: https://bugs.python.org/issue14243
                 # Also, use mkstemp correctly (Linux and Windows): https://www.logilab.org/blogentry/17873
                 fd, temp_name = tempfile.mkstemp()
    @@ -74,7 +75,7 @@ class AsciiDocReader(BaseReader):
             """Parses the AsciiDoc file at the given `source_path` and returns found
             metadata."""
             metadata = {}
    -        with open(source_path) as fi:
    +        with open(source_path, encoding="utf-8") as fi:
                 prev = ""
                 for line in fi.readlines():
                     # Parse for doc title.
    @@ -88,7 +89,7 @@ class AsciiDocReader(BaseReader):
                             metadata['title'] = self.process_metadata('title', fix_unicode(title))
    
                     # Parse for other metadata.
    -                regexp = re.compile(r"^:[A-z]+:\s*[A-z0-9]")
    +                regexp = re.compile(r"^:[A-z]+:\s*\S")
                     if regexp.search(line):
                         toks = line.split(":", 2)
                         key = toks[1].strip().lower()
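
Two of the changes in the patch can be checked in isolation. The sketch below is only an illustration: the two regular expressions and the latin-1 round trip are copied from the diff, while the sample metadata lines and the helper names `matches` and `latin1_roundtrip_ok` are made up for the example and are not part of the plugin. It does not claim to reproduce the reported UnicodeDecodeError itself.

```python
import re

# Patterns copied from the patch: the old one requires an ASCII character
# at the start of the value, the new one accepts any non-whitespace character.
OLD_META = re.compile(r"^:[A-z]+:\s*[A-z0-9]")
NEW_META = re.compile(r"^:[A-z]+:\s*\S")

# Hypothetical metadata lines, not taken from the issue.
ascii_line = ":author: Ivan"
cyrillic_line = ":author: Иван"


def matches(pattern, line):
    """Return True if the metadata pattern finds a match in the line."""
    return pattern.search(line) is not None


def latin1_roundtrip_ok(val):
    """Try the latin-1 -> utf-8 round trip that the patch comments out."""
    try:
        str.encode(val, "latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


print("old regex, ASCII value:   ", matches(OLD_META, ascii_line))
print("old regex, Cyrillic value:", matches(OLD_META, cyrillic_line))
print("new regex, Cyrillic value:", matches(NEW_META, cyrillic_line))
print("latin-1 round trip, 'Ivan':", latin1_roundtrip_ok("Ivan"))
print("latin-1 round trip, 'Иван':", latin1_roundtrip_ok("Иван"))
```

With the old pattern a value that starts with a Cyrillic letter is skipped, which is consistent with item 3 above. The round trip fails for Cyrillic text because those characters cannot be encoded as latin-1, which is presumably why the patch disables that line.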

Jun 25 '20 13:06 gmaFFFFF

Hello @gmaFFFFF, it looks like your suggestions 2 and 3 were implemented in #1310. Please pull the latest master and check it. Any feedback is appreciated.

Dec 13 '20 11:12 podsvirov
