mediawiki-to-gfm

Allow saving raw content in an otherwise empty page when pandoc chokes on its contents

olberger opened this issue.

Whenever pandoc cannot convert the contents (which I'm experiencing right now with strange tables in the pages), there's no other solution than to process the content manually.

It could be handy to have an option that saves the original content in a "raw" source code block in an otherwise empty page. That would allow the process to complete and let the user fix the output manually, instead of having to fix the pages in the input.
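Here is a minimal sketch of that fallback. It is in Python for illustration; mediawiki-to-gfm itself is PHP, and `convert_or_raw` is a hypothetical name, not part of its API. It assumes the converter is passed in as a function. If conversion raises, the page keeps its raw wikitext inside a fenced block and its title is recorded:

```python
import re
from typing import Callable, List


def convert_or_raw(title: str, wikitext: str,
                   convert: Callable[[str], str],
                   failures: List[str]) -> str:
    """Return converted GFM, or the untouched wikitext in a fenced block if conversion fails."""
    try:
        return convert(wikitext)
    except Exception:  # e.g. pandoc exiting with a parse error
        # Record the page so the user knows which outputs need manual fixing.
        failures.append(title)
        # Make the fence longer than any backtick run in the page, so the raw
        # content cannot accidentally close the code block.
        longest = max((len(run) for run in re.findall(r"`+", wikitext)), default=0)
        fence = "`" * max(3, longest + 1)
        return f"{fence}mediawiki\n{wikitext}\n{fence}\n"
```

The info string `mediawiki` only labels the block; the content is left exactly as it was in the input.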

Hope this makes sense.

olberger, Mar 16 '22

As a workaround, one may convert the "printable HTML" of a page to gfm with pandoc. It will most likely be better than nothing.

This requires saving individual pages manually, though (or through some script), for instance by saving a URL like `https://mymediawiki.example.com/index.php?title=OneBuggyPage&printable=yes`.

Then convert it with `pandoc --from html --to gfm OneBuggyPage.html >OneBuggyPage.md`.
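For the "through some script" route, a rough Python sketch might look like this. The helper names are hypothetical. The sketch assumes `pandoc` is on the `PATH` and that the wiki serves pages at `index.php?title=...&printable=yes` as in the URL above. It uses pandoc's `-o` option instead of shell redirection:

```python
import subprocess
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen


def printable_url(index_php: str, title: str) -> str:
    """Build the ?title=...&printable=yes URL for one page."""
    return index_php + "?" + urlencode({"title": title, "printable": "yes"})


def pandoc_html_to_gfm(html_file: Path, md_file: Path) -> list:
    """Same conversion as the manual command; -o replaces the shell redirect."""
    return ["pandoc", "--from", "html", "--to", "gfm", str(html_file), "-o", str(md_file)]


def fetch_and_convert(index_php: str, title: str, out_dir: Path) -> Path:
    """Download the printable HTML of one page and convert it to GFM with pandoc."""
    # Subpage titles contain "/", which would otherwise be read as a directory.
    stem = title.replace("/", "_")
    html_file = out_dir / f"{stem}.html"
    md_file = out_dir / f"{stem}.md"
    with urlopen(printable_url(index_php, title)) as response:
        html_file.write_bytes(response.read())
    # check=True raises CalledProcessError if pandoc fails on this page too.
    subprocess.run(pandoc_html_to_gfm(html_file, md_file), check=True)
    return md_file
```

For example, `fetch_and_convert("https://mymediawiki.example.com/index.php", "OneBuggyPage", Path("."))` would write `OneBuggyPage.html` and `OneBuggyPage.md` to the current directory.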

Hope this helps,

olberger, Mar 16 '22

Do you have an example of the XML that pandoc is having trouble with?

outofcontrol, Mar 23 '22

I ran into the following issue just now:

```
Error at "/tmp/pandoc62bb8cfbe524c" (line 103, column 1):
unexpected end of input
Pandoc\PandocException: Pandoc could not convert successfully, error code: 65. Tried to run the following command: /path/to/file/opt/bin/pandoc --from=mediawiki --to=gfm /tmp/pandoc62bb8cfbe524c in /path/to/file/tmp/mediawiki-to-gfm/vendor/ryakad/pandoc-php/src/Pandoc/Pandoc.php:287
Stack trace:
#0 /path/to/file/tmp/mediawiki-to-gfm/app/src/Convert.php(194): Pandoc\Pandoc->runWith()
#1 /path/to/file/tmp/mediawiki-to-gfm/app/src/Convert.php(149): App\Convert->runPandoc()
#2 /path/to/file/tmp/mediawiki-to-gfm/app/src/Convert.php(117): App\Convert->convertData()
#3 /path/to/file/tmp/mediawiki-to-gfm/convert.php(50): App\Convert->run()
#4 {main}
```

Here's the XML that I was trying to convert (note: I changed the file extension to .txt since GitHub doesn't allow uploading XML files):

Quicksilver+Wiki-20220628231653.txt

pjrobertson, Jun 28 '22

I have had similar problems. In my case, pandoc choked on unclosed `<pre>` tags, with the following error message:

```
Error at "/var/folders/8s/7jr95f1d28vflnwm1cfqv7t40000gr/T/pandoc62f680268b595" (line 195, column 1):
unexpected end of input
Pandoc\PandocException: Pandoc could not convert successfully, error code: 65. Tried to run the following command: /opt/homebrew/bin/pandoc --from=mediawiki --to=gfm /var/folders/8s/7jr95f1d28vflnwm1cfqv7t40000gr/T/pandoc62f680268b595 in /Users/redacted/Documents/workspace/mediawiki-to-gfm/vendor/ryakad/pandoc-php/src/Pandoc/Pandoc.php:287
Stack trace:
#0 /Users/redacted/Documents/workspace/mediawiki-to-gfm/app/src/Convert.php(194): Pandoc\Pandoc->runWith('This is to have...', Array)
#1 /Users/redacted/Documents/workspace/mediawiki-to-gfm/app/src/Convert.php(149): App\Convert->runPandoc('This is to have...')
#2 /Users/redacted/Documents/workspace/mediawiki-to-gfm/app/src/Convert.php(117): App\Convert->convertData()
#3 /Users/redacted/Documents/workspace/mediawiki-to-gfm/convert.php(50): App\Convert->run()
#4 {main}
```

Here is a minimal example:

```xml
  <page>
    <title>Minimal example for un-closed pre tags</title>
    <ns>0</ns>
    <id>2</id>
    <revision>
      <id>1848</id>
      <parentid>1847</parentid>
      <timestamp>2016-04-25T12:54:43Z</timestamp>
      <contributor>
        <username>Redacted</username>
        <id>0</id>
      </contributor>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve" bytes="1498">
This un-closed pre tag will cause pandoc to error out with a "unexpected end of input" error:

&lt;pre&gt;
some text here</text>
      <sha1>rzi966b2q3ngsa07mlfyxiap5bruxgj</sha1>
    </revision>
  </page>
```
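Pages like this could be flagged, and optionally patched, before they reach pandoc. The Python sketch below is hypothetical and not part of mediawiki-to-gfm. It works on the decoded wikitext, where `&lt;pre&gt;` has already become `<pre>`. As a simplification, it assumes that a `<pre>` block ends at the first `</pre>`, and it ignores `<nowiki>` sections and HTML comments:

```python
import re

# Opening or closing <pre> tag, with optional attributes, in any letter case.
_PRE_TAG = re.compile(r"<(/?)pre\b[^>]*>", re.IGNORECASE)


def has_unclosed_pre(wikitext: str) -> bool:
    """True if a <pre> block is still open at the end of the page."""
    inside = False
    for match in _PRE_TAG.finditer(wikitext):
        closing = match.group(1) == "/"
        if not inside and not closing:
            inside = True       # a block starts
        elif inside and closing:
            inside = False      # the first </pre> ends it
        # An opening tag inside a block is treated as literal text here,
        # and a stray closing tag outside a block is ignored.
    return inside


def close_unclosed_pre(wikitext: str) -> str:
    """Append the missing </pre> at the end of the page, if one is needed."""
    return wikitext + "\n</pre>" if has_unclosed_pre(wikitext) else wikitext
```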

Adding `&lt;/pre&gt;` before the closing `</text>` tag fixed the issue for me. However, I had to figure out the problem manually, and for that it would have been very helpful to get a file listing all the problematic pages that could not be converted automatically. This would also have the added benefit that the conversion does not choke on a single defective page.
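Here is a sketch of the "keep going and report" behaviour, again in Python. The names are hypothetical and the converter is injected (for instance, something that shells out to pandoc). Each failure keeps only the first line of its error message, and the report is a plain CSV file:

```python
import csv
from typing import Callable, Dict, Iterable, List, Tuple


def convert_all(pages: Iterable[Tuple[str, str]],
                convert: Callable[[str], str]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Convert every (title, wikitext) pair, collecting failures instead of stopping."""
    converted: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for title, wikitext in pages:
        try:
            converted[title] = convert(wikitext)
        except Exception as exc:  # one defective page must not abort the run
            lines = str(exc).splitlines()
            # Keep only the first line of the error (e.g. pandoc's "Error at ...").
            failures.append((title, lines[0] if lines else type(exc).__name__))
    return converted, failures


def write_report(failures: List[Tuple[str, str]], path: str) -> None:
    """Write the failed pages to a CSV file for later manual fixing."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title", "error"])
        writer.writerows(failures)
```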

thawn, Aug 13 '22

Here is a similar issue I've encountered (as always, the problem is with pandoc, not with mediawiki-to-gfm):

```
== Account ==
{|  border=&quot;1&quot; cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; 
!  width=&quot;200&quot; | &amp;nbsp;
!  width=&quot;200&quot; | Standalone
!  width=&quot;200&quot; | Grid
|- 
| login
|  bgcolor=&quot;lime&quot; | OK
|  bgcolor=&quot;lime&quot; | OK
|- 
| logout
|  bgcolor=&quot;lime&quot; | OK
|  bgcolor=&quot;lime&quot; | OK
|- 
| relog
|  bgcolor=&quot;lime&quot; | OK
|  bgcolor=&quot;lime&quot; | OK
[...]
```

This throws the following exception:

```
Error at "/var/tmp//pandoc63b85e78d2193" (line 368, column 4):
unexpected 'b'
{| border = "1"
   ^
Pandoc\PandocException: Pandoc could not convert successfully, error code: 65. Tried to run the following command: /usr/local/bin/pandoc --from=mediawiki --to=gfm /var/tmp//pandoc63b85e78d2193 in /Users/gwyneth/Developer/mediawiki-to-gfm/vendor/ryakad/pandoc-php/src/Pandoc/Pandoc.php:287
Stack trace:
#0 /Users/gwyneth/Developer/mediawiki-to-gfm/app/src/Convert.php(194): Pandoc\Pandoc->runWith('{{Quicklinks}}\n...', Array)
#1 /Users/gwyneth/Developer/mediawiki-to-gfm/app/src/Convert.php(149): App\Convert->runPandoc('{{Quicklinks}}\n...')
#2 /Users/gwyneth/Developer/mediawiki-to-gfm/app/src/Convert.php(117): App\Convert->convertData()
#3 /Users/gwyneth/Developer/mediawiki-to-gfm/convert.php(50): App\Convert->run()
#4 {main}
```

It's obviously pandoc struggling with a table...

GwynethLlewelyn, Jan 06 '23

By any chance, have you tried converting the page directly with pandoc? I'm asking in case a newer version resolves this issue.
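To try a single page directly with pandoc, one could pull its wikitext out of the XML export and feed it to pandoc with the same `--from=mediawiki --to=gfm` options seen in the stack traces. The following is a Python sketch with hypothetical helper names. It loads the whole dump into memory, which is fine for a one-off check but not for very large exports:

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Optional


def _local_name(tag: str) -> str:
    # Export files usually put every element in a namespace such as
    # "{http://www.mediawiki.org/xml/export-0.10/}page"; keep only "page".
    return tag.rsplit("}", 1)[-1]


def page_wikitext(xml_text: str, title: str) -> Optional[str]:
    """Return the wikitext of the page with this title, or None if absent."""
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if _local_name(page.tag) != "page":
            continue
        found_title, text = None, None
        for el in page.iter():
            name = _local_name(el.tag)
            if name == "title":
                found_title = el.text
            elif name == "text":
                text = el.text or ""  # the last revision in the dump wins
        if found_title == title:
            return text
    return None


def run_pandoc(wikitext: str) -> subprocess.CompletedProcess:
    """Feed one page to pandoc on stdin, using the same formats as mediawiki-to-gfm."""
    return subprocess.run(
        ["pandoc", "--from=mediawiki", "--to=gfm"],
        input=wikitext, capture_output=True, text=True,
    )
```

If `run_pandoc(...)` returns a non-zero `returncode`, its `stderr` should contain pandoc's parse error for that page alone. This makes it easy to rerun the same page under a different pandoc version.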

outofcontrol, Jan 06 '23

Hm, actually, no, I didn't try a manual conversion in my case. BTW, I'm using pandoc 2.19.2, the latest version installable via Homebrew.

GwynethLlewelyn, Jan 13 '23

This issue appears to be with Pandoc and out of scope for this repository. Please feel free to create a new issue if needed.

outofcontrol, Mar 07 '23