wikiteam

PageMissingError during resume attempts

Open jmtd opened this issue 4 years ago • 2 comments

I'm dumping doomwiki.org using master = HEAD = 9b1996d4368c0b7ccd568dd2c9b460159b8e26e5

The initial dump eventually fails for one reason or another, with the remote end throwing an HTTP 500 error. The initial dump command was

python dumpgenerator.py https://doomwiki.org --xml --images --path wiki-doomwiki.org-20200629

(python 2.7.16 FWIW)

Next I try a resume

python dumpgenerator.py https://doomwiki.org --xml --images --path wiki-doomwiki.org-20200629 --resume

This reliably fails for me as follows:

…
        1 more revisions exported
        2 more revisions exported
'*'
Traceback (most recent call last):
  File "dumpgenerator.py", line 2528, in <module>
    main()
  File "dumpgenerator.py", line 2518, in main
    resumePreviousDump(config=config, other=other)
  File "dumpgenerator.py", line 2165, in resumePreviousDump
    session=other['session'])
  File "dumpgenerator.py", line 727, in generateXMLDump
    for xml in getXMLRevisions(config=config, session=session, start=start):
  File "dumpgenerator.py", line 829, in getXMLRevisions
    yield makeXmlFromPage(page)
  File "dumpgenerator.py", line 1083, in makeXmlFromPage
    raise PageMissingError(page['title'], e)
__main__.PageMissingError: page 'Entryway' not found

"Entryway" is doomwiki's front page.

I've dumped this wiki (and other small MediaWikis) a few times over the years without error, but with a much older version of the tooling (23efbef).
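
As an aside on the traceback above: the stray '*' printed just before it looks like a caught KeyError on the '*' key of a revision dict, which then gets re-raised as PageMissingError even though the page exists. Below is a minimal, hypothetical sketch of that pattern (simplified names, not the actual dumpgenerator.py source):

    # Hypothetical sketch of the suspected failure path (not the real dumpgenerator.py):
    # revision text is read from rev['*'], the KeyError is caught and printed,
    # and PageMissingError is raised even though the page itself exists.

    class PageMissingError(Exception):
        def __init__(self, title, xml):
            self.title = title
            self.xml = xml
            Exception.__init__(self, "page '%s' not found" % title)

    def make_xml_from_page(page):
        try:
            for rev in page['revisions']:
                text = rev['*']  # KeyError if the API returned the text elsewhere
        except KeyError as e:
            print(e)             # prints: '*'
            raise PageMissingError(page['title'], e)

    # A revision whose text sits under the newer 'slots' layout trips the error path:
    page = {'title': 'Entryway',
            'revisions': [{'slots': {'main': {'*': 'Welcome to the Doom Wiki.'}}}]}
    make_xml_from_page(page)     # raises PageMissingError: page 'Entryway' not found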

jmtd, Jul 03 '20 09:07

I can reproduce with

python2 dumpgenerator.py --api=https://doomwiki.org/w/api.php --xml --xmlrevisions

but it works for me without --xmlrevisions (which is not enabled by default for a reason).
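
A rough way to check what the allrevisions API actually hands back for this wiki (a diagnostic sketch; the parameter choices are my assumptions, not something taken from this thread):

    import requests

    # Ask for one revision with content and inspect which key carries the text:
    # '*' at the top level of the revision, or nested under 'slots' when the
    # slots layout is requested on newer MediaWiki versions.
    params = {
        'action': 'query',
        'list': 'allrevisions',
        'arvlimit': 1,
        'arvprop': 'content',
        'format': 'json',
    }
    r = requests.get('https://doomwiki.org/w/api.php', params=params)
    rev = r.json()['query']['allrevisions'][0]['revisions'][0]
    print(sorted(rev.keys()))  # adding 'arvslots': 'main' switches newer wikis to 'slots'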

Or at least it worked for me up to this point:

Hissy, 57 edits
Downloaded 4280 pages
Hit point, 24 edits
Hit points, 1 edit
Hitscan, 22 edits

(will post the dump later)

$ python2 dumpgenerator.py --api=https://doomwiki.org/w/api.php --xml --xmlrevisions
Checking API... https://doomwiki.org/w/api.php
API is OK: https://doomwiki.org/w/api.php
Checking index.php... https://doomwiki.org/w/index.php
index.php is OK
#########################################################################
Welcome to DumpGenerator 0.4.0-alpha by WikiTeam (GPL v3)
More info at: https://github.com/WikiTeam/wikiteam
#########################################################################

#########################################################################
Copyright (C) 2011-2020 WikiTeam developers

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
#########################################################################

Analysing https://doomwiki.org/w/api.php
Trying generating a new dump into a new directory...
Loading page titles from namespaces = all
Excluding titles from namespaces = None
20 namespaces found
Retrieving titles in the namespace 0
14377 titles retrieved in the namespace 0
Retrieving titles in the namespace 1
1447 titles retrieved in the namespace 1
Retrieving titles in the namespace 2
488 titles retrieved in the namespace 2
Retrieving titles in the namespace 3
507 titles retrieved in the namespace 3
Retrieving titles in the namespace 4
70 titles retrieved in the namespace 4
Retrieving titles in the namespace 5
44 titles retrieved in the namespace 5
Retrieving titles in the namespace 6
17746 titles retrieved in the namespace 6
Retrieving titles in the namespace 7
293 titles retrieved in the namespace 7
Retrieving titles in the namespace 8
262 titles retrieved in the namespace 8
Retrieving titles in the namespace 9
11 titles retrieved in the namespace 9
Retrieving titles in the namespace 10
1353 titles retrieved in the namespace 10
Retrieving titles in the namespace 11
159 titles retrieved in the namespace 11
Retrieving titles in the namespace 12
10 titles retrieved in the namespace 12
Retrieving titles in the namespace 13
3 titles retrieved in the namespace 13
Retrieving titles in the namespace 14
2083 titles retrieved in the namespace 14
Retrieving titles in the namespace 15
70 titles retrieved in the namespace 15
Retrieving titles in the namespace 2300
0 titles retrieved in the namespace 2300
Retrieving titles in the namespace 2301
0 titles retrieved in the namespace 2301
Retrieving titles in the namespace 2302
0 titles retrieved in the namespace 2302
Retrieving titles in the namespace 2303
0 titles retrieved in the namespace 2303
Titles saved at... doomwikiorg_w-20200703-titles.txt
38923 page titles loaded
https://doomwiki.org/w/api.php
Getting the XML header from the API
Retrieving the XML for every page from the beginning
20 namespaces found
Trying to export all revisions from namespace 0
Trying to get wikitext from the allrevisions API and to build the XML
...
        2 more revisions exported
'*'
Traceback (most recent call last):
  File "dumpgenerator.py", line 2528, in <module>
    main()
  File "dumpgenerator.py", line 2520, in main
    createNewDump(config=config, other=other)
  File "dumpgenerator.py", line 2087, in createNewDump
    generateXMLDump(config=config, titles=titles, session=other['session'])
  File "dumpgenerator.py", line 727, in generateXMLDump
    for xml in getXMLRevisions(config=config, session=session, start=start):
  File "dumpgenerator.py", line 829, in getXMLRevisions
    yield makeXmlFromPage(page)
  File "dumpgenerator.py", line 1083, in makeXmlFromPage
    raise PageMissingError(page['title'], e)
__main__.PageMissingError: page 'Entryway' not found

nemobis, Jul 03 '20 11:07

I'm suddenly getting the same on both Linux and Windows (Conda 2.7), and BOTH on resumes and new loads. On a page where exports worked two days ago. And the page does exist.

cooperdk, Jun 12 '22 00:06
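
For what it's worth, whether the wiki itself considers the page missing can be checked directly against the API; a small sketch (my own, using standard query parameters):

    import requests

    # If the page really were absent, its entry would carry a 'missing' key.
    r = requests.get('https://doomwiki.org/w/api.php', params={
        'action': 'query',
        'titles': 'Entryway',
        'format': 'json',
    })
    for page in r.json()['query']['pages'].values():
        print(page.get('title'), 'missing' in page)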