wikiteam
PageMissingError during resume attempts
I'm dumping doomwiki.org using master = HEAD = 9b1996d4368c0b7ccd568dd2c9b460159b8e26e5
The initial dump eventually fails because the remote end returns an HTTP 500 error. The initial dump command was:
python dumpgenerator.py https://doomwiki.org --xml --images --path wiki-doomwiki.org-20200629
(Python 2.7.16, FWIW)
Next I try a resume:
python dumpgenerator.py https://doomwiki.org --xml --images --path wiki-doomwiki.org-20200629 --resume
This reliably fails for me as follows:
…
1 more revisions exported
2 more revisions exported
'*'
Traceback (most recent call last):
File "dumpgenerator.py", line 2528, in <module>
main()
File "dumpgenerator.py", line 2518, in main
resumePreviousDump(config=config, other=other)
File "dumpgenerator.py", line 2165, in resumePreviousDump
session=other['session'])
File "dumpgenerator.py", line 727, in generateXMLDump
for xml in getXMLRevisions(config=config, session=session, start=start):
File "dumpgenerator.py", line 829, in getXMLRevisions
yield makeXmlFromPage(page)
File "dumpgenerator.py", line 1083, in makeXmlFromPage
raise PageMissingError(page['title'], e)
__main__.PageMissingError: page 'Entryway' not found
"Entryway" is doomwiki's front page.
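The '*' printed just before the traceback looks like the message of a KeyError, which would mean some revision in the API response had no '*' (wikitext) key and the tool reported that as a missing page. That is only my reading of the output, not something confirmed here. A minimal sketch for spotting such revisions, assuming a page dict shaped like MediaWiki API JSON (the helper name and the sample data are made up for illustration):

```python
def revisions_missing_text(page):
    """Return the revids of revisions that carry no '*' (wikitext) key.

    `page` is assumed to look like one entry of a MediaWiki API response:
    a dict with a 'revisions' list of per-revision dicts.
    """
    return [
        rev.get("revid")
        for rev in page.get("revisions", [])
        if "*" not in rev
    ]


# Hand-written sample, not real doomwiki.org data: the second revision
# has its text hidden, so the API would omit the '*' key for it.
sample_page = {
    "title": "Entryway",
    "revisions": [
        {"revid": 1, "*": "some wikitext"},
        {"revid": 2, "texthidden": ""},
    ],
}

print(revisions_missing_text(sample_page))
```

If this turns up any revids for the failing page, the page itself exists and only the text of those revisions is unavailable.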
I've dumped this and other small MediaWiki sites a few times over the years without error, but with a much older version of the tooling (23efbef).
I can reproduce with
python2 dumpgenerator.py --api=https://doomwiki.org/w/api.php --xml --xmlrevisions
but it works for me without --xmlrevisions (which is not enabled by default for a reason).
Or at least it worked for me up to this point:
Hissy, 57 edits
Downloaded 4280 pages
Hit point, 24 edits
Hit points, 1 edit
Hitscan, 22 edits
(will post the dump later)
$ python2 dumpgenerator.py --api=https://doomwiki.org/w/api.php --xml --xmlrevisions
Checking API... https://doomwiki.org/w/api.php
API is OK: https://doomwiki.org/w/api.php
Checking index.php... https://doomwiki.org/w/index.php
index.php is OK
#########################################################################
Welcome to DumpGenerator 0.4.0-alpha by WikiTeam (GPL v3)
More info at: https://github.com/WikiTeam/wikiteam
#########################################################################
#########################################################################
Copyright (C) 2011-2020 WikiTeam developers
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
#########################################################################
Analysing https://doomwiki.org/w/api.php
Trying generating a new dump into a new directory...
Loading page titles from namespaces = all
Excluding titles from namespaces = None
20 namespaces found
Retrieving titles in the namespace 0
14377 titles retrieved in the namespace 0
Retrieving titles in the namespace 1
1447 titles retrieved in the namespace 1
Retrieving titles in the namespace 2
488 titles retrieved in the namespace 2
Retrieving titles in the namespace 3
507 titles retrieved in the namespace 3
Retrieving titles in the namespace 4
70 titles retrieved in the namespace 4
Retrieving titles in the namespace 5
44 titles retrieved in the namespace 5
Retrieving titles in the namespace 6
17746 titles retrieved in the namespace 6
Retrieving titles in the namespace 7
293 titles retrieved in the namespace 7
Retrieving titles in the namespace 8
262 titles retrieved in the namespace 8
Retrieving titles in the namespace 9
11 titles retrieved in the namespace 9
Retrieving titles in the namespace 10
1353 titles retrieved in the namespace 10
Retrieving titles in the namespace 11
159 titles retrieved in the namespace 11
Retrieving titles in the namespace 12
10 titles retrieved in the namespace 12
Retrieving titles in the namespace 13
3 titles retrieved in the namespace 13
Retrieving titles in the namespace 14
2083 titles retrieved in the namespace 14
Retrieving titles in the namespace 15
70 titles retrieved in the namespace 15
Retrieving titles in the namespace 2300
0 titles retrieved in the namespace 2300
Retrieving titles in the namespace 2301
0 titles retrieved in the namespace 2301
Retrieving titles in the namespace 2302
0 titles retrieved in the namespace 2302
Retrieving titles in the namespace 2303
0 titles retrieved in the namespace 2303
Titles saved at... doomwikiorg_w-20200703-titles.txt
38923 page titles loaded
https://doomwiki.org/w/api.php
Getting the XML header from the API
Retrieving the XML for every page from the beginning
20 namespaces found
Trying to export all revisions from namespace 0
Trying to get wikitext from the allrevisions API and to build the XML
...
2 more revisions exported
'*'
Traceback (most recent call last):
File "dumpgenerator.py", line 2528, in <module>
I'm suddenly getting the same error on both Linux and Windows (Conda, Python 2.7), on both resumes and fresh dumps. It happens on a page whose export worked two days ago, and the page does exist.
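For anyone who wants to double-check that a page exists independently of dumpgenerator.py, here is a rough sketch of the usual MediaWiki API check: an action=query request for the title, then looking for a "missing" marker in the returned page entry. The helper names and the sample responses are mine, written by hand for illustration; nothing here is fetched from doomwiki.org.

```python
from urllib.parse import urlencode


def existence_query_url(api_url, title):
    """Build an action=query URL that asks the wiki about a single title."""
    params = {"action": "query", "titles": title, "format": "json"}
    return api_url + "?" + urlencode(params)


def page_is_missing(response):
    """True if any page entry in a decoded JSON response is flagged 'missing'."""
    pages = response.get("query", {}).get("pages", {})
    return any("missing" in page for page in pages.values())


url = existence_query_url("https://doomwiki.org/w/api.php", "Entryway")
print(url)

# Hand-written sample responses in the classic (formatversion=1) JSON shape.
# The pageid below is a placeholder, not the real id of the page.
existing = {"query": {"pages": {"1": {"pageid": 1, "ns": 0, "title": "Entryway"}}}}
absent = {"query": {"pages": {"-1": {"ns": 0, "title": "No such page", "missing": ""}}}}

print(page_is_missing(existing), page_is_missing(absent))
```

Opening the printed URL in a browser and checking for "missing" in the result should be enough to tell a truly absent page apart from one the dump tool merely reports as missing.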