wikiteam
Dumpgenerator halts without finishing
It seems to halt at a random point. It should time out and retry automatically.
```
Downloaded 10 images
Downloaded 20 images
Downloaded 30 images
Downloaded 40 images
Downloaded 50 images
Downloaded 60 images
Downloaded 70 images
Downloaded 80 images
Downloaded 90 images
Downloaded 100 images
Downloaded 110 images
^CTraceback (most recent call last):
  File "dumpgenerator.py", line 2084, in <module>
    main()
  File "dumpgenerator.py", line 2076, in main
    createNewDump(config=config, other=other)
  File "dumpgenerator.py", line 1663, in createNewDump
    session=other['session'])
  File "dumpgenerator.py", line 1109, in generateImageDump
    r = requests.get(url=url)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 608, in send
    r.content
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 737, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 660, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python2.7/dist-packages/urllib3/response.py", line 344, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/lib/python2.7/dist-packages/urllib3/response.py", line 301, in read
    data = self._fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 612, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
KeyboardInterrupt
```
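From the traceback the process is blocked in socket.recv, which is what happens when requests.get() is called without a timeout. A minimal sketch of the kind of timeout-and-retry behaviour I mean (the function name and values are just for illustration, not dumpgenerator's actual code):

```python
import requests

def fetch_with_retry(url, attempts=5, timeout=30):
    # Each attempt gives up after `timeout` seconds instead of hanging in recv().
    last_error = None
    for _ in range(attempts):
        try:
            r = requests.get(url, timeout=timeout)
            r.raise_for_status()
            return r.content
        except requests.RequestException as e:
            last_error = e
    raise last_error
```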
@TimSC Can't you resume? Also, you can delete the already-downloaded image entries from the -images.txt file.
Resuming images doesn't work for me. Possibly related to #250?
If I repeatedly resume with a modified -images.txt, that works as a workaround.
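Roughly what I do to trim -images.txt before resuming, as a sketch; it assumes each line starts with the image filename followed by a tab, which may not match every dump:

```python
import os

def trim_images_list(images_txt, images_dir):
    # Keep only the entries whose image is not already present on disk.
    downloaded = set(os.listdir(images_dir))
    kept = []
    with open(images_txt) as f:
        for line in f:
            filename = line.split('\t', 1)[0]
            if filename not in downloaded:
                kept.append(line)
    with open(images_txt, 'w') as f:
        f.writelines(kept)
```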
We cannot do much about transient errors other than replacing that requests.get(url=url) with our now usual session.get(url=url), which retries rather insistently. If a particular wiki fails consistently across multiple attempts, we can look into how to solve that; otherwise, just retry.
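For illustration only (not the exact session dumpgenerator builds), a requests.Session with retries mounted via urllib3's Retry looks something like this; the counts and backoff are arbitrary:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5, backoff_factor=2,
                status_forcelist=[500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

# placeholder URL; the real one would come from the -images.txt list
url = 'https://example.org/images/Example.png'
r = session.get(url, timeout=30)  # replaces the bare requests.get(url=url)
```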
Note that part of the retrying is now done (or not) by mwclient when using the API.