wiki-java-tools

Imker aborts whole category download whenever a single download fails

Open nicolas-raoul opened this issue 6 years ago • 3 comments

Version: v16.09.13

Stack trace:

java.lang.UnknownError: MW API error. Server response was: <?xml version="1.0"?><api servedby="mw2283"><error code="maxlag" info="Waiting for 10.192.32.167: 3.3404757976532 seconds lagged." host="10.192.32.167" lag="3.3404757976532" type="db" xml:space="preserve">See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.</error></api>

	at wiki.Wiki.fetch(Unknown Source)
	at wiki.Wiki.getImage(Unknown Source)
	at wiki.Wiki.getImage(Unknown Source)
	at app.ImkerBase$1.fetch(Unknown Source)
	at app.App.attemptFetch(Unknown Source)
	at app.ImkerBase.downloadLoop(Unknown Source)
	at app.ImkerGUI$4.doInBackground(Unknown Source)
	at app.ImkerGUI$4.doInBackground(Unknown Source)
	at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:295)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:334)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:844)

That happened after 1117 files (out of many more) had been downloaded. A database lag of about 3 seconds does not sound like a problem serious enough to require aborting the whole category download.

  • What happens currently: Imker aborts the whole download, so I have to run the whole category download again.
  • What I would expect: Imker waits a few seconds and retries the download, gives up on that particular file only if it fails again, and then proceeds with the rest of the files.
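The expected behaviour above could be sketched roughly as follows. This is only an illustration, not Imker's actual code: `RetryingDownloader`, `Fetcher`, `maxAttempts` and `waitMillis` are hypothetical names. Note that the trace shows a `java.lang.UnknownError`, which is an `Error` rather than an `Exception`, so the sketch catches it explicitly.

```java
import java.util.ArrayList;
import java.util.List;

final class RetryingDownloader {

    /** Hypothetical stand-in for whatever fetches a single file. */
    interface Fetcher {
        void fetch(String fileName) throws Exception;
    }

    /**
     * Tries each file up to maxAttempts times, pausing waitMillis between
     * attempts, and returns the names of the files that still failed.
     * One failing file never stops the rest of the batch.
     */
    static List<String> downloadAll(List<String> fileNames, Fetcher fetcher,
                                    int maxAttempts, long waitMillis)
            throws InterruptedException {
        List<String> failed = new ArrayList<>();
        for (String name : fileNames) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    fetcher.fetch(name);
                    done = true;
                } catch (Exception | UnknownError e) {
                    // UnknownError is what the trace above reports for "maxlag".
                    if (attempt < maxAttempts) {
                        Thread.sleep(waitMillis); // give the server time to catch up
                    }
                }
            }
            if (!done) {
                failed.add(name); // skip this file, continue with the next one
            }
        }
        return failed;
    }
}
```

The returned list of failed names could then be shown to the user at the end, instead of aborting at the first error.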

nicolas-raoul avatar Sep 20 '18 03:09 nicolas-raoul

I have this problem often.

luc7v avatar Sep 20 '18 09:09 luc7v

Using imker-gui.jar (also v16.09.13) for the first time, I can confirm this observation. To ease replication and bug-fixing, this was my procedure:

  • identifying the category SVG Deutsche Einheitskurzschrift, close to the bottom here, which is part of a collection of 20.6k entries (root entry)
  • launching the Java GUI on Linux Xubuntu 18.04.4 LTS with OpenJDK (version "11.0.6" 2020-01-14)
  • pasting the category into the GUI; the program successfully populates its internal list with 20614 relevant entries
  • fetching the data stops at file #13252/20616 (i.e., about 64%) with 123 MByte collected. At this time, more than 1 GB of disk space is still free. According to a diff view, the error report differs from the initial report by @nicolas-raoul only slightly, in the numbers. To quote the same part:

	at wiki.Wiki.fetch(Unknown Source)
	at wiki.Wiki.getImage(Unknown Source)
	at wiki.Wiki.getImage(Unknown Source)
	at app.ImkerBase$1.fetch(Unknown Source)
	at app.App.attemptFetch(Unknown Source)
	at app.ImkerBase.downloadLoop(Unknown Source)
	at app.ImkerGUI$4.doInBackground(Unknown Source)
	at app.ImkerGUI$4.doInBackground(Unknown Source)
	at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Thus, a feature suggestion: let Imker write a persistent list of the files to download. a) The program could use it to resume if, for whatever reason, the batch was not completed. b) The user could explicitly point Imker to it, e.g. a quarter of a year later, to collect only the media added to the same category since the last run, reducing the traffic necessary.
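A minimal sketch of such a persistent list, assuming one plain UTF-8 text file with one finished file name per line. `DownloadManifest`, `markDone` and `remaining` are hypothetical names, not part of Imker:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DownloadManifest {
    private final Path doneFile;

    public DownloadManifest(Path doneFile) {
        this.doneFile = doneFile;
    }

    /** Appends one finished file name, so progress survives an aborted run. */
    public void markDone(String fileName) throws IOException {
        Files.write(doneFile, List.of(fileName), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Returns the category members not yet recorded as done, in order. */
    public List<String> remaining(List<String> categoryMembers) throws IOException {
        Set<String> done = new HashSet<>();
        if (Files.exists(doneFile)) {
            done.addAll(Files.readAllLines(doneFile, StandardCharsets.UTF_8));
        }
        List<String> todo = new ArrayList<>();
        for (String name : categoryMembers) {
            if (!done.contains(name)) {
                todo.add(name);
            }
        }
        return todo;
    }
}
```

Calling `remaining` with a freshly fetched category listing would cover both cases: after an abort it yields the files still missing, and months later it yields only the files added since the last run.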

Added: such a listing can be created with Wikimedia's own list generator (even split into multiple files). Character encoding (e.g., umlauts) may occasionally be an issue there, though the files Imker downloaded did not show it.

nbehrnd avatar May 03 '20 09:05 nbehrnd

@nicolas-raoul Translating «Kategorie» to category, and «Anzahl der Listen» to number of lists to generate, is one thing. While unlikely to be exhaustive, the little list mentioned taught me the following substitution rules between the «safe for internet / pure ASCII (maybe even 7 bit?)» codes and the special characters uploaders may use in file names.

|-------------------------------+-----------------------------------------|
| code -> substitute (keyed as) | example                                 |
|-------------------------------+-----------------------------------------|
| %C3%A4 -> ä ("a)              | Kläranlage ([water] purification plant) |
| %C3%B6 -> ö ("o)              | öffentlich (public, adjective)          |
| %C3%BC -> ü ("u)              | Bürger (citizen)                        |
| %C3%9F -> ß ("s, or Alt + s)  | Kuß (kiss, noun)                        |
| %C3%AE -> î (^i)              | maître (master, noun)                   |
| %C3%A9 -> é ('e)              | école (school)                          |
|-------------------------------+-----------------------------------------|
| %C3%84 -> Ä ("A)              | Ärmelkanal (the English Channel)        |
| %C3%96 -> Ö ("O)              | Öffentlichkeit (public, noun)           |
| %C3%9C -> Ü ("U)              | Überraschung (surprise, noun)           |
|-------------------------------+-----------------------------------------|
| %2C -> ,                      | (comma)                                 |
| %21 -> !                      | (exclamation mark)                      |
| %27 -> '                      | (apostrophe)                            |
| %28 -> (                      | (opening parenthesis)                   |
| %29 -> )                      | (closing parenthesis)                   |
|-------------------------------+-----------------------------------------|

This is a good reason to watch out for proper character encoding. And well, the third group (again, comme le 3e groupe) is the trickier one: I did not expect to see these characters permitted in file names.
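The codes in the table are UTF-8 percent-encoding, so they need not be substituted by hand. A small sketch, assuming Java 10 or later for `URLDecoder.decode(String, Charset)`; `TitleDecoder` is a hypothetical helper name:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class TitleDecoder {

    /** Decodes percent-escapes as UTF-8 and keeps a literal '+' unchanged. */
    static String decode(String encoded) {
        // URLDecoder follows form encoding and would turn '+' into a space,
        // so literal plus signs are escaped before decoding.
        return URLDecoder.decode(encoded.replace("+", "%2B"), StandardCharsets.UTF_8);
    }
}
```

For instance, `TitleDecoder.decode("Kl%C3%A4ranlage")` gives `Kläranlage`, and `%2C`, `%21`, `%27`, `%28`, `%29` come back as the punctuation listed in the third group.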

nbehrnd avatar May 05 '20 14:05 nbehrnd