zim-tools
zimdump stops if it cannot create a redirect because of an invalid filename
zim-tools (5ed81f87): zimdump stops if it cannot create a redirect symlink because of an invalid filename. When dumping files, it catches the exception and continues.
$ ~/oz/zim-tools/build/src/zimdump dump --redirect --dir=wikipedia_en_100_maxi ZIM.files/wikipedia_en_100_maxi_2020-05.zim
Exception: Error creating symlink from United_States to wikipedia_en_100_maxi/A/United_States_of_America/Introduction
$ du -hs wikipedia_en_100_maxi
1,7M wikipedia_en_100_maxi
$ find wikipedia_en_100_maxi -type f | wc -l
16
$ find wikipedia_en_100_maxi -type f
wikipedia_en_100_maxi/A/List_of_prime_ministers_of_India
wikipedia_en_100_maxi/A/List_of_United_States_cities_by_population
wikipedia_en_100_maxi/A/Protein
wikipedia_en_100_maxi/A/Hippopotamus
wikipedia_en_100_maxi/A/Spider
wikipedia_en_100_maxi/A/Association_football
wikipedia_en_100_maxi/-/style.css
wikipedia_en_100_maxi/-/j/js_modules/script.js
wikipedia_en_100_maxi/-/j/js_modules/images_loaded.min.js
wikipedia_en_100_maxi/-/j/js_modules/article_list_home.js
wikipedia_en_100_maxi/-/j/js_modules/node_module/details-element-polyfill/dist/details-element-polyfill.js
wikipedia_en_100_maxi/-/j/js_modules/masonry.min.js
wikipedia_en_100_maxi/-/s/css_modules/inserted_style.css
wikipedia_en_100_maxi/-/s/css_modules/style.css
wikipedia_en_100_maxi/-/s/css_modules/content.parsoid.css
wikipedia_en_100_maxi/-/s/css_modules/mobile_main_page.css
Without the --redirect option, zimdump handles the exceptions:
$ ~/oz/zim-tools/build/src/zimdump dump --dir=wikipedia_en_100_maxi ZIM.files/wikipedia_en_100_maxi_2020-05.zim
Wrote wikipedia_en_100_maxi/A/United_States_of_America/Introduction to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States_of_America%2fIntroduction
Wrote wikipedia_en_100_maxi/A/United_States_of_America/OldPage to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States_of_America%2fOldPage
Wrote wikipedia_en_100_maxi/A/United_States to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States
$ du -hs wikipedia_en_100_maxi
66M wikipedia_en_100_maxi
$ find wikipedia_en_100_maxi -type f | wc -l
3710
I ran into the same problem.
Similar to #213
I looked into this problem, but I'm not sure what a good solution would be. Because a file and a directory can't share the same name on Linux, some Wikipedia redirects break with the symlink approach. Example:
% => Percent_sign
%/% => Environment_variables
If % is created first, a file called % exists. When we then go to create the file %/%, we first need to create a directory called %, and that fails.
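The collision described above can be reproduced outside zimdump. The following is a minimal Python sketch, not zim-tools code: `try_dump` is a hypothetical helper that writes each entry into a temporary directory and records whether it succeeded. With `%` written first, creating the parent directory for `%/%` raises an `OSError` subclass.

```python
import os
import tempfile


def try_dump(entries):
    """Write each (path, content) pair under a fresh temp dir.

    Returns a dict mapping each path to "ok" or to the name of the
    OSError subclass that prevented it from being written.
    """
    results = {}
    with tempfile.TemporaryDirectory() as root:
        for path, content in entries:
            full = os.path.join(root, path)
            try:
                # Parent directories must exist before the file can be created.
                os.makedirs(os.path.dirname(full), exist_ok=True)
                with open(full, "w") as fh:
                    fh.write(content)
                results[path] = "ok"
            except OSError as exc:
                results[path] = type(exc).__name__
    return results


# "%" becomes a regular file, so a directory named "%" cannot be created for "%/%".
outcome = try_dump([("%", "Percent_sign"), ("%/%", "Environment_variables")])
print(outcome)
```

Reversing the order only moves the failure: the directory `%` is created first, and writing the file `%` then fails instead.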
@adamlamar There is a file with all exceptions. Not sure about the details, but it should be listed in the exception file IMO.
I'm able to see and understand the printed exception. The problem is that the --redirect option can't save everything because of shared file and directory names. Either the symlink creation fails, or the content creation fails.
The easy fix would be to simply print the error and move on without throwing an exception. However, this leaves the content and/or redirects in a partially extracted state on the filesystem.
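As a rough illustration of that "print and move on" approach, here is a Python sketch (zim-tools itself is C++, and `create_redirects` is a hypothetical helper, not its API). Each redirect is attempted, failures are collected instead of aborting the run, and the remaining redirects are still created. It assumes a platform where unprivileged symlink creation is allowed.

```python
import os
import tempfile


def create_redirects(root, redirects):
    """Try every (link_path, target) pair; collect failures instead of aborting."""
    failures = []
    for link_path, target in redirects:
        full = os.path.join(root, link_path)
        try:
            os.makedirs(os.path.dirname(full), exist_ok=True)
            os.symlink(target, full)
        except OSError as exc:
            # Report the problem and keep going with the next redirect.
            print(f"Skipping redirect {link_path}: {exc}")
            failures.append((link_path, str(exc)))
    return failures


with tempfile.TemporaryDirectory() as root:
    redirects = [
        ("A/%", "Percent_sign"),
        ("A/%/%", "Environment_variables"),  # needs a directory named "%"
        ("A/USA", "United_States"),
    ]
    failures = create_redirects(root, redirects)
    created = sorted(os.listdir(os.path.join(root, "A")))
```

The trade-off mentioned above is visible here: the run completes, but `A/%/%` is missing from the output, so the list of failures needs to be surfaced somewhere (for example, alongside the existing exceptions).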
@kelson42 Could a developer take a look at my patch at 190 so that this exception can be resolved? Thanks.
@nickhuang99 yes, but please create a PR.
@kelson42 sure. Created.
@kelson42 Hi, I have another PR that solves one case of this issue: symlink creation fails because the symlink was already created earlier. That can happen when the dump is run repeatedly, when two ZIM files are merged together, etc.
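For readers following along, the "symlink already exists" case could be handled roughly as below. This is a Python sketch under my own assumptions, not the code from the PR: `ensure_symlink` is a hypothetical helper that treats an existing link pointing at the same target as success and still raises when the existing entry differs.

```python
import os
import tempfile


def ensure_symlink(target, link_path):
    """Create link_path -> target; treat an identical existing link as success."""
    try:
        os.symlink(target, link_path)
        return "created"
    except FileExistsError:
        # A previous run may already have created exactly this link.
        if os.path.islink(link_path) and os.readlink(link_path) == target:
            return "already-present"
        raise


with tempfile.TemporaryDirectory() as root:
    link = os.path.join(root, "United_States")
    first = ensure_symlink("United_States_of_America", link)
    second = ensure_symlink("United_States_of_America", link)  # repeated dump
    try:
        ensure_symlink("Some_other_article", link)
        conflict = "no error"
    except FileExistsError:
        conflict = "conflict"
```

Here the second call returns without error, while a link with a different target is still reported as a conflict rather than silently overwritten.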