
zimdump stops if it cannot create a redirect because of an invalid filename

Open asashnov opened this issue 4 years ago • 9 comments

zim-tools (5ed81f87): zimdump stops if it cannot create a redirect symlink because of an invalid filename. When dumping regular files, it catches the exception and continues.

$ ~/oz/zim-tools/build/src/zimdump dump --redirect --dir=wikipedia_en_100_maxi    ZIM.files/wikipedia_en_100_maxi_2020-05.zim
Exception: Error creating symlink from United_States to wikipedia_en_100_maxi/A/United_States_of_America/Introduction

$ du -hs wikipedia_en_100_maxi
1,7M	wikipedia_en_100_maxi

$ find wikipedia_en_100_maxi -type f | wc -l
16

$ find wikipedia_en_100_maxi -type f 
wikipedia_en_100_maxi/A/List_of_prime_ministers_of_India
wikipedia_en_100_maxi/A/List_of_United_States_cities_by_population
wikipedia_en_100_maxi/A/Protein
wikipedia_en_100_maxi/A/Hippopotamus
wikipedia_en_100_maxi/A/Spider
wikipedia_en_100_maxi/A/Association_football
wikipedia_en_100_maxi/-/style.css
wikipedia_en_100_maxi/-/j/js_modules/script.js
wikipedia_en_100_maxi/-/j/js_modules/images_loaded.min.js
wikipedia_en_100_maxi/-/j/js_modules/article_list_home.js
wikipedia_en_100_maxi/-/j/js_modules/node_module/details-element-polyfill/dist/details-element-polyfill.js
wikipedia_en_100_maxi/-/j/js_modules/masonry.min.js
wikipedia_en_100_maxi/-/s/css_modules/inserted_style.css
wikipedia_en_100_maxi/-/s/css_modules/style.css
wikipedia_en_100_maxi/-/s/css_modules/content.parsoid.css
wikipedia_en_100_maxi/-/s/css_modules/mobile_main_page.css

Without the --redirect option, zimdump handles the exceptions:

$ ~/oz/zim-tools/build/src/zimdump dump --dir=wikipedia_en_100_maxi    ZIM.files/wikipedia_en_100_maxi_2020-05.zim
Wrote wikipedia_en_100_maxi/A/United_States_of_America/Introduction to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States_of_America%2fIntroduction
Wrote wikipedia_en_100_maxi/A/United_States_of_America/OldPage to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States_of_America%2fOldPage
Wrote wikipedia_en_100_maxi/A/United_States to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States


$ du -hs wikipedia_en_100_maxi
66M	wikipedia_en_100_maxi

$ find wikipedia_en_100_maxi -type f | wc -l
3710

asashnov avatar Nov 20 '20 15:11 asashnov

I ran into the same problem.

FledgeXu avatar Jan 25 '21 05:01 FledgeXu

Similar to #213

kelson42 avatar Feb 07 '21 09:02 kelson42

I looked into this problem, but I'm not sure what a good solution would be. Because a file and a directory can't share the same name on Linux, some Wikipedia redirects break with the symlink approach. Example:

% => Percent_sign
%/% => Environment_variables

If % is created first, a file named % exists. To create the file %/%, we first need to create a directory named %, and that fails.
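
The collision can be reproduced outside zimdump. The following is a minimal Python sketch, not zimdump code: the `reproduce_collision` helper and the temporary directory are assumptions made for illustration. It creates `%` as a symlink first and then tries to create the directory `%` that the entry `%/%` would need.

```python
import os
import tempfile


def reproduce_collision(root):
    """Create the redirect '%' first, then try to make '%' a directory.

    Returns the OSError raised by the second step, or None if it succeeded.
    Hypothetical helper for illustration; not part of zim-tools.
    """
    # Redirect '%' => 'Percent_sign', stored as a symlink.
    os.symlink("Percent_sign", os.path.join(root, "%"))
    try:
        # The entry '%/%' needs a parent directory named '%', but that
        # name is already taken by the symlink created above.
        os.makedirs(os.path.join(root, "%"), exist_ok=True)
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        err = reproduce_collision(root)
        print(type(err).__name__ if err else "no collision")
```

Reversing the order only moves the failure: if the directory `%` is created first, the symlink `%` can no longer be created.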

adamlamar avatar Dec 21 '22 01:12 adamlamar

@adamlamar There is a file with all exceptions. Not sure about the details, but it should be listed in the exception file IMO.

kelson42 avatar Dec 21 '22 05:12 kelson42

I'm able to see and understand the printed exception. The problem is that the --redirect option can't save everything because of shared file and directory names. Either the symlink creation fails, or the content creation fails.

The easy fix would be to simply print the error and move on without throwing an exception. However, this leaves the content and/or redirects in a partially extracted state on the filesystem.
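
A rough Python sketch of that "print and move on" approach follows. It is not how zimdump is implemented: `create_redirects` and its `(path, target)` input format are hypothetical names chosen for this example. Failures are collected rather than raised, so the caller can at least report which redirects are missing from the output tree.

```python
import os
import tempfile


def create_redirects(root, redirects):
    """Try to create every redirect as a symlink; collect failures instead of stopping.

    `redirects` is a list of (path, target) pairs, both relative to `root`.
    Returns a list of (path, error message) for the redirects that failed.
    Hypothetical helper for illustration; not part of zim-tools.
    """
    failures = []
    for path, target in redirects:
        link = os.path.join(root, path)
        link_dir = os.path.dirname(link)
        try:
            os.makedirs(link_dir, exist_ok=True)
            # Point the link at the target relative to the link's own directory.
            rel_target = os.path.relpath(os.path.join(root, target), link_dir)
            os.symlink(rel_target, link)
        except OSError as exc:
            # Record the error and continue; the tree is now knowingly incomplete.
            failures.append((path, str(exc)))
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        pairs = [("%", "Percent_sign"), ("%/%", "Environment_variables")]
        for path, message in create_redirects(root, pairs):
            print(f"skipped redirect {path}: {message}")
```

With the two redirects from the example above, the first symlink is created and the second is reported as skipped, which is exactly the partial state described here.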

adamlamar avatar Dec 21 '22 23:12 adamlamar

@kelson42 Could a developer take a look at my patch at 190 so that this exception can be resolved? Thanks.

nickhuang99 avatar Jun 17 '24 23:06 nickhuang99

@nickhuang99 yes, but please create a PR.

kelson42 avatar Jun 18 '24 03:06 kelson42

@kelson42 sure. Created.

nickhuang99 avatar Jun 18 '24 14:06 nickhuang99

@kelson42 Hi, I have another PR that solves one case of this issue: symlink creation fails because the symlink was already created earlier. That can happen when the dump is run repeatedly, when two ZIM files are merged into the same directory, and so on.
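
One way to tolerate a symlink left behind by an earlier run is sketched below in Python. This is an illustration of the idea only and is not taken from the PR: `ensure_symlink` is a hypothetical helper, and treating "same target" as success is an assumption about the desired behavior.

```python
import os
import tempfile


def ensure_symlink(target, link):
    """Create `link` -> `target`, tolerating a link left by an earlier run.

    Returns True if `link` is a symlink to `target` afterwards.
    Returns False if the name `link` is already taken by something else.
    Hypothetical helper for illustration; not part of zim-tools.
    """
    try:
        os.symlink(target, link)
        return True
    except FileExistsError:
        # Accept the existing entry only if it is a symlink with the same target.
        return os.path.islink(link) and os.readlink(link) == target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        link = os.path.join(root, "United_States")
        print(ensure_symlink("United_States_of_America", link))  # first run
        print(ensure_symlink("United_States_of_America", link))  # repeated run
```

A name that is already taken by a regular file, a directory, or a symlink to a different target still counts as a failure, so this covers only the repeated-run case and not the file/directory collision discussed earlier in the thread.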

nickhuang99 avatar Jun 19 '24 04:06 nickhuang99