
zimdump - Bypassing excessively long filenames.

Open ISJ-439 opened this issue 3 years ago • 7 comments

Hello,

The filenames in a zim I'm trying to recover are too long. Is it possible to add a flag or similar to bypass this or truncate them?

The error is "Exception: Error writing file to errors dir. " with an excessively long file name following.

Thanks

ISJ-439 avatar Aug 23 '22 06:08 ISJ-439

It should already be the case: we truncate filenames longer than 255 characters. What is the file name you want to extract? Which command are you using?

mgautierfr avatar Aug 23 '22 09:08 mgautierfr

I'm going to replace the site details with an equal number of x's for privacy; I hope that's okay.

ZIMtools Version: 3.1.1-2

Command: /home/localadmin/Downloads/zim-tools_linux-x86_64-3.1.1-2/zimdump dump --dir='/mnt/2TB' ./xxxxxxxxxxxxx.com-May2022_2022-05.zim

Filename:
Wrote /mnt/2TB/H/xxxxxxxxxxxxx.com/category/xxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxx/page/43/ to /mnt/2TB/_exceptions/H%2fxxxxxxxxxxxxx.com%2fxxxxxxxx%2fxxxxxxxxxxxxxxxxx%2fxxxxxxxxxxxxxxx%2fpage%2f43%2f
Wrote /mnt/2TB/H/www.bitchute.com/embed/xxxxxxxxxxx/view/ to /mnt/2TB/_exceptions/H%2fwww.bitchute.com%2fembed%xxxxxxxxxxx%2fview%2f
Error writing file to errors dir. /mnt/2TB/_exceptions/A%2fhtml5-player.libsyn.com%2fembed%2fepisode%2xxx%2xxxxxxxx%2fheight%2f90%2fwidth%2f640%2ftheme%2fcustom%2fautonext%2fno%2fthumbnail%2fyes%2fautoplay%2fno%2fpreload%2fno%2fno_addthis%2fno%2fdirection%2fbackward%2frender-playlist%2fno%2fcustom-color%2f0009aa%2f
Exception: Error writing file to errors dir. /mnt/2TB/_exceptions/A%2fhtml5-player.libsyn.com%2fembed%2fepisode%2xxx%2xxxxxxxx%2fheight%2f90%2fwidth%2f640%2ftheme%2fcustom%2fautonext%2fno%2fthumbnail%2fyes%2fautoplay%2fno%2fpreload%2fno%2fno_addthis%2fno%2fdirection%2fbackward%2frender-playlist%2fno%2fcustom-color%2f0009aa%2f

{end of program output}

ISJ-439 avatar Aug 23 '22 14:08 ISJ-439

This is 262 characters which is over the 255 limit for file names.

ISJ-439 avatar Aug 23 '22 14:08 ISJ-439

What is failing is the writing of the errored file.

zimdump tries to write the file into your output directory (/mnt/2TB). If that fails for some reason, it writes the file into the exception directory (/mnt/2TB/_exceptions) instead. When doing so, it replaces every / with %2f (so there is no subdirectory) and it doesn't try to truncate the filename.
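To illustrate why the fallback name can be too long even when the original nested path was fine, here is a minimal Python sketch. It is not zimdump's code: the function names, the sample path and the 255-byte limit (the figure quoted earlier in this thread) are assumptions made for the example.

```python
# Assumed per-component filename limit, taken from the 255 figure quoted in this thread.
NAME_MAX = 255


def escape_for_exceptions(entry_path: str) -> str:
    """Flatten a nested path into one name by replacing every '/' with '%2f'."""
    return entry_path.replace("/", "%2f")


def exceeds_name_limit(name: str, limit: int = NAME_MAX) -> bool:
    """Return True if the name, encoded as UTF-8, is longer than the limit."""
    return len(name.encode("utf-8")) > limit


# Hypothetical entry path: many short components, none anywhere near the limit.
entry = "A/" + "/".join(["segment"] * 30) + "/"
escaped = escape_for_exceptions(entry)

# Each '/' becomes three characters, so the flattened name grows by two per slash.
print(len(entry), len(escaped), exceeds_name_limit(escaped))
```

In the nested form the limit applies to each path component separately, while the flattened form has to fit the whole path into a single name.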

The question is why it fails to write the file in the first place. (Sadly, zimdump doesn't report the error information.)

  • What is the content of mnt/2TB/A/html5-player.libsyn.com/embed/episode/xx/xxxxxxx/height/90/width/640/theme/custom/autonext/no/thumbnail/yes/autoplay/no/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/0009aa/ ?
  • Is your directory full? Or subject to some quota?
  • ...

mgautierfr avatar Aug 30 '22 16:08 mgautierfr

Hello sir,

Firstly, thank you for your time.

"What is the content of mnt/2T" The content is likely related to an MP3 player applet. Yes, I know this is not supported, and I'm sorry to use this outside its scope.

"Is your directory full ? Or with some quota ?" That was the first thing I checked. No sir, almost a TB is free, and I set the directory to 'chmod -R ./* 777', so it's not permissions.

My proposed solution: provide a switch to bypass errors and just continue extracting. In my limited knowledge, this would likely be simple and easy.
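A rough Python sketch of what "bypass errors and continue" could look like, assuming a hypothetical `dump_entries` helper that receives (relative path, bytes) pairs. This is not how zimdump is implemented; it only shows the idea of recording the reason for each failure and moving on.

```python
import os


def dump_entries(entries, out_dir):
    """Write each (relative_path, data) pair under out_dir.

    A failure on one entry is recorded with its error message and the loop
    continues, instead of aborting the whole extraction.
    """
    failures = []
    for rel_path, data in entries:
        target = os.path.join(out_dir, rel_path)
        try:
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(data)
        except OSError as exc:
            # Keep the reason so the user can see why the write failed.
            failures.append((rel_path, exc.strerror or str(exc)))
    return failures
```

The returned list could be printed at the end of the run, which would also expose the underlying error for each entry that was skipped.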

Thank you.

ISJ-439 avatar Sep 01 '22 20:09 ISJ-439

I agree with your proposal. But it would be even better to also know why it fails. If you can provide me with the zim file (even privately if you don't want to publish it), I could investigate the root cause and fix it.

mgautierfr avatar Sep 02 '22 08:09 mgautierfr


"My proposed solution: Provide a switch to bypass errors and just continue extracting. This would likely be simple and easy, in my limited knowledge."

We should first 100% understand the root cause.

kelson42 avatar Sep 02 '22 09:09 kelson42

I'm also having this issue and can provide a zim file for tests. It is around 3 GB in size, though, and I'm not sure where to upload it.

LTVA1 avatar Dec 02 '22 17:12 LTVA1

You can use any file sharing service, for example wetransfer.com or file.io. They are limited to 2 GB files, but you can split the zim file.

mgautierfr avatar Dec 07 '22 16:12 mgautierfr

I will try to limit the crawl scope so the file is less than 2 GB in size, asap.

LTVA1 avatar Dec 07 '22 16:12 LTVA1

I made some simple modifications to ignore errors: https://github.com/openzim/zim-tools/pull/375

It now creates a more complete extraction. This could be incorporated into improved feedback to the user, options to ignore the errors, or hash the invalid names...
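As one possible shape for the "hash the invalid names" idea, here is a minimal Python sketch. The `shorten_name` helper, the `~` separator, the 16-character digest and the 255-byte limit are all assumptions for illustration; none of this is taken from the pull request.

```python
import hashlib


def shorten_name(name: str, limit: int = 255) -> str:
    """Return name unchanged if it fits, else a truncated prefix plus a hash suffix."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # The digest is computed over the full name, so two long names that share
    # a prefix still map to different shortened names.
    digest = hashlib.sha256(raw).hexdigest()[:16]
    keep = limit - len(digest) - 1  # one byte reserved for the '~' separator
    # errors="ignore" drops a multi-byte character that was cut in half.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}~{digest}"
```

Keeping a readable prefix makes the output easier to browse, while the hash suffix keeps names distinct; a mapping from shortened to original names would still be needed to recover the full path.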

https://github.com/openzim/zim-tools/issues/373 In my case, invalid characters and long names cause the dump to error out and stop; with my modifications it keeps going and ignores the invalid and long names.

For example:

❯ ./zim-tools/build/src/zimdump dump --dir=./dump3 archive.zim
Error writing file to errors dir. ./dump3/_exceptions/H%2fplay.google.com%2flog?format=json&hasfast=true&authuser=0&__wb_method=POST&[[1,null,null,null,null,null,null,null,null,null,[null,null,null,null,"en",null,"17",null,null,[1,0,0,0,0]]],1654,[["1696854400954",null,[],null,null,null,null,"[[[\"%2fclient_streamz%2fpo%2fw%2fel\",null,[\"en\",\"rk\"],[[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1]],[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"q\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0.8000030517578125]],[[[\"S\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.5999984741210938]],[[[\"b\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,199.0999984741211]],[[[\"i\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1.5]],[[[\"r\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3440.6000061035156]],[[[\"C\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,2.4000015258789062]],[[[\"x\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"m\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.100006103515625]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2frl\",null,[\"mn\",\"ac\",\"sc\",\"rk\"],[[[[\"c\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1887.900001525879]],[[[\"g\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1331.400001525879]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2fcsc\",null,[\"cs\",\"rk\"],[[[[null,3],[\"O43z0dpjhgX20SCx4KAo\"]],[1]]],null,[]]]]",null,null,null,null,null,null,0,[null,[],null,"[[],[],[],[]]"],null,null,null,[],1,null,null,null,null,null,[]]],"1696854400955",[]]
Error writing file to errors dir. ./dump3/_exceptions/A%2fplay.google.com%2flog?format=json&hasfast=true&authuser=0&__wb_method=POST&[[1,null,null,null,null,null,null,null,null,null,[null,null,null,null,"en",null,"17",null,null,[1,0,0,0,0]]],1654,[["1696854400954",null,[],null,null,null,null,"[[[\"%2fclient_streamz%2fpo%2fw%2fel\",null,[\"en\",\"rk\"],[[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1]],[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"q\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0.8000030517578125]],[[[\"S\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.5999984741210938]],[[[\"b\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,199.0999984741211]],[[[\"i\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1.5]],[[[\"r\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3440.6000061035156]],[[[\"C\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,2.4000015258789062]],[[[\"x\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"m\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.100006103515625]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2frl\",null,[\"mn\",\"ac\",\"sc\",\"rk\"],[[[[\"c\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1887.900001525879]],[[[\"g\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1331.400001525879]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2fcsc\",null,[\"cs\",\"rk\"],[[[[null,3],[\"O43z0dpjhgX20SCx4KAo\"]],[1]]],null,[]]]]",null,null,null,null,null,null,0,[null,[],null,"[[],[],[],[]]"],null,null,null,[],1,null,null,null,null,null,[]]],"1696854400955",[]]

2600box avatar Oct 11 '23 08:10 2600box

Sample file for tests: https://www.swisstransfer.com/d/306ae305-8b17-455f-862f-13c15ca93121

benoit74 avatar Nov 01 '23 09:11 benoit74

This is basically a duplicate of #213

kelson42 avatar Mar 31 '24 20:03 kelson42