python-wayback-machine-downloader

gzip problem?

Open joaopedro32 opened this issue 1 month ago • 9 comments

i'm trying to use this and it goes well at first, but then this error suddenly appears and i can't do anything. the command is:

waybackup -u roguestatus.com/ -a --filetype html,txt -o waybackup --retry 5

Image

joaopedro32 avatar Oct 20 '25 16:10 joaopedro32

hey :) do you have an error log with the full traceback?

bitdruid avatar Oct 20 '25 16:10 bitdruid

yeah, let me send it. you're quick lol

waybackup_error.log

joaopedro32 avatar Oct 20 '25 16:10 joaopedro32

also, it only downloads robots.txt and not all the snapshots of the website and its other pages (i think they're called paths), i.e. every single url that matches this url/*

Image Image

joaopedro32 avatar Oct 20 '25 16:10 joaopedro32

give me some time. i'm not at home right now :)

bitdruid avatar Oct 20 '25 16:10 bitdruid

oh it's fine!

joaopedro32 avatar Oct 20 '25 16:10 joaopedro32

so i tried your last command. just a small hint:

you don't need to give a wildcard (*) to waybackup. just write:

waybackup -u roguestatus.com -l --filetype html,txt -o waybackup --retry 5

it will download everything from your specified subdir anyway.

however, i could reproduce the gzip exception. it seems the file does not start with the gzip header and is in fact not compressed. these files are now downloaded raw.
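For readers curious what such a check can look like, here is a minimal Python sketch. It is not taken from waybackup's source; the function name `decode_body` is hypothetical. It relies only on the fact that a gzip stream begins with the two bytes 0x1f 0x8b, and falls back to the raw bytes when that header is missing or the stream turns out to be invalid.

```python
import gzip
import zlib

# Every gzip stream starts with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(data: bytes) -> bytes:
    """Hypothetical helper: decompress gzip data, or return it unchanged.

    This is an illustration of the idea, not the project's actual code.
    """
    if not data.startswith(GZIP_MAGIC):
        # No gzip header: treat the payload as uncompressed and keep it raw.
        return data
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # The header matched, but the stream is corrupt or truncated.
        # Keeping the raw bytes avoids losing the download entirely.
        return data


if __name__ == "__main__":
    print(decode_body(gzip.compress(b"<html>hello</html>")))
    print(decode_body(b"User-agent: *"))
```

Checking the header first avoids raising an exception for the common case of a plain, uncompressed response; the `except` branch only covers payloads that claim to be gzip but cannot be decoded.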

bitdruid avatar Oct 20 '25 20:10 bitdruid

And one last thing: it worked, but how do I download only the text/content of the website? I want this because it downloads too much useless stuff, and downloading HTML only works with forums, IIRC. Thanks.

joaopedro32 avatar Oct 23 '25 20:10 joaopedro32

and do you have discord so i can talk to you more?

joaopedro32 avatar Oct 23 '25 20:10 joaopedro32

And one last thing: it worked, but how do I download only the text/content of the website? I want this because it downloads too much useless stuff, and downloading HTML only works with forums, IIRC. Thanks.

what do you mean exactly? your command downloads html and txt just fine :) send me a mail (check my profile)

bitdruid avatar Oct 23 '25 22:10 bitdruid