soupscraper

How to process what's already downloaded

Open schlingel opened this issue 4 years ago • 8 comments

I have let it run for multiple days now. It has downloaded all pages and some of the assets (around 2500 of 4528).

But I guess the soup service has shut down. I only get 503s, and not even the logo is available at the assets URL anymore.

Is there a way of "materializing" what's already in the cache without having to finish downloading all assets?

schlingel avatar Jul 20 '20 16:07 schlingel

1up from me. It appears the site is closed for good. Is there a way to finalize the process without downloading all assets?

Cheers!

Pumpkineer avatar Jul 21 '20 07:07 Pumpkineer

@Pumpkineer Hey, I just tried the new IPs for the soup servers and they seem to get me some additional assets. Try it too, maybe you can get a few more assets (if not all) out of it! @nathell updated the readme with the IPs and instructions for updating the hosts file.

schlingel avatar Jul 21 '20 10:07 schlingel

Yeah, the /etc/hosts workaround should work for now. I'll leave this issue open, though, because I do want to make it possible to finalize the process. Might take a few days though.

nathell avatar Jul 21 '20 10:07 nathell

Mine just gave up on the last file, saying: Received fatal alert: handshake_failure. I guess I'm lucky that I made my copy in time? :P

[image]

dragon99919 avatar Jul 22 '20 08:07 dragon99919

@nathell That would be great. I'm still missing 300 assets and since yesterday only one new one could be downloaded.

schlingel avatar Jul 22 '20 09:07 schlingel

@dragon99919 @schlingel If you're still facing this, look at the end of log/skyscraper.log and see which URLs it's trying (and failing) to download. I have received a report that you might have to add extra domains to the hosts file, specifically:

45.153.143.248 0.asset.soup.io

but maybe also others (depending on which URLs it's having trouble with).
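
To make the log check less manual, here is a minimal Python sketch that pulls the soup.io hostnames out of the last lines of log/skyscraper.log and prints candidate hosts-file lines for them. Several things here are assumptions, not facts about the scraper: the log format is unknown, so the script simply looks for anything shaped like an http(s) URL; the function names (`hosts_in_log`, `hosts_entries`, `tail`) are made up for illustration; and reusing 45.153.143.248 for every hostname is a guess, since only 0.asset.soup.io has been reported to work with that IP.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Loose URL pattern; the real log format is not known, so this is a guess.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+")


def hosts_in_log(lines, suffix="soup.io"):
    """Return distinct hostnames under `suffix` seen in URLs, in first-seen order."""
    seen = []
    for line in lines:
        for url in URL_RE.findall(line):
            host = urlparse(url).hostname
            if not host:
                continue
            if (host == suffix or host.endswith("." + suffix)) and host not in seen:
                seen.append(host)
    return seen


def hosts_entries(hostnames, ip):
    """Format hosts-file lines that map every hostname to the same IP."""
    return [f"{ip} {name}" for name in hostnames]


def tail(path, n=200):
    """Read the last n lines of a text file (loads the whole file; fine for a sketch)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()[-n:]


if __name__ == "__main__":
    log_path = Path("log/skyscraper.log")
    if log_path.exists():
        # Assumption: the same IP serves every asset host; verify before using.
        for entry in hosts_entries(hosts_in_log(tail(log_path)), "45.153.143.248"):
            print(entry)
    else:
        print(f"{log_path} not found; run this from the scraper's working directory")
```

The output lists every soup.io host found near the end of the log, not only the failing ones, so compare it against what is already in your hosts file and add only the entries that are missing.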

nathell avatar Jul 22 '20 09:07 nathell

Worked like a charm, thanks! Maybe adding this to the readme would help prevent future issues with it?

dragon99919 avatar Jul 22 '20 11:07 dragon99919

@nathell Any news on finalizing the process?

MartinKei avatar Aug 19 '20 12:08 MartinKei