Crawling binary files

Open joshuambg opened this issue 7 years ago • 3 comments

supercrawler is picking up ALL links on a page. If there are links to movie files, images, or any other large files, it adds these URLs to the queue. The URLs are then passed to request, which tries to download them.

joshuambg avatar Dec 04 '18 08:12 joshuambg

I want to keep the ability to download binary files, but I know that downloading large binary data could be problematic. What behaviour do you expect here? Maybe a max file size, or an event handler that inspects the headers and can cancel a request?
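
To make the header-inspection idea concrete, here is a rough sketch of what such a check could look like. It is a hypothetical helper, not part of supercrawler's API: the name `shouldDownload`, the 5 MiB default limit and the default list of allowed types are all assumptions for illustration. It assumes lower-cased header names, as Node's HTTP client provides them.

```javascript
// Hypothetical helper (not part of supercrawler): decide from the response
// headers whether the body is worth downloading.
const DEFAULT_ALLOWED_TYPES = ["text/html", "application/xhtml+xml"];

function shouldDownload(headers, options = {}) {
  // Assumed defaults: 5 MiB size cap, HTML-like content only.
  const maxBytes = options.maxBytes ?? 5 * 1024 * 1024;
  const allowedTypes = options.allowedTypes ?? DEFAULT_ALLOWED_TYPES;

  // Content-Type may carry parameters, e.g. "text/html; charset=utf-8".
  const rawType = String(headers["content-type"] || "");
  const mediaType = rawType.split(";")[0].trim().toLowerCase();
  if (!allowedTypes.includes(mediaType)) {
    return false;
  }

  // Content-Length is optional; when it is missing, the size cannot be
  // judged from the headers alone, so only the type check applies.
  const rawLength = headers["content-length"];
  if (rawLength !== undefined) {
    const length = Number.parseInt(rawLength, 10);
    if (Number.isFinite(length) && length > maxBytes) {
      return false;
    }
  }
  return true;
}
```

A caller that still wants binary files could pass its own `allowedTypes` and a larger `maxBytes`, e.g. `shouldDownload(headers, { allowedTypes: ["application/pdf"], maxBytes: 50 * 1024 * 1024 })`. Responses without a Content-Length header (chunked transfers, for instance) would still need a byte counter on the stream to enforce the limit.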

brendonboshell avatar Dec 04 '18 09:12 brendonboshell

I have run into the same problem. I'm working on a fix for this issue.

cbess avatar Feb 19 '19 13:02 cbess

I have finally addressed this issue. I believe it is resolved by https://github.com/brendonboshell/supercrawler/pull/45

cbess avatar Dec 08 '19 02:12 cbess