supercrawler: Crawling binary files

supercrawler picks up ALL links on a page. If a page links to movie files, images, or any other large files, those URLs are added to the queue and passed to request, which tries to download them.

Dec 04 '18 08:12 joshuambg
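One possible mitigation is to skip links whose path ends in a known binary extension before they are queued. The sketch below uses Python purely as illustration. supercrawler itself is a Node.js library, and this code does not use its API. The names `looks_like_binary_link` and `filter_links` and the extension list are hypothetical choices for this sketch. Checking extensions is only a heuristic, because a URL with no extension can still serve a video.

```python
import posixpath
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical set of extensions treated as large/binary; adjust to taste.
BINARY_EXTENSIONS = frozenset({
    ".mp4", ".mov", ".avi", ".mkv", ".mp3", ".wav",
    ".jpg", ".jpeg", ".png", ".gif", ".webp",
    ".zip", ".gz", ".tar", ".iso", ".exe",
})


def looks_like_binary_link(url: str) -> bool:
    """Guess from the URL path alone whether a link points at a binary file.

    Only the path's extension is inspected. The query string and fragment
    are ignored, and the comparison is case-insensitive.
    """
    path = urlsplit(url).path
    _, ext = posixpath.splitext(path)
    return ext.lower() in BINARY_EXTENSIONS


def filter_links(urls: Iterable[str]) -> List[str]:
    """Keep only the links that do not look like binary files."""
    return [u for u in urls if not looks_like_binary_link(u)]


links = [
    "https://example.com/about",
    "https://example.com/media/trailer.MP4",
    "https://example.com/img/logo.png?v=2",
    "https://example.com/blog/post.html",
]
print(filter_links(links))
# -> ['https://example.com/about', 'https://example.com/blog/post.html']
```

This filter runs before any network request, so it costs nothing per link. It can still miss binary responses served from extension-less URLs, which is why a header-based check at download time is a useful complement.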

I want to keep the ability to download binary files, but I know that downloading large binary data could be problematic. What behaviour would you expect here? Maybe a maximum file size, or an event handler that inspects the response headers and can cancel the request?


Dec 04 '18 09:12 brendonboshell
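The two options raised above, a maximum file size and a hook that inspects headers and can cancel the request, can be sketched together. This is an illustrative Python sketch, not supercrawler's API and not the approach taken in any linked PR. `should_fetch_body`, `read_capped`, `TooLarge`, and the limit values are hypothetical. The header check rejects non-HTML content types and oversized declared lengths. Because Content-Length can be missing or inaccurate, the body is also capped while it streams.

```python
from typing import Iterable, Mapping, Optional, Tuple

# Hypothetical policy values for this sketch.
MAX_BYTES = 5 * 1024 * 1024  # 5 MiB
ALLOWED_TYPES: Tuple[str, ...] = ("text/html", "application/xhtml+xml")


class TooLarge(Exception):
    """Raised when a streamed response body grows past the limit."""


def should_fetch_body(headers: Mapping[str, str],
                      max_bytes: int = MAX_BYTES,
                      allowed_types: Tuple[str, ...] = ALLOWED_TYPES) -> bool:
    """Decide from the response headers alone whether to read the body.

    Header names are matched case-insensitively. A missing Content-Type is
    rejected. A missing Content-Length is allowed, since read_capped()
    enforces the size limit while streaming.
    """
    lower = {k.lower(): v for k, v in headers.items()}
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = lower.get("content-type", "").split(";", 1)[0].strip().lower()
    if mime not in allowed_types:
        return False
    length: Optional[str] = lower.get("content-length")
    if length is not None and length.isdigit() and int(length) > max_bytes:
        return False
    return True


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as max_bytes is exceeded."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise TooLarge(f"body exceeded {max_bytes} bytes")
    return bytes(buf)


print(should_fetch_body({"Content-Type": "text/html; charset=utf-8",
                         "Content-Length": "2048"}))  # True
print(should_fetch_body({"Content-Type": "video/mp4"}))  # False
```

With Python's `requests` library, for example, the headers of a `stream=True` response are available before the body is read, and the chunks could come from `resp.iter_content(...)`. Closing the response after `should_fetch_body` returns `False` or `TooLarge` is raised avoids downloading the rest of the file.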

I have run into the same problem. I'm working on a fix for this issue.

Feb 19 '19 13:02 cbess

I finally addressed this issue. I believe it is resolved with https://github.com/brendonboshell/supercrawler/pull/45

Dec 08 '19 02:12 cbess