Crawling binary files
supercrawler picks up ALL links on a page. If a page links to movie files, images, or any other large files, it adds those URLs to the queue. The URLs are then passed to request, which tries to download them.
I want to keep the ability to download binary files, but I know that downloading large binary data could be problematic. What behaviour do you expect here? Maybe a max file size, or an event handler that inspects the headers and can cancel a request?
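To make the header-inspection idea concrete, here is a rough sketch of what such a check might look like. It is not supercrawler's API and not necessarily how this was eventually fixed; `shouldDownload`, `maxBytes`, and `allowedTypes` are hypothetical names, and the 5 MiB default is an arbitrary assumption. By default any content type is accepted, so binary downloads stay possible and only the size cap applies.

```javascript
// Hypothetical helper (not part of supercrawler): decide from the response
// headers alone whether the body should be downloaded.
function shouldDownload(headers, options = {}) {
  const maxBytes = options.maxBytes ?? 5 * 1024 * 1024; // assumed default: 5 MiB
  const allowedTypes = options.allowedTypes ?? null;    // null = accept any type

  // Header names are case-insensitive, so normalise them before lookup.
  const normalised = {};
  for (const [name, value] of Object.entries(headers)) {
    normalised[name.toLowerCase()] = String(value);
  }

  // Drop parameters such as "; charset=utf-8" before comparing media types.
  const contentType = (normalised["content-type"] || "")
    .split(";")[0]
    .trim()
    .toLowerCase();
  if (allowedTypes && !allowedTypes.includes(contentType)) {
    return { ok: false, reason: "content-type" };
  }

  // Content-Length is optional. When it is missing this check cannot help,
  // and the caller would have to count bytes while streaming instead.
  const length = Number.parseInt(normalised["content-length"], 10);
  if (Number.isFinite(length) && length > maxBytes) {
    return { ok: false, reason: "content-length" };
  }

  return { ok: true, reason: null };
}

// Example: a large video is rejected on size even though its type is allowed.
const verdict = shouldDownload(
  { "Content-Type": "video/mp4", "Content-Length": "734003200" },
  { maxBytes: 10 * 1024 * 1024 }
);
console.log(verdict);
```

A crawler could run a check like this as soon as the response headers arrive and cancel the request when `ok` is false. Where exactly that hook would live is an open design question here.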
On Tue, 4 Dec 2018, 08:29, joshua-mbg wrote:
supercrawler is picking up ALL links on a page. If there are links to movie files, images, or any large files it will add these URLs to the queue. The urls get passed to request which tries to download them.
I have run into the same problem. I'm working on a fix for this issue.
I finally addressed this issue. I believe it is resolved with https://github.com/brendonboshell/supercrawler/pull/45