Jobs maxing at 10,000

Open • wrinkl3 opened this issue 8 years ago • 1 comment

Malspider appears to stop launching new jobs once the number of completed jobs reaches 10,000 (as seen here: http://i.imgur.com/itENqeu.png). It still consumes a large amount of RAM. Restarting the server seems to temporarily solve it, as the "completed jobs" counter resets to zero.

wrinkl3 • Nov 30 '16 08:11

Hey Alex,

New jobs are still launched, but scrapyd only keeps a limited number of finished jobs in the launcher, and with Malspider's default settings that limit is 10k. Once you reach 10k, the oldest finished jobs are removed from the launcher to keep the count at or below 10k, so the counter stops growing even though jobs keep running. You can change this in the malspider/scrapyd.conf file: look for "finished_to_keep" and set it to whatever you like. scrapyd ships with a default value of 100, but I changed it to 10k for Malspider. You'll also see a "jobs_to_keep" setting, which controls how many log files are kept.
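To make the capping behaviour concrete, here is a minimal Python sketch. It is an illustration, not scrapyd's actual code: the sample config text, the `jobs_to_keep` value, and the helper names `read_finished_to_keep` and `simulate_finished_list` are all assumptions. Only the `finished_to_keep` setting, its Malspider value of 10k, and the scrapyd default of 100 come from the explanation above.

```python
import configparser

# Illustrative scrapyd.conf fragment. Only finished_to_keep = 10000 is taken
# from the reply above; the jobs_to_keep value is a placeholder.
SAMPLE_CONF = """
[scrapyd]
finished_to_keep = 10000
jobs_to_keep = 5
"""


def read_finished_to_keep(conf_text, default=100):
    """Return finished_to_keep from config text, or the default if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint("scrapyd", "finished_to_keep", fallback=default)


def simulate_finished_list(total_jobs, finished_to_keep):
    """Mimic a launcher that keeps only the newest finished jobs."""
    finished = []
    for job_id in range(total_jobs):
        finished.append(job_id)
        # Drop the oldest entries so at most finished_to_keep remain.
        del finished[:-finished_to_keep]
    return finished


limit = read_finished_to_keep(SAMPLE_CONF)
finished = simulate_finished_list(12000, limit)
# The list plateaus at the limit even though 12000 jobs completed.
print(len(finished), finished[0], finished[-1])
```

In this sketch all 12,000 simulated jobs run, but the finished list never holds more than the configured limit, which is why a "completed jobs" counter read from that list appears stuck at 10,000.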

Ideally we should have a counter in the database to track the # of jobs completed, or get rid of the "finished jobs" status altogether. I'm leaning towards getting rid of it since there are plenty of spidering stats and info about outstanding and running jobs. Do you agree with that?

-James

jasheppa5 • Nov 30 '16 13:11