
Splash memory leak

Open Ethan353 opened this issue 5 months ago • 0 comments

I use scrapy-splash for requests in my crawling service. After some amount of time, my services' RAM usage increases continuously, and after a while they consume all the RAM of the VM. The weird thing is that the Splash service itself works properly, but the services that use Splash for requests have the memory leak. For more detail, here are the code snippet and the Splash config I use: code:

if condition_to_use_splash:
    # Render the page through Splash, waiting 7 seconds for JS to finish
    return SplashRequest(url, callback=self.parse, errback=self.errback,
                         meta=metadata, args={'wait': 7})
else:
    # Plain (non-rendered) request
    return FormRequest(url, method=method, formdata=parameter,
                       dont_filter=True, errback=self.errback, meta=metadata)
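
As an aside, Splash accepts per-request arguments that can shrink each render's memory footprint. A minimal sketch (the `build_splash_args` helper and the specific values below are my own illustration, not from the original post; `images`, `resource_timeout`, and `timeout` are standard Splash HTTP API arguments):

```python
def build_splash_args(wait=7):
    """Build the args dict passed to SplashRequest.

    Disabling image loading and bounding per-resource and total render
    time keeps each Splash render smaller and shorter-lived.
    """
    return {
        'wait': wait,            # seconds to wait for JS after page load
        'images': 0,             # skip downloading images entirely
        'resource_timeout': 10,  # abort any single slow resource
        'timeout': 60,           # hard cap on the whole render
    }

# Used as: SplashRequest(url, callback=self.parse, args=build_splash_args())
```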

config:

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DOWNLOADER_MIDDLEWARES = {
    'solaris_scrapy.solaris_scrapy.middlewares.ProxyMiddleware': 100,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
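
One way to confirm whether the leak is on the Scrapy side (rather than in Splash) is Scrapy's built-in memory monitoring, which logs the crawler process's memory and can shut the spider down cleanly at a threshold. A sketch of extra settings, with illustrative values:

```python
# settings.py — Scrapy's built-in memory monitoring (values are illustrative)
MEMUSAGE_ENABLED = True      # track the crawler process's resident memory
MEMUSAGE_WARNING_MB = 1536   # log a warning when this threshold is crossed
MEMUSAGE_LIMIT_MB = 2048     # close the spider cleanly at this hard limit
```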

I use scrapinghub/splash:3.1 as the Splash image, and this is my Splash service's docker-compose file:

services:
  splash:
    image: scrapinghub/splash:3.1
    ports:
      - "prot:port"
    networks:
      - net

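For comparison, a common pattern is to cap and recycle Splash at the container level: a memory limit plus a restart policy, so a container killed for exceeding the limit comes back automatically. A sketch extending the compose file above (the port mapping, flag values, and limits are illustrative assumptions; `--maxrss`, `--max-timeout`, and `--slots` are documented Splash command-line options):

```yaml
services:
  splash:
    image: scrapinghub/splash:3.1
    command: --maxrss 3000 --max-timeout 90 --slots 5
    restart: unless-stopped   # come back automatically after an OOM kill
    mem_limit: 4g             # hard cap; the kernel kills Splash past this
    ports:
      - "8050:8050"
    networks:
      - net
```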
Note that I run my code on a VM, inside a Docker container. What do you think I should do about this? I'm also aware of the memory limit, maxrss, and slots options for preventing Splash from using lots of RAM, but using them causes my crawling service to miss a bunch of websites. How should I handle it in my code?
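
On the "missed websites" point: when Splash is restarted or saturated, in-flight renders typically come back as 502/503/504, and those URLs are only lost if nothing retries them (the config above sets Scrapy's stock `RetryMiddleware` to `None`). One option is to re-schedule failed Splash requests from the errback. A sketch, assuming a hypothetical `should_reschedule` helper of my own naming:

```python
def should_reschedule(status, attempt, max_attempts=3):
    """Decide whether a failed Splash render is worth retrying.

    502/503/504 usually mean Splash was restarting or overloaded, so the
    URL is recoverable; give up after max_attempts tries.
    """
    return status in (502, 503, 504) and attempt < max_attempts

# In the spider (sketch; errback output is processed like callback output):
# def errback(self, failure):
#     request = failure.request
#     attempt = request.meta.get('retry_attempt', 0)
#     status = getattr(getattr(failure.value, 'response', None), 'status', 503)
#     if should_reschedule(status, attempt):
#         retry = request.replace(dont_filter=True)
#         retry.meta['retry_attempt'] = attempt + 1
#         yield retry
```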

Ethan353 avatar Jan 13 '24 12:01 Ethan353