
Warcserver - Performance issues / warmup effects / caching

Open msvensson222 opened this issue 5 months ago • 0 comments

Hey!

Really appreciate this project & repo, been super fun to work with!

I've got a question I can't seem to figure out the answer to, so I'm hoping some kind soul here can help me out :)

Background and problem

I have ~750k WARC records locally (25 files, ~1GB each), with corresponding .cdxj files (one per WARC file, so also 25 files).

I start the warcserver locally on my MacBook Pro with warcserver -t 10. Then I sequentially perform a lot of requests like this:

endpoint = "http://localhost:8070/"
full_request_url = f"{endpoint}my-coll/resource?url={url}"

With random URLs, it takes around 700 requests before the average response time stabilizes (see the attached image below). Not sure if it's relevant, but I also see a lot of "Dir collections/my-col/indexes/ unchanged" lines between the requests in the warcserver logs.
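For anyone who wants to reproduce the measurement, here is a minimal sketch of the kind of timing loop described above. It is not the exact script behind the plot: the helper names (build_request_url, fetch_seconds, rolling_mean, measure) are made up for illustration, percent-encoding the target URL is an assumption about how the query should be sent, and the window size is arbitrary.

```python
import time
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

ENDPOINT = "http://localhost:8070/"   # warcserver address from the post
COLLECTION = "my-coll"                # collection name from the post


def build_request_url(url, endpoint=ENDPOINT, collection=COLLECTION):
    """Build the resource URL; percent-encoding the target is an assumption."""
    return f"{endpoint}{collection}/resource?url={quote(url, safe='')}"


def fetch_seconds(request_url):
    """Time one full request (connect, send, read the body) in seconds."""
    start = time.perf_counter()
    try:
        with urlopen(request_url) as response:
            response.read()
    except HTTPError:
        pass  # a 4xx/5xx response is still a completed request for timing
    return time.perf_counter() - start


def rolling_mean(samples, window):
    """Mean of the trailing `window` samples at each position."""
    means = []
    total = 0.0
    for i, value in enumerate(samples):
        total += value
        if i >= window:
            total -= samples[i - window]  # drop the sample leaving the window
        means.append(total / min(i + 1, window))
    return means


def measure(urls, timer=fetch_seconds):
    """Request each URL sequentially and return the per-request latencies."""
    return [timer(build_request_url(u)) for u in urls]
```

Calling measure(urls) on a shuffled list of archived URLs and then plotting rolling_mean(latencies, 50) should show whether the average levels off after a few hundred requests, as in the attached image.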

My questions

  1. Why is this? Is there some type of caching going on? I've searched the entire pywb docs but can't seem to find anything relating to caching.
  2. Can I somehow "avoid" this warmup period?

Thanks in advance!

Image

msvensson222, May 18 '25 11:05