
Use a cache for domain lookup

Open Res260 opened this issue 3 years ago • 1 comments

Right now, only the proxy parsing gets cached (screenshot of profiling output omitted).

There should be a mechanism to avoid evaluating the JS (which is costly) multiple times for the same domain name. I've been profiling my app, which makes a lot of requests, and this is a bottleneck.
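One way to sketch the requested behaviour (this is not part of pypac; `evaluate_pac` below is a hypothetical stand-in for the resolver's JS call) is to memoize lookups keyed on the hostname, so the PAC function runs at most once per distinct host:

```python
from functools import lru_cache
from urllib.parse import urlparse

# Hypothetical stand-in for ProxyResolver.get_proxy(); in real use this
# would invoke the PAC file's FindProxyForURL() through the JS engine.
def evaluate_pac(host):
    evaluate_pac.calls += 1  # count JS evaluations for demonstration
    return "PROXY proxy3.mydomain.com:8080"

evaluate_pac.calls = 0

@lru_cache(maxsize=1024)
def get_proxy_for_host(host):
    # Cached: the expensive evaluation runs once per distinct host.
    return evaluate_pac(host)

def get_proxy(url):
    # Different URLs on the same host share one cache entry.
    return get_proxy_for_host(urlparse(url).hostname)
```

With this wrapper, repeated requests to the same host hit the cache instead of re-running the PAC script.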

Res260 avatar Jan 15 '21 21:01 Res260

This issue is tricky because of timeRange(). Although that function may be rarely used, its existence means that caching results to skip JS calls could introduce incorrect behaviour: a cached result can become stale once the time range changes. pac_context_for_url() is a different approach to avoiding repeated JS evaluation.
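One compromise would be a cache whose entries expire, which keeps most of the benefit while bounding how stale a timeRange()-dependent result can get. A minimal sketch (not pypac API; the `evaluate` callable, the TTL value, and the injectable clock are all assumptions for illustration):

```python
import time

class TTLProxyCache:
    """Cache host -> proxy string, re-evaluating after `ttl` seconds.

    Bounds the staleness a timeRange()-based PAC rule could cause.
    """

    def __init__(self, evaluate, ttl=300.0, clock=time.monotonic):
        self._evaluate = evaluate  # e.g. a function wrapping the JS call
        self._ttl = ttl
        self._clock = clock        # injectable for testing
        self._cache = {}           # host -> (expires_at, result)

    def get(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]        # fresh entry: skip JS evaluation
        result = self._evaluate(host)
        self._cache[host] = (now + self._ttl, result)
        return result
```

A short TTL (minutes) would make any timeRange() drift transient, while still collapsing bursts of requests to the same host into a single evaluation.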

I did a quick performance check with the following notebook snippet:

%load_ext autoreload
%autoreload 2
from pypac.parser import PACFile
from pypac.resolver import ProxyResolver


pac = PACFile(
    """
function FindProxyForURL(url, host) {

  if (isPlainHostName(host) || dnsDomainIs(host, ".mydomain.com"))
    return "DIRECT";

  else if (shExpMatch(host, "*.com"))
    return "PROXY proxy1.mydomain.com:8080; " +
           "PROXY proxy4.mydomain.com:8080";

  else if (shExpMatch(host, "*.edu"))
    return "PROXY proxy2.mydomain.com:8080; " +
           "PROXY proxy4.mydomain.com:8080";

  else
    return "PROXY proxy3.mydomain.com:8080; " +
           "PROXY proxy4.mydomain.com:8080";
}
"""
)

resolver = ProxyResolver(pac)
#%%
%timeit resolver.get_proxy("http://example.com")

Results:

  • 71 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) on local machine
  • 10000 loops, best of 5: 149 µs per loop on Colaboratory

carsonyl avatar Feb 28 '21 06:02 carsonyl