pypac
Use a cache for domain lookup
Right now, only the proxy parsing gets cached:
There should be a mechanism to avoid evaluating the JS (which is costly) multiple times for the same domain name. I've been profiling my app, which makes a lot of requests, and this is a bottleneck.
This issue is tricky because of timeRange(). Though that function may be rarely used, its existence means that caching to reduce JS calls could introduce incorrect behaviour. pac_context_for_url() is a different approach to avoiding JS evaluation.
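To make the trade-off concrete, here is a minimal sketch of one possible compromise: caching the PAC result per hostname with a short expiry, so that a time-dependent rule such as timeRange() is stale for at most the TTL. This is not part of pypac; the class name `HostCachedEvaluator`, its parameters, and the choice to key on hostname only are all assumptions for illustration. Keying on hostname would also be wrong for PAC files that branch on the full URL.

```python
import time
from urllib.parse import urlparse


class HostCachedEvaluator:
    """Hypothetical wrapper (not a pypac API): caches PAC results per host.

    `evaluate` is any callable taking (url, host) and returning the PAC
    result string, e.g. "DIRECT" or "PROXY proxy1.mydomain.com:8080".
    """

    def __init__(self, evaluate, ttl_seconds=60.0, clock=time.monotonic):
        self._evaluate = evaluate
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._cache = {}         # host -> (expiry_time, result)

    def find_proxy(self, url):
        host = urlparse(url).hostname or ""
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and entry[0] > now:
            # Fresh entry: skip the costly JS evaluation.
            return entry[1]
        # Missing or expired: evaluate once and remember the result.
        result = self._evaluate(url, host)
        self._cache[host] = (now + self._ttl, result)
        return result
```

Assuming `PACFile.find_proxy_for_url(url, host)` is the callable being wrapped, usage would look like `HostCachedEvaluator(pac.find_proxy_for_url, ttl_seconds=30).find_proxy("http://example.com")`. A shorter TTL narrows the window in which a time-based rule can return a stale answer, at the cost of more JS calls.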
I did a quick performance check with the following notebook snippet:
%load_ext autoreload
%autoreload 2
from pypac.parser import PACFile
from pypac.resolver import ProxyResolver
pac = PACFile(
"""
function FindProxyForURL(url, host) {
    if (isPlainHostName(host) || dnsDomainIs(host, ".mydomain.com"))
        return "DIRECT";
    else if (shExpMatch(host, "*.com"))
        return "PROXY proxy1.mydomain.com:8080; " +
               "PROXY proxy4.mydomain.com:8080";
    else if (shExpMatch(host, "*.edu"))
        return "PROXY proxy2.mydomain.com:8080; " +
               "PROXY proxy4.mydomain.com:8080";
    else
        return "PROXY proxy3.mydomain.com:8080; " +
               "PROXY proxy4.mydomain.com:8080";
}
"""
)
resolver = ProxyResolver(pac)
#%%
%timeit resolver.get_proxy("http://example.com")
Results:
- 71 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) on local machine
- 10000 loops, best of 5: 149 µs per loop on Colaboratory