Investigate hiccups during startup
It's especially noticeable with short-lived jobs:
soft_timeout = 30s
hard_timeout = 60s
Throughput drops by almost a factor of 3.
Could be redirect related. We often get a redirect to a subdomain or even a totally different domain, which we treat as a job bounds violation.
=> That means we've spent some resources and now have to discard the job while inserting the newly discovered domain into the global resolver queue, which is usually heavily congested, so chances are it never makes it (it gets discarded as overflow).
With a high percentage of jobs hitting redirects that lead to job bounds violations, we can be stuck in this vicious cycle for a while.
Plan:
- Dedicated DNS resolver pools, with capacity split between seeding and domain discovery
- A newly started job is a seed, and its first task will use the seeding DNS resolver via .await-style calls (i.e. we delegate resolving to a pool and await the result asynchronously)
- Seed task is allowed to violate the typical same-domain bounds as long as the resolved DNS addr_key stays the same (meaning that while we violate domain bounds, we stay within the same addr_key)
- As soon as we finish with the seed task, every other task uses the domain name the seed task ended up using
- While the seed task is progressing through redirects, we should not emit those as "newly discovered domains"
- On job finish we should also record all "aliases" of the checked domain, so that if we encounter any of them later we know it was already checked
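
A rough sketch of the seed-task rule above, assuming Rust (suggested by the .await mention). Everything here is hypothetical: AddrKey as a u64, the SeedOutcome type, and follow_seed_redirects are illustration names only, and resolution is a plain synchronous callback instead of an awaited call into the seeding resolver pool. Redirects that stay on the same addr_key are followed and remembered as aliases instead of being emitted as new domains; anything else is still a bounds violation.

```rust
/// Hypothetical stand-in for the resolved address key (addr_key in the notes).
type AddrKey = u64;

/// Hypothetical result of running the seed task through its redirects.
#[derive(Debug, PartialEq)]
enum SeedOutcome {
    /// The seed settled on `domain`; `aliases` are the names it passed through.
    Settled { domain: String, aliases: Vec<String> },
    /// A redirect left the original addr_key (or did not resolve at all).
    BoundsViolation { offending: String },
}

/// Follow the seed task's redirect chain. `resolve` stands in for the
/// seeding DNS resolver; in the real system this would be awaited.
fn follow_seed_redirects(
    start: &str,
    redirects: &[&str],
    resolve: &dyn Fn(&str) -> Option<AddrKey>,
) -> SeedOutcome {
    let key = match resolve(start) {
        Some(k) => k,
        None => return SeedOutcome::BoundsViolation { offending: start.to_string() },
    };
    let mut current = start.to_string();
    let mut aliases = Vec::new();
    for &next in redirects {
        if next == current {
            continue; // same-domain redirect, nothing to track
        }
        match resolve(next) {
            // Different name, same addr_key: allowed for the seed task.
            // The previous name becomes an alias, not a "new domain".
            Some(k) if k == key => {
                aliases.push(std::mem::replace(&mut current, next.to_string()));
            }
            // Different addr_key or unresolvable: regular bounds violation.
            _ => return SeedOutcome::BoundsViolation { offending: next.to_string() },
        }
    }
    SeedOutcome::Settled { domain: current, aliases }
}
```

Every later task in the job would then use the domain from Settled, and the aliases list is what gets recorded on job finish.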
This should eliminate resource waste in jobs that start with redirects that change the source domain name.
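
For the alias bookkeeping, a minimal sketch of what "record all aliases on job finish" could look like, again in Rust. CheckedDomains, record_job, and lookup are made-up names, and a real version would need to be shared across workers and probably persisted; this only shows the lookup shape.

```rust
use std::collections::HashMap;

/// Hypothetical registry of already-checked domains.
/// Maps every known name (canonical or alias) to the canonical name.
#[derive(Default)]
struct CheckedDomains {
    canonical_of: HashMap<String, String>,
}

impl CheckedDomains {
    /// Called on job finish: remember the domain the job ended up using
    /// plus every alias the seed task passed through.
    fn record_job(&mut self, canonical: &str, aliases: &[&str]) {
        self.canonical_of
            .insert(canonical.to_string(), canonical.to_string());
        for alias in aliases {
            self.canonical_of
                .insert((*alias).to_string(), canonical.to_string());
        }
    }

    /// Returns the canonical name if `domain` was already checked,
    /// either directly or as an alias.
    fn lookup(&self, domain: &str) -> Option<&str> {
        self.canonical_of.get(domain).map(String::as_str)
    }
}
```

Before queueing a discovered domain for resolution, a lookup hit would mean it can be skipped.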