Add task polling data to task responses
What changed? Adds task queue information to poll responses.
Outcome of discussion w/ @ShahabT and @dnr
Initially I would have liked TaskQueueStats to be returned on every poll response. For now, that appears to be too much from both a performance and a complexity perspective: regularly updating and distributing aggregated data across partitions would require more frequent internal RPCs, and those RPCs would have to flow in a direction they currently don't (from sub-partition -> root rather than just the other way around).
The alternative is to have the SDK fetch that data via DescribeTaskQueue. Calling that very frequently would also be problematic, since there is a per-namespace RPS limit.
The middle ground we landed on, which is low complexity but ought to be reasonably efficient, is to include TaskQueueStats in the response whenever a poll hits the root partition, where it is readily available. If the SDK has not seen such a response within some reasonable TTL (~seconds), it will call DescribeTaskQueue to get an updated view. In practice, a worker with a reasonable number of pollers is fairly unlikely to miss the root partition, and low-traffic queues are also more likely to forward polls to the root to see if there is a task. Since the desired outcome on the SDK side is to have a low number of pollers only when traffic is low, that should work out.
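The SDK-side fallback described above could look roughly like the sketch below. This is only an illustration under assumptions: `TaskQueueStats` here is a simplified stand-in with a made-up `backlog_count` field, `StatsCache` and its methods are hypothetical names, `describe` stands in for whatever wraps the DescribeTaskQueue call, and the 5-second TTL is an arbitrary placeholder.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskQueueStats:
    # Simplified, hypothetical stand-in for the real message.
    backlog_count: int


class StatsCache:
    """Holds the latest stats; refreshes via a describe call once they go stale."""

    def __init__(
        self,
        describe: Callable[[], TaskQueueStats],  # hypothetical DescribeTaskQueue wrapper
        ttl_seconds: float = 5.0,                # assumed TTL, not from the PR
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._describe = describe
        self._ttl = ttl_seconds
        self._clock = clock
        self._stats: Optional[TaskQueueStats] = None
        self._seen_at: Optional[float] = None

    def on_poll_response(self, stats: Optional[TaskQueueStats]) -> None:
        # Only polls served by the root partition carry stats; others pass None.
        if stats is not None:
            self._stats = stats
            self._seen_at = self._clock()

    def current(self) -> TaskQueueStats:
        now = self._clock()
        # Fall back to the describe call if nothing was seen within the TTL.
        if self._seen_at is None or now - self._seen_at > self._ttl:
            self._stats = self._describe()
            self._seen_at = now
        return self._stats
```

Injecting the clock and the describe callable keeps the sketch testable without a server; a real implementation would also need to respect the per-namespace RPS limit when the fallback fires.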
Why? This gives poller auto-tuning a signal for when to increase the number of pollers: when most matches are not sync (possibly in conjunction w/ a backlog hint).
If we are making sync matches, then we can reduce the number of pollers once the wait duration starts to exceed some threshold.
It can also tell the SDK what the minimum latency is when polling: if a match was not sync, the task was pulled out of the backlog and delivered roughly as quickly as possible.
The task queue stats help decide how quickly pollers should be ramped up, if at all.
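To make the tuning idea above concrete, here is a minimal sketch of one possible heuristic. It is not the SDK's algorithm: `PollObservation`, `adjust_pollers`, the majority rule for "most matches are not sync", the step sizes, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PollObservation:
    # Hypothetical per-poll record built from the poll response.
    sync_match: bool
    wait_seconds: float


def adjust_pollers(
    current: int,
    recent: Sequence[PollObservation],
    backlog_count: Optional[int] = None,  # hint from task queue stats, if known
    min_pollers: int = 1,
    max_pollers: int = 32,
    wait_threshold_seconds: float = 1.0,
) -> int:
    """Return a new poller count based on recent poll outcomes."""
    if not recent:
        return current
    sync = [o for o in recent if o.sync_match]
    if len(sync) * 2 < len(recent):
        # Most matches came from the backlog: add pollers, faster if a
        # backlog is known to exist.
        step = 2 if backlog_count is not None and backlog_count > 0 else 1
        return min(max_pollers, current + step)
    # Mostly sync matches: shrink once pollers wait longer than the threshold.
    avg_wait = sum(o.wait_seconds for o in sync) / len(sync)
    if avg_wait > wait_threshold_seconds:
        return max(min_pollers, current - 1)
    return current
```

For example, with four pollers and three of the last four matches coming from the backlog, this sketch returns six pollers when a backlog is reported and five when no backlog hint is available.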
Breaking changes nope
Server PR Will make one assuming we're OK with this add.
Leaving as an FYI here: I started the server work but it's likely I won't get back to it for a bit b/c of other Replay-related priorities, so this might have to sit for a bit.
Superseded by https://github.com/temporalio/api/pull/553