Durable Jobs follow-up

Open ReubenBond opened this issue 2 months ago • 2 comments

This issue documents follow-up items for Durable Jobs (#9717)

  • [x] Rename to Durable Jobs (update projects, types, etc)
  • [ ] Rename ScheduledJobContext to ScheduledJobRun (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3394912141)
  • [ ] Address 50K append/blob limit for Azure Storage implementation (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3394719873)
  • [x] Use IOverloadDetector in LocalScheduledJobManager to throttle execution when the host is overloaded (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3394850102)
  • [ ] IScheduledJobReceiverExtension.DeliverScheduledJobAsync should return some kind of ScheduledJobRunResult which includes a TimeSpan PollAfter property so that long-running requests can be better supported.
  • [ ] Flow CancellationToken in tests, so we can have a short (e.g., 2-minute) timeout for each test.
  • [ ] Make the non-CancellationToken arguments passed to ILocalScheduledJobManager.ScheduleJobAsync a class or struct to make it easier to add properties later without breaking existing callers/impls.
  • [ ] Idea: Support multiple concurrent accounts in AzureStorageJobShardManager for improved scaling, migration, etc (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3404983204)
  • [ ] Observability - Do a pass on tracing, metrics, logs
  • [ ] Rebalancing - We need to de-assign/rebalance shards if we have too many to avoid skew after a new deployment or upgrade. (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3405930327)
  • [ ] Concurrent shard limit - We should limit the number of concurrently assigned shards per silo to prevent memory exhaustion. (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3405930327)
  • [ ] Shard assignment slow start - We should consider performing a slow start for shard assignment, only reading a number of shards based on how long the silo has been up. It's important for disaster recovery scenarios, as we have seen with Azure ML. (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3405930327)
  • [ ] Concurrent job slow start - We should gradually increase job concurrency (semaphore.Release) during startup until we hit our target. This helps to avoid starvation issues which can happen before things have warmed up (caches, connection pools, thread pool sizing, etc). (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3405930327)
  • [ ] Make sure we handle multi-cluster deployments more gracefully in AzureStorageJobShardManager (and other impls, ideally) (https://github.com/dotnet/orleans/pull/9717#pullrequestreview-3404974478)
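
Illustration only, not code from the PR: the "Concurrent job slow start" item above could be sketched roughly as follows. This is a language-neutral sketch written in Python rather than Orleans/.NET code, and every name in it (`SlowStartLimiter`, `ramp_once`, `ramp`, `run`, `demo`) is hypothetical. It assumes a semaphore that starts with a small number of permits and has more permits released on a timer until a target concurrency is reached.

```python
import asyncio


class SlowStartLimiter:
    """Hypothetical limiter whose capacity ramps from `initial` up to `target`."""

    def __init__(self, initial: int, target: int, step: int = 1) -> None:
        if not 0 < initial <= target or step < 1:
            raise ValueError("require 0 < initial <= target and step >= 1")
        # Start with only `initial` permits; the rest are released over time.
        self._semaphore = asyncio.Semaphore(initial)
        self._target = target
        self._step = step
        self.capacity = initial

    def ramp_once(self) -> bool:
        """Release up to `step` extra permits; return True if more ramping remains."""
        grant = min(self._step, self._target - self.capacity)
        for _ in range(grant):
            self._semaphore.release()
        self.capacity += grant
        return self.capacity < self._target

    async def ramp(self, interval: float) -> None:
        """Raise capacity every `interval` seconds until the target is reached."""
        while self.capacity < self._target:
            await asyncio.sleep(interval)
            self.ramp_once()

    async def run(self, job):
        """Run `job` (an async callable) once a permit is available."""
        async with self._semaphore:
            return await job()


async def demo():
    limiter = SlowStartLimiter(initial=1, target=4)
    running = 0
    peak = 0

    async def job():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for real job work
        running -= 1

    ramp_task = asyncio.create_task(limiter.ramp(interval=0.02))
    await asyncio.gather(*(limiter.run(job) for _ in range(40)))
    await ramp_task
    return limiter.capacity, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real implementation would likely pick the ramp interval and step size from configuration, and might tie the ramp to overload signals rather than a fixed timer; the sketch only shows the permit-release mechanism.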

ReubenBond • Nov 04 '25 21:11

@ReubenBond Regarding the item "Rename to Durable Jobs (update projects, types, etc)", I am confused: what about backward compatibility with Reminders V1?

Until now, I assumed that both Durable Jobs and Reminders would live side by side, so that users can migrate their existing reminders progressively. I assumed that the new name was meant to make this co-existence easier.

nkosi23 • Nov 08 '25 09:11

@nkosi23 Durable Jobs & Reminders will remain separate, so they can run side-by-side and you can gradually migrate from one to the other.

ReubenBond • Nov 08 '25 16:11