orleans icon indicating copy to clipboard operation
orleans copied to clipboard

[EPIC] Papercuts to fix for 4.0.0

Open ReubenBond opened this issue 4 years ago • 11 comments

This is an issue to log and track papercuts: small annoyances/inconveniences which developers often run into. Please feel free to add to the list and to pick off items on the list to fix.

  • [ ] Silo shutdown takes too long, even in dev/test scenarios with a single silo and no persistence
  • [x] Logging noise: MemoryStorageGrain grain logs OnActivateAsync message
  • [x] Logging noise: On shutdown, silo spews a lot of information which is not meaningful to developers
  • [x] Logging noise: Gateway refresh notifications on client are unnecessary
  • [ ] Logging noise: Periodic statistics dumps are unnecessary / unexpected
  • [x] Logging noise: deployment load publisher
  • [ ] Logging noise: Logging that CleanupDefunctSiloEntries is not supported is unnecessary, especially as a warning. We should also not throw an exception and should implement it on most/all providers
  • [ ] Logging noise: "No implementation of IHostEnvironmentStatistics was found. Load shedding will not work yet"
    • Add config validator to only warn/error if LoadSheddingOptions.LoadSheddingEnabled is true
  • [ ] Logging noise: VirtualBucketsRingProvider
  • [x] Logging noise: Counters
  • [x] ClusterId & ServiceId should have a configured default, eg "default"
  • [x] ISiloBuilder vs ISiloHostBuilder is confusing - delete SiloHostBuilder entirely and steer everyone towards generic host
  • [ ] The distinction between Grain<T> and Grain with IPersistentState<T> injected could be confusing - we should consider removing Grain<T>
  • [x] Application Parts should never need configuring (Fixed in #7070 by removing app parts entirely)
  • [ ] Orleans calls issued before the silo/client has fully initialized should either throw or wait until startup has completed - waiting is probably nicer UX, as long as there's still a timeout to say that it's waiting for initialization.
  • [ ] Introduce ObserverManager as a utility to help with observers and update docs again
  • [x] Rename the Consul package to Microsoft.Orleans.Clustering.Consul and clean up package metadata
  • [x] Rename the ZooKeeper package to Microsoft.Orleans.Clustering.ZooKeeper & clean up metadata
  • [x] #7456
  • [ ] Microsoft.Orleans.Client and Microsoft.Orleans.Server should pull in Microsoft.Orleans.Sdk to reduce scope for errors.

ReubenBond avatar Jun 04 '21 13:06 ReubenBond

May adding support for stateless worker grains in heterogeneous silos be considered as papercut? :)

krasin-ga avatar Jun 22 '21 15:06 krasin-ga

@krasin-ga yes, it would. Which version are you running? I recall that we fixed an issue with StatelessWorker grains in heterogeneous clusters

ReubenBond avatar Jun 22 '21 15:06 ReubenBond

I am using version 3.4.2.

A little preface. I use the concept of "roles" to segregate the responsibilities of silos. So a silo with a certain role places only those grains that belong to it. Something like a set of micro clusters within a cluster. This allows for flexible resource management in silo deployment, faster localization of problems, and an additional level of isolation.

But this approach has a few drawbacks.

  1. This is the problem I was referring to. The documentation says that statless worker grains must be deployed on all silos. I got around this by using grains that handle one request and then get deactivated using the DeactivateOnIdle method. I didn't realize this could already be fixed :)

  2. The grain catalog is common to all silos and restarting or crashing a silo with one role affects silos with other roles. Some of the grains are deactivated. Part of the requests may be lost #6746. Both client to silo and silo to silo. It may cause some downtime when a large ResponseTimeout has been set.

  3. There were problems when starting clients and silos in a certain order. Clients kept not seeing certain grain implementations even though all the silos were already running by then. Maybe this was only within the time frame set by TypeMapRefreshInterval, I'll check that later.

krasin-ga avatar Jun 22 '21 18:06 krasin-ga

It would be nice to be able to register a cluster client as a service rather than building it as a host. This would be more in line with its responsibilities. Right now, to register a client inside another host (like asp .net core), I use UseServiceProviderFactory and remove IHostApplicationLifetime along with logging services so they don't overwrite my registrations. Then IHostedService is used to initiate the connection of the client to the cluster.

I also propose to make IClusterClient a generic type: IClusterClient<TCluster>. This would make it fairly easy to register and use multiple cluster clients within the application, including within the silo.

krasin-ga avatar Jun 23 '21 06:06 krasin-ga

@krasin-ga yes, it would. Which version are you running? I recall that we fixed an issue with StatelessWorker grains in heterogeneous clusters

I've checked, and everything actually works great. Thanks!

krasin-ga avatar Jun 23 '21 13:06 krasin-ga

It would be nice to be able to register a cluster client as a service rather than building it as a host.

Maybe. The client is self contained. What I usually do, is to build an IClusterClient and register it on the generic Host DI container. After that, I register an IHostedService which injects this client and start/stop it. That way, when the application (i.e. ASP.Net) is starting, it will start the client and shutdown it gracefully with the shutdown.

galvesribeiro avatar Jul 26 '21 12:07 galvesribeiro

May I suggest the good old timer reentrancy issue to be fixed? https://github.com/dotnet/orleans/issues/2574

Xanatus avatar Jul 29 '21 14:07 Xanatus

@Xanatus Thank you, good suggestion

ReubenBond avatar Aug 05 '21 13:08 ReubenBond

I register an IHostedService which injects this client and start/stop

@galvesribeiro

One of the things I've raised several times on the Discord is how many folks are looking for prescriptive guidance and best practices.

One of the great things about Orleans is how flexible it is and how it can be shaped to meet your exact use case and that probably has to do with not having lots of opinionated aspects to Orleans.

As Orleans gains additional momentum and adoption from the visibility of now being under the .NET Development Group, I think there may be a benefit to offering some level of guidance that meets the 80%.

A great example of this is the client connection story. Having to roll your own all the time can lead to subtle mistakes and gaps in the implementation.

It would be nice to have something out of the box that can just be configured and injected.

I haven't dug into 4.0 deeply yet so I'm not sure how much of this story has changed; please forgive me if this already exists.

ElanHasson avatar Aug 10 '22 01:08 ElanHasson

I agree with you @ElanHasson.

There are some rework on the client itself of which it wont need its own DI container anymore and can be added directly to any IServiceCollection however, the initialization of the client still requires a an async call.

We have two options here:

  1. Pay the cost for the initialization on the first grain invocation transparently: This is "easier" but brings some complication to the table like queuing all the calls until the first one gets the client initialized and then starts dequeuing and invoking all others or any other synchronization mechanism;
  2. For applications that run using the Microsoft.Extensions.Hosting library we can just install an internal IHostedService which connects the client before all other services even start: This is what I manually do Today but this also wouldn't work if you are not using that library.

Do you have any suggestion what would be the best approach?

galvesribeiro avatar Aug 10 '22 01:08 galvesribeiro

btw here is an example of what I have to do: https://github.com/web-scheduler/web-scheduler-api/blob/main/Source/WebScheduler.Client.Core/HostedServices/ClusterClientHostedService.cs#L78-L115

I'm not even sure how correct that is, but it works. (i know I have primitive retry handling there.)

@galvesribeiro. The Orleans strength is in it's utility.

I like option #2.

A IHostedService that can take the client builder as configuration and handles everything for you needed for managing the IClusterClient. Something like this as a NuGet would be perfect.

ElanHasson avatar Aug 10 '22 16:08 ElanHasson

The remaining items on this list have been migrated to https://github.com/dotnet/orleans/issues/8257

Closing this - please bring up any relevant papercuts in the new thread.

ReubenBond avatar Jan 12 '23 00:01 ReubenBond