Orleans stream pulling agent stops pulling messages from stream suddenly

Open MittalNitish opened this issue 4 years ago • 7 comments

Hi Team / @jason-bragg

We are using Orleans 2.2.4.

We are using Azure Event Hub streams with Orleans streaming. Previously we used the ConsistentRingQueueBalancer, but with this queue balancer the streams were not distributed equally among silos, causing high memory usage on some silos. We recently switched to ClusterConfigDeploymentLeaseBasedBalancer. Since switching to the lease-based balancer, we have noticed that the stream subscriber grain sometimes stops receiving messages from the stream, even though there is no publish error on the sender grain side. When we restart the silos, the old pending messages start getting processed. This issue was not present with ConsistentRingQueueBalancer. We need your help finding the cause and debugging this issue. Please let me know if you need any specific logs.

Thanks

MittalNitish avatar Aug 13 '21 10:08 MittalNitish

Orleans 2.2.4 is quite old now. Could you upgrade to the latest 3.x release? You may need to make some changes to the config, but everything else should be backward compatible.

benjaminpetit avatar Aug 16 '21 08:08 benjaminpetit

Thanks @benjaminpetit,

Is this a known issue/bug with version 2.2.4? If yes, does it work well with the latest version? Our code base is huge and uses .NET Framework 4.6.2, which makes it very difficult to upgrade Orleans. So we need to make sure upgrading would fix this.

MittalNitish avatar Aug 16 '21 11:08 MittalNitish

I don't recall a related issue that we might have fixed. But that version is old, which makes it more difficult for us to investigate.

I see that you are using the LeaseBasedQueueBalancer based on the StaticClusterDeploymentOptions, which might be the issue: on reboot, do your silos have the same name? Do you have any logs for the queue balancer?

benjaminpetit avatar Aug 16 '21 12:08 benjaminpetit

Yes @benjaminpetit,

The silos have the same ClusterId, set to a constant value in the builder:

builder.Configure<ClusterOptions>(clusterOptions =>
{
    clusterOptions.ClusterId = OrleansConfigurationConstants.ClusterId;
    clusterOptions.ServiceId = OrleansConfigurationConstants.ServiceId;
});

I will fetch some logs and update here in a while.

MittalNitish avatar Aug 16 '21 12:08 MittalNitish

The LeaseBasedQueueBalancer was introduced in 2.x and had some issues that were addressed in 3.x, so yes, there are known issues with that system.

jason-bragg avatar Aug 16 '21 17:08 jason-bragg

Hi @benjaminpetit, I have added logs captured during silo restarts and a code snippet of the stream configuration: logs_code_snippet.zip

MittalNitish avatar Aug 17 '21 14:08 MittalNitish

We've moved this issue to the Backlog. This means that it is not going to be worked on for the coming release. We review items in the backlog at the end of each milestone/release and depending on the team's priority we may reconsider this issue for the following milestone.

ghost avatar Jul 28 '22 23:07 ghost