GEODE-8856: Persist gateway-sender startup-action

Open jvarenina opened this issue 2 years ago • 0 comments

The new startup-action parameter, with values "stop", "pause" and "start", is now persisted at runtime when the following commands are issued:

  • pause gateway-sender --> startup-action="pause"
  • stop gateway-sender --> startup-action="stop"
  • start gateway-sender --> startup-action="start"
  • resume gateway-sender --> startup-action="start"
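
The mapping above can be sketched in Java as a small lookup table. This is only an illustration of the listed behavior, not code from the commit: the class StartupActionMapping, the enum StartupAction and the method persistedValue are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not Geode code: models the command-to-value mapping listed above.
final class StartupActionMapping {
  // The three startup-action values named in the description.
  enum StartupAction { START, STOP, PAUSE }

  // gfsh command name -> startup-action persisted after the command succeeds.
  private static final Map<String, StartupAction> ACTION_BY_COMMAND = Map.of(
      "pause gateway-sender", StartupAction.PAUSE,
      "stop gateway-sender", StartupAction.STOP,
      "start gateway-sender", StartupAction.START,
      "resume gateway-sender", StartupAction.START); // resume maps to "start", not "resume"

  // Returns the string stored as startup-action for the given command.
  static String persistedValue(String command) {
    StartupAction action = ACTION_BY_COMMAND.get(command);
    if (action == null) {
      throw new IllegalArgumentException("Command does not change startup-action: " + command);
    }
    return action.name().toLowerCase(Locale.ROOT);
  }
}
```

Note that resume and start persist the same value, so a sender that was paused and then resumed comes back as started after a restart.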

The startup-action parameter is persisted in the cluster configuration when the above gfsh commands execute successfully on at least one of the servers. The parameter is neither updated nor persisted when the commands are executed per member, because cluster configuration is not persisted in that case.

The new startup-action parameter interacts with manual-start in the following way:

  • If manual-start="true" and the startup-action parameter is missing, the gateway sender requires a manual start (same as before).
  • If manual-start is not set (or is "false") and the startup-action parameter is missing, the gateway sender is started automatically (same as before).
  • If the startup-action parameter is present in the cluster configuration at startup, the gateway sender tries to reach that state regardless of the manual-start value.
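
The precedence rules above can be summarized in a short Java sketch. It is an illustration under assumptions, not the implementation in this commit: the class StartupResolution, the enum State and the method resolve are hypothetical, and a null startupAction stands for "parameter missing from cluster configuration".

```java
// Hypothetical sketch, not Geode code: resolves the state a gateway sender aims for at startup.
final class StartupResolution {
  // STOPPED also covers the legacy "requires manual start" case.
  enum State { RUNNING, PAUSED, STOPPED }

  // startupAction is null when the parameter is absent from cluster configuration.
  static State resolve(String startupAction, boolean manualStart) {
    if (startupAction != null) {
      // A persisted startup-action wins regardless of manual-start.
      switch (startupAction) {
        case "start":
          return State.RUNNING;
        case "pause":
          return State.PAUSED;
        case "stop":
          return State.STOPPED;
        default:
          throw new IllegalArgumentException("Unknown startup-action: " + startupAction);
      }
    }
    // No startup-action: fall back to the pre-existing manual-start behavior.
    return manualStart ? State.STOPPED : State.RUNNING;
  }
}
```

For example, resolve("start", true) yields RUNNING even though manual-start is "true", because the persisted startup-action takes precedence.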

The handling of manual-start is also improved so that it fully complies with the above requirement to start the gateway sender in the stopped state.

Current problem with manual-start parameter:

Currently, when manual-start is set to "true", the colocated persistent parallel gateway sender queue region and its buckets are not recovered after a server restart. As a result, the main persistent region that is colocated with the gateway sender queue region cannot reach online status.

Solution to manual-start problem implemented in this commit:

When the manual-start parameter is "true" or the gateway sender startup-action is "stop", persistent parallel gateway sender queues are now recovered (if needed) from persistent storage during server startup. The queues are recovered with the existing mechanism that is used when the gateway sender is recovered automatically (manual-start==false) after a server restart. In this case the parallel gateway sender queue persistent region and its buckets are recovered (if needed) right after the main persistent region and its buckets are recovered.
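
The behavioral difference can be expressed as a simple predicate. The following Java sketch is illustrative only: the class QueueRecoveryDecision, its method and the fixApplied flag are hypothetical, and it assumes that only persistent parallel sender queues are in scope, since a non-persistent queue has nothing to recover from disk.

```java
// Hypothetical sketch, not Geode code: when is the persistent parallel queue recovered at startup?
final class QueueRecoveryDecision {
  // fixApplied == false models the old behavior, true models the behavior after this change.
  static boolean recoversQueueAtStartup(
      boolean persistentParallel, boolean manualStart, boolean fixApplied) {
    if (!persistentParallel) {
      return false; // assumption: nothing to recover from persistent storage
    }
    // Before: recovery was skipped when manual-start was "true".
    // After: recovery happens whether or not the sender is going to be started.
    return fixApplied || !manualStart;
  }
}
```

The case that matters is persistentParallel with manual-start "true": it returns false without the change, which is what kept the colocated main region from coming online, and true with it.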

Additionally, in the above case the parallel gateway sender now reaches the same state it has when it is first started and then stopped with the gfsh command. In that state the parallel gateway sender queue buckets remain on the servers, but the dispatcher threads are stopped and none of the events are stored in the queues.

For all changes:

  • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • [x] Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • [x] Is your initial contribution a single, squashed commit?

  • [x] Does gradlew build run cleanly?

  • [x] Have you written or updated unit tests to verify your changes?

  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

jvarenina, Sep 15 '22 08:09