GEODE-8856: Persist gateway-sender startup-action
The new startup-action parameter, with values "stop", "pause" and "start", is now persisted at runtime when the following commands are issued:
- pause gateway-sender --> startup-action="pause"
- stop gateway-sender --> startup-action="stop"
- start gateway-sender --> startup-action="start"
- resume gateway-sender --> startup-action="start"
The startup-action parameter is persisted in the cluster configuration when the above gfsh commands execute successfully on at least one server. The parameter is neither updated nor persisted when the commands are executed per member, because cluster configuration is not persisted in that case.
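The command-to-value mapping and the persistence rule described above can be sketched roughly as follows. This is only an illustration: the class `StartupActionPersistenceSketch` and its methods are hypothetical names, not part of Geode's API, and the real change lives in the gfsh command and cluster configuration code.

```java
/** Illustrative sketch only; class and method names are hypothetical, not Geode's API. */
final class StartupActionPersistenceSketch {

  /** Maps a gfsh gateway-sender command verb to the startup-action value it persists. */
  static String persistedActionFor(String gfshVerb) {
    switch (gfshVerb) {
      case "pause":
        return "pause";
      case "stop":
        return "stop";
      case "start":
      case "resume": // resume returns the sender to the running state
        return "start";
      default:
        throw new IllegalArgumentException("Unsupported command: " + gfshVerb);
    }
  }

  /**
   * Decides whether the new value is written to cluster configuration.
   *
   * @param perMember true if the command targeted specific members
   * @param successfulServers number of servers on which the command succeeded
   */
  static boolean shouldPersist(boolean perMember, int successfulServers) {
    // Per-member execution never touches cluster configuration.
    return !perMember && successfulServers >= 1;
  }

  public static void main(String[] args) {
    System.out.println(persistedActionFor("resume"));
    System.out.println(shouldPersist(false, 1));
    System.out.println(shouldPersist(true, 3));
  }
}
```

In this sketch a cluster-wide `stop gateway-sender` that succeeds on at least one server would persist "stop", while the same command restricted to individual members would leave the stored value unchanged.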
The new startup-action parameter interacts with manual-start as follows:
- If manual-start="true" and the startup-action parameter is missing, then the gateway sender will require a manual start (same as before).
- If manual-start is not set (or "false") and the startup-action parameter is missing, then the gateway sender will be started automatically (same as before).
- If the startup-action parameter is available in cluster configuration at startup, then the gateway sender will try to reach that state regardless of the manual-start value.
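The precedence between the two parameters can be summarized with a small decision function. This is a minimal sketch under stated assumptions: `SenderStartupSketch`, `StartupAction`, `SenderState` and `targetState` are hypothetical names, a `null` argument stands for a missing startup-action, and mapping manual-start="true" to a stopped state follows this PR's description rather than any confirmed Geode type.

```java
/** Illustrative sketch only; all names here are hypothetical, not Geode's API. */
final class SenderStartupSketch {

  enum StartupAction { START, PAUSE, STOP }

  enum SenderState { RUNNING, PAUSED, STOPPED }

  /**
   * Resolves the state a gateway sender should try to reach at server startup.
   *
   * @param startupAction persisted startup-action, or null if it is missing
   * @param manualStart value of the manual-start attribute
   */
  static SenderState targetState(StartupAction startupAction, boolean manualStart) {
    if (startupAction != null) {
      // A persisted startup-action wins regardless of manual-start.
      switch (startupAction) {
        case START:
          return SenderState.RUNNING;
        case PAUSE:
          return SenderState.PAUSED;
        case STOP:
          return SenderState.STOPPED;
        default:
          throw new AssertionError("Unknown startup-action: " + startupAction);
      }
    }
    // No startup-action persisted: fall back to the pre-existing manual-start behavior.
    return manualStart ? SenderState.STOPPED : SenderState.RUNNING;
  }

  public static void main(String[] args) {
    System.out.println(targetState(null, true));
    System.out.println(targetState(StartupAction.START, true));
  }
}
```

Checking startup-action first keeps the old manual-start semantics intact for configurations that never issued one of the gfsh commands.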
The manual-start behavior is also improved so that it fully complies with the above requirement to start the gateway sender in the stopped state.
Current problem with manual-start parameter:
Currently, when manual-start is configured as "true", the colocated persistent parallel gateway sender queue region and its buckets are not recovered after the server is restarted. As a result, the main persistent region that is colocated with the gateway sender queue region cannot reach online status.
Solution to manual-start problem implemented in this commit:
When the manual-start parameter is "true" or the gateway sender startup-action is "stop", persistent parallel gateway-sender queues will now be recovered (if needed) from persistent storage during server startup. Queues are recovered using the existing mechanism that applies when the gateway sender is automatically recovered (manual-start==false) after a server restart. In this case the parallel gateway sender queue persistent region and its buckets are recovered (if needed) right after the main persistent region and its buckets are recovered.
Additionally, in the above case the parallel gateway sender will now reach the same state it has when it is first started and then stopped with the gfsh command. In that state the parallel gateway sender queue buckets remain on the servers, but the dispatcher threads are stopped and none of the events are stored in the queues.
@kirklund thanks for the comments! I will add new unit tests to this PR.
Restarted concourse-ci/DistributedTestOpenJDK11 run.
This pull request introduces 1 alert when merging 33d9076243d50e7a6b401cee9548b37464ce62f4 into 5567fe265f922f3cdd3303f5383a232bf1649b84 - view on LGTM.com
new alerts:
- 1 for Dereferenced variable may be null
The flaky test AutoConnectionSourceDUnitTest.testClientDynamicallyDropsStoppedLocator that is failing in concourse-ci/distributed-test-openjdk8 is not related to this PR.
Next to each reviewer in the upper right is a circle of two arrows to "Re-request review". If you click that it pops back into our queue of PRs to review. I definitely recommend using that!
This PR appears to be abandoned. Can it be closed?
Hi @onichols-pivotal ,
Sorry, I missed applying some of the comments. I will focus fully on that now and apply them as soon as possible. I would like to merge this PR, as its solution is complete (the remaining comments relate only to tests), and I also think this is a good feature to have in Geode, especially if you run Geode in automated environments like Kubernetes. I know this PR is hard to review because a lot has changed over time, but I am open to any suggestions from your side that would make it easier for you to review. Sorry for any inconvenience.