risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

feat(storage): support graceful shutdown in local version manager

Open yezizp2012 opened this issue 3 years ago • 14 comments
trafficstars

Is your feature request related to a problem? Please describe. Currently local version manager will spawn three background jobs: start_pin_worker/start_unpin_worker/start_buffer_tracker_worker. They should expose their join handle and should be able to receive cancel signal to support graceful shutdown.

Describe the solution you'd like As described.

yezizp2012 avatar Jul 15 '22 04:07 yezizp2012

Cc @zwang28 @wenym1

yezizp2012 avatar Jul 15 '22 04:07 yezizp2012

In what case they should be shutdown from outside?

BowenXiao1999 avatar Jul 15 '22 05:07 BowenXiao1999

In what case they should be shutdown from outside?

Like when need upgrade or restart in k8s.

yezizp2012 avatar Jul 15 '22 05:07 yezizp2012

For restart in k8s, I think the whole CN will shutdown so will the pin/unpin tasks?

BowenXiao1999 avatar Jul 15 '22 05:07 BowenXiao1999

For restart in k8s, I think the whole CN will shutdown so will the pin/unpin tasks?

Not actually, the tasks are not abortable in current implementation. They should be able to receive cancel signal.

yezizp2012 avatar Jul 15 '22 05:07 yezizp2012

Currently start_buffer_tracker_worker, start_pin_worker and start_unpin_worker will exit their loop by checking ref count of LocalVersionManager. I think this is kind of graceful shutdown. They may take arbitrary time to finish current loop before exit, no matter whichever way is used to achieve graceful shutdown.

One issue is start_buffer_tracker_worker checks Arc::strong_count(&local_version_manager) > 1, while start_pin_worker/start_unpin_worker convert weak ref to strong one on demand. They may prevent each other from exiting. @wenym1

zwang28 avatar Jul 15 '22 05:07 zwang28

What can we gain from having graceful shutdown on CN?

IMO, we anyway have to handle the case when a CN is hard killed (i.e. CN failure). If we can handle CN failure promptly, can we just hard kill the CN process and treat it like CN failure so that we don't need to differentiate graceful and non-graceful shutdown? I think it is simpler to have one way to shutdown a CN and recover from CN going offline.

hzxa21 avatar Jul 15 '22 05:07 hzxa21

What can we gain from having graceful shutdown on CN?

IMO, we anyway have to handle the case when a CN is hard killed (i.e. CN failure). If we can handle CN failure promptly, can we just hard kill the CN process and treat it like CN failure so that we don't need to differentiate graceful and non-graceful shutdown? I think it is simpler to have one way to shutdown a CN and recover from CN going offline.

Well, I think for any system, graceful shutdown is always needed. Graceful and non-graceful shutdown(i.e. in failure scenario) should be treat differently, because we don't know what kind of effects of forcing stop some background tasks, which might causing data inconsistency or something else. For operator in k8s, to restart some service graceful shutdown should always be the first choice. It will send a SIGTERM firstly and wait for service exit in a timeout interval, then a SIGKILL if not success.

yezizp2012 avatar Jul 15 '22 06:07 yezizp2012

One issue is start_buffer_tracker_worker checks Arc::strong_count(&local_version_manager) > 1, while start_pin_worker/start_unpin_worker convert weak ref to strong one on demand. They may prevent each other from exiting.

Since the start_buffer_tracker_worker is the only worker that takes the strong reference to the local version manager, after the outer local version manager is not used anymore, though the other two workers may occasionally upgrade to take the strong reference, eventually the start_buffer_tracker_worker will become the only strong reference when it checks it and exit. After it exits, there will be no strong reference, and then the other two workers will fail to upgrade and exit.

But such usage is dangerous, since if we have two or more workers using this mechanism to gracefully shutdown, they will depend on each other to exit first, and will never exit.

The original purpose of using Arc instead of Weak is that each Weak upgrade will increment the reference counting by 1, and such atomic incrementation include a memory fence and may have cost, though it seems that such cost can be ignored compared to other operation of the worker.

wenym1 avatar Jul 15 '22 06:07 wenym1

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

github-actions[bot] avatar Sep 21 '22 02:09 github-actions[bot]

@yezizp2012 Do we still want this graceful shutdown?

@wenym1 Currently we have two workers in local version manager:

  • start_buffer_tracker_worker exits when local version manager's ref_count is 1, which is almost impossible to be met because each HummockStorage holds a ref to local_version_manager, and HummockStorage is cloned many times (as Keyspace is cloned).
  • start_pinned_version_worker never exit.

zwang28 avatar Sep 21 '22 04:09 zwang28

@yezizp2012 Do we still want this graceful shutdown?

@wenym1 Currently we have two workers in local version manager:

  • start_buffer_tracker_worker exits when local version manager's ref_count is 1, which is almost impossible to be met because each HummockStorage holds a ref to local_version_manager, and HummockStorage is cloned many times (as Keyspace is cloned).
  • start_pinned_version_worker never exit.

I think we'd better to support it, otherwise we can only rely on SIGKILL to stop meta and the downtime will also be longer.

yezizp2012 avatar Sep 21 '22 06:09 yezizp2012

start_buffer_tracker_worker exits when local version manager's ref_count is 1, which is almost impossible to be met because each HummockStorage holds a ref to local_version_manager, and HummockStorage is cloned many times (as Keyspace is cloned). start_pinned_version_worker never exit.

Though the HummockStorage is cloned for many times, the cloned HummockStorage are held by actor. Upon shutdown, actors will be dropped and we should expect that there is no one externally holding a HummockStorage.

BTW, if there is someone externally holding a HummockStorage, regardless of how start_buffer_tracker_worker is holding the local version manager, the start_pinned_version_worker will also never exit no matter, because there is always some strong references to local version manager other than start_buffer_tracker_worker.

Anyway, I will support some better way for graceful shutdown.

wenym1 avatar Sep 21 '22 06:09 wenym1

https://github.com/risingwavelabs/risingwave/pull/5469 will now shutdown the buffer_tracker_worker when there is no externally usage on LocalVersionManager.

In our current code, a potential unstoppable external usage on LocalVersionManager will be the notification observer. If the observer can be gracefully shutdown when we stop a CN, there will not be any external usage on LocalVersionManager.

start_pinned_version_worker has already stopped depending on LocalVersionManager and can be ignored.

wenym1 avatar Sep 21 '22 07:09 wenym1